Skip to content
Advertisement

Python : Dropping specific rows in a dataframe and keep a specif one

Let’s say that I have this dataframe

Name = ['ID', 'Country', 'IBAN','ID_info_1', 'Dan_Age', 'ID_info_1','Dan_city','ID_info_1','Dan_country','ID_info_1', 'ID_info_2', 'ID_info_2','ID_info_2', 'Dan_sex', 'Dan_Age', 'Dan_country','Dan_sex' , 'Dan_city','Dan_country' ]
Value = ['TAMARA_CO', 'GERMANY','FR56', '12', '18','25','Berlin','34', '55','345','432', '43', 'GER', 'M', '22', 'FRA', 'M', 'Madrid', 'ESP']
Ccy = ['','','','EUR','EUR','EUR','','EUR','','','','EUR','EUR','USD','USD','','CHF', '','DKN']
Group = ['0','0','0','1','1','2','2','3','3','4','1','2','3','4','2','2','2','3','3']
df = pd.DataFrame({'Name':Name, 'Value' : Value, 'Ccy' : Ccy,'Group':Group})

print(df)

           Name      Value  Ccy Group
0            ID  TAMARA_CO          0
1       Country    GERMANY          0
2          IBAN       FR56          0
3     ID_info_1         12  EUR     1
4       Dan_Age         18  EUR     1
5     ID_info_1         25  EUR     2
6      Dan_city     Berlin          2
7     ID_info_1         34  EUR     3
8   Dan_country         55          3
9     ID_info_1        345          4
10    ID_info_2        432          1
11    ID_info_2         43  EUR     2
12    ID_info_2        GER  EUR     3
13      Dan_sex          M  USD     4
14      Dan_Age         22  USD     2
15  Dan_country        FRA          2
16      Dan_sex          M  CHF     2
17     Dan_city     Madrid          3
18  Dan_country        ESP  DKN     3

I want to reduce this dataframe ! I want to reduce only the rows that contains the string “info” by keeping the ones that have the highest level in the column “Group”. So in this dataframe, it would mean that I keep the row “ID_info_1” in the group 4, and “ID_info_1” in the group 3. In addition, I want to change the their value in the column “Group” to 1.

So at the end I would like to get this new dataframe where the indexing is reset too

           Name      Value  Ccy Group
0            ID  TAMARA_CO          0
1       Country    GERMANY          0
2          IBAN       FR56          0
3     ID_info_1         12  EUR     1
4       Dan_Age         18  EUR     1
5      Dan_city     Berlin          2
6   Dan_country         55          3
7     ID_info_1        345          1
8     ID_info_2        GER  EUR     1
9       Dan_sex          M  USD     4
10      Dan_Age         22  USD     2
11  Dan_country        FRA          2
12      Dan_sex          M  CHF     2
13     Dan_city     Madrid          3
14  Dan_country        ESP  DKN     3

Anyone has an efficient idea ?

Thank you

Advertisement

Answer

How about this:

# select rows with "info"
di = df[df.Name.str.contains('info')]

# Find the rows below max for removal
di = di[di.groupby('Name')['Group'].transform('max') != di['Group']]

# Remove those rows and set a new index as requested
df = df.drop(di.index).reset_index(drop=True)

# Change group to one on remaining "info" rows
df.loc[df.Name.str.contains('info'), 'Group'] = 1
Advertisement