I have a large dataset about news reading that I'm trying to clean. I created a checklist of the cities I want to keep (the dataset itself contains all the cities). How can I drop rows based on that checklist? For example, if the checklist is a list containing all the French cities, how can I drop the rows for every other city?
To picture the data frame (I have 1.5m rows btw):
         City    Age
0       Paris  25-34
1        Lyon  45-54
2        Kiev  35-44
3      Berlin  25-34
4    New York  25-34
5       Paris    65+
6    Toulouse  35-44
7        Nice  55-64
8    Hannover  45-54
9       Lille  35-44
10  Edinburgh    65+
11     Moscow  25-34
Answer
You can do this using pandas.Series.isin. Called on the City column, it returns a boolean value for each element, indicating whether that element is in the list x. You can then use those boolean values to select the subset of df whose rows return True, by doing df[df['City'].isin(x)]. Here is my solution:
import pandas as pd

# Checklist of cities to keep
x = ['Paris', 'Marseille']
df = pd.DataFrame(data={'City': ['Paris', 'London', 'New York', 'Marseille'],
                        'Age': [1, 2, 3, 4]})

print(df)

# Keep only the rows whose City appears in the checklist
df = df[df['City'].isin(x)]
print(df)
Output:
        City  Age
0      Paris    1
1     London    2
2   New York    3
3  Marseille    4
        City  Age
0      Paris    1
3  Marseille    4
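The same approach can be applied to the sample rows from the question. The following is only a sketch: the name french_cities and its contents are assumptions made for illustration, not the asker's actual checklist. It also shows two optional extras: ~ inverts the mask to inspect which rows would be dropped, and reset_index(drop=True) renumbers the kept rows.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "City": ["Paris", "Lyon", "Kiev", "Berlin", "New York", "Paris",
             "Toulouse", "Nice", "Hannover", "Lille", "Edinburgh", "Moscow"],
    "Age": ["25-34", "45-54", "35-44", "25-34", "25-34", "65+",
            "35-44", "55-64", "45-54", "35-44", "65+", "25-34"],
})

# Hypothetical checklist of cities to keep (assumed for this example)
french_cities = ["Paris", "Lyon", "Toulouse", "Nice", "Lille", "Marseille"]

# Boolean mask: True where the row's City is in the checklist
mask = df["City"].isin(french_cities)

# Keep the matching rows and renumber the index from 0
kept = df[mask].reset_index(drop=True)

# Invert the mask to see which rows are being dropped
dropped = df[~mask]

print(kept)
print(dropped["City"].unique())
```

Computing the mask once and reusing it avoids scanning the City column twice, which may matter with around 1.5 million rows.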