
How to drop rows from a pandas dataframe based on a pre-made list

I have a big dataset about news reading that I'm trying to clean. I created a checklist of the cities that I want to keep (the dataset itself contains all cities). How can I drop the rows whose city is not on that checklist? For example, I have a checklist (as a list) that contains all the French cities. How can I drop the rows for every other city?

To picture the data frame (I have 1.5m rows btw):

   City         Age
0  Paris      25-34
1  Lyon       45-54
2  Kiev       35-44
3  Berlin     25-34
4  New York   25-34
5  Paris      65+
6  Toulouse   35-44
7  Nice       55-64
8  Hannover   45-54
9  Lille      35-44
10 Edinburgh  65+
11 Moscow     25-34

Answer

You can do this using pandas.Series.isin. Calling df['City'].isin(x) returns a boolean Series that is True for each row whose city is in the list x. You can then use that Series as a mask and keep only the rows where it is True by doing df[df['City'].isin(x)]. Here is my solution:

import pandas as pd

# Checklist of cities to keep
x = ['Paris', 'Marseille']
df = pd.DataFrame(data={'City': ['Paris', 'London', 'New York', 'Marseille'],
                        'Age': [1, 2, 3, 4]})

print(df)

# Keep only the rows whose City is in the checklist
df = df[df['City'].isin(x)]
print(df)

Output:

        City  Age
0      Paris    1
1     London    2
2   New York    3
3  Marseille    4
        City  Age
0      Paris    1
3  Marseille    4
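
To tie this back to the question, here is a minimal sketch of the same approach applied to the sample rows from the question. The french_cities list is a hypothetical stand-in for the asker's real checklist, and the names mask, kept and dropped are only illustrative. It also shows that negating the mask with ~ selects the rows that would be dropped, and that reset_index(drop=True) can renumber the result if the original row labels are not needed.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "City": ["Paris", "Lyon", "Kiev", "Berlin", "New York", "Paris",
             "Toulouse", "Nice", "Hannover", "Lille", "Edinburgh", "Moscow"],
    "Age": ["25-34", "45-54", "35-44", "25-34", "25-34", "65+",
            "35-44", "55-64", "45-54", "35-44", "65+", "25-34"],
})

# Hypothetical checklist of cities to keep (assumption, not the asker's real list).
french_cities = ["Paris", "Lyon", "Toulouse", "Nice", "Lille", "Marseille"]

# Boolean Series: True where the row's city is on the checklist.
mask = df["City"].isin(french_cities)

kept = df[mask]       # rows whose city is on the checklist
dropped = df[~mask]   # rows whose city is not on the checklist

# Optional: renumber the kept rows from 0 instead of keeping the old labels.
kept = kept.reset_index(drop=True)

print(kept)
print(len(kept), "rows kept,", len(dropped), "rows dropped")
```

The mask is computed once for the whole column, so this should be fine for a frame with 1.5m rows.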