I have a big dataset about news reading that I'm trying to clean. The dataset contains cities from all over, and I created a checklist of the cities that I want to keep. How can I drop rows based on that checklist? For example, given a checklist (as a list) that contains all the French cities, how can I drop the rows for every other city?
To picture the data frame (I have 1.5m rows btw):
         City    Age
0       Paris  25-34
1        Lyon  45-54
2        Kiev  35-44
3      Berlin  25-34
4    New York  25-34
5       Paris    65+
6    Toulouse  35-44
7        Nice  55-64
8    Hannover  45-54
9       Lille  35-44
10  Edinburgh    65+
11     Moscow  25-34
Answer
You can do this using pandas.Series.isin. Calling df['City'].isin(x) returns a boolean Series that is True for every row whose city is in the list x. You can then index the data frame with those boolean values, df[df['City'].isin(x)], to take out the subset of df whose rows return True. Following is my solution:
    import pandas as pd

    x = ['Paris', 'Marseille']
    df = pd.DataFrame(data={'City': ['Paris', 'London', 'New York', 'Marseille'],
                            'Age': [1, 2, 3, 4]})
    print(df)

    df = df[df['City'].isin(x)]
    print(df)
Output:
        City  Age
0      Paris    1
1     London    2
2   New York    3
3  Marseille    4
        City  Age
0      Paris    1
3  Marseille    4
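Applied to the sample rows from the question, the same idea might look like the sketch below. The french_cities list is a hypothetical checklist made up for illustration, and the names mask, kept and dropped are likewise assumptions, not anything from the original post. Negating the mask with ~ selects the rows that would be dropped, which can be handy for checking what the filter removes.

```python
import pandas as pd

# Sample rows copied from the question (the real frame has about 1.5m rows).
df = pd.DataFrame({
    "City": ["Paris", "Lyon", "Kiev", "Berlin", "New York", "Paris",
             "Toulouse", "Nice", "Hannover", "Lille", "Edinburgh", "Moscow"],
    "Age": ["25-34", "45-54", "35-44", "25-34", "25-34", "65+",
            "35-44", "55-64", "45-54", "35-44", "65+", "25-34"],
})

# Hypothetical checklist of French cities (an assumption for illustration).
french_cities = ["Paris", "Lyon", "Toulouse", "Nice", "Lille", "Marseille"]

# Boolean Series: True where the row's city is on the checklist.
mask = df["City"].isin(french_cities)

kept = df[mask]       # rows whose city is on the checklist
dropped = df[~mask]   # rows that the filter removes

# Filtering keeps the original index labels; reset them if a clean 0..n-1 index is wanted.
kept = kept.reset_index(drop=True)

print(kept)
print(dropped)
```

Matching is exact, so a city spelled with different capitalization or stray whitespace would not match the checklist and would end up in dropped.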