Sorry it’s a bit complicated, but lets say I have a very long table of IDs and Fruits:
ID | Fruit |
---|---|
1 | Apple |
2 | Banana |
4 | Orange |
… | … |
3 | Banana |
1 | Orange |
The ID may be repeated several times in the table and the fruit may also be repeat several times. For example, in the whole dataframe, ID #1 can have 3 instances of “Apple” and 2 instances of “Orange”, etc. ID #2 can have 0 instances of “Apple” and 5 instances of “Orange”
My goal is to filter the table such that as long as the ID has one instance of fruit “Apple”, then ALL rows with that ID should remain in the table. So let’s say that in the table above, ID # 4 and ID #2 do not have a single instance of “Apple” in the entire dataframe. So the ALL rows that contained ID 4 and 2 should be removed from the dataframe
ID | Fruit |
---|---|
1 | Apple |
… | … |
3 | Banana |
1 | Orange |
Any help is appreciated, thank you!
Advertisement
Answer
Use Series.isin
for filter groups with at least one Apple
in column Fruit
filtered in DataFrame.loc
:
df[df['ID'].isin(df.loc[df['Fruit'].eq('Apple'), 'ID'])]