Skip to content
Advertisement

How to filter subcategories of rows from one column, based on counts in second column

Sorry it’s a bit complicated, but lets say I have a very long table of IDs and Fruits:

ID Fruit
1 Apple
2 Banana
4 Orange
3 Banana
1 Orange

The ID may be repeated several times in the table and the fruit may also be repeat several times. For example, in the whole dataframe, ID #1 can have 3 instances of “Apple” and 2 instances of “Orange”, etc. ID #2 can have 0 instances of “Apple” and 5 instances of “Orange”

My goal is to filter the table such that as long as the ID has one instance of fruit “Apple”, then ALL rows with that ID should remain in the table. So let’s say that in the table above, ID # 4 and ID #2 do not have a single instance of “Apple” in the entire dataframe. So the ALL rows that contained ID 4 and 2 should be removed from the dataframe

ID Fruit
1 Apple
3 Banana
1 Orange

Any help is appreciated, thank you!

Advertisement

Answer

Use Series.isin for filter groups with at least one Apple in column Fruit filtered in DataFrame.loc:

df[df['ID'].isin(df.loc[df['Fruit'].eq('Apple'), 'ID'])]
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement