Sorry it’s a bit complicated, but lets say I have a very long table of IDs and Fruits: ID Fruit 1 Apple 2 Banana 4 Orange … … 3 Banana 1 Orange The ID may be repeated several times in the table and the fruit may also be repeat several times. For example, in the whole dataframe, ID #1 can

How to filter subcategories of rows from one colum…

Sorry it’s a bit complicated, but lets say I have a very long table of IDs and Fruits:

ID	Fruit
1	Apple
2	Banana
4	Orange
…	…
3	Banana
1	Orange

The ID may be repeated several times in the table and the fruit may also be repeat several times. For example, in the whole dataframe, ID #1 can have 3 instances of “Apple” and 2 instances of “Orange”, etc. ID #2 can have 0 instances of “Apple” and 5 instances of “Orange”

My goal is to filter the table such that as long as the ID has one instance of fruit “Apple”, then ALL rows with that ID should remain in the table. So let’s say that in the table above, ID # 4 and ID #2 do not have a single instance of “Apple” in the entire dataframe. So the ALL rows that contained ID 4 and 2 should be removed from the dataframe

ID	Fruit
1	Apple
…	…
3	Banana
1	Orange

Any help is appreciated, thank you!

Answer

Use Series.isin for filter groups with at least one Apple in column Fruit filtered in DataFrame.loc:

df[df['ID'].isin(df.loc[df['Fruit'].eq('Apple'), 'ID'])]

How to filter subcategories of rows from one column, based on counts in second column

Advertisement

Answer