Remove duplicates and keep row that certain column is Yes in a pandas dataframe

Question

I have a dataframe with duplicated values on column "ID", like this one: I need a way to remove duplicates (by "ID") but keep the ones that the column Primary is "Yes" (all unique values have "Yes" in that column and duplicated values have one record as "Yes" and all others as "No") resulting in this dataframe: What is the

Accepted Answer

Use DataFrame.sort_values &#8211; Yes rows are in end of DataFrame, so possible use DataFrame.drop_duplicates with keep='last' &#8211; this solution should return Primary?=No if exist some ID without Primary?=Yes values:df = df.sort_values('Primary?').drop_duplicates('ID', keep='last')print (df)   ID   Name   Street       Birth   Job Primary?0   1  Fake1  Street1  2000-01-01  Job1      Yes2   3  Fake3  Street3  2000-01-03  Job3      Yes4   2  Fake2  Street2  2000-01-02  Job5      Yes5   4  Fake4  Street4  2000-01-03  Job6      Yes

Advertisement

Answer