Remove duplicates by column A, keeping the row with the highest value in column B

Question

I have a dataframe with repeated values in column A. I want to drop duplicates, keeping the row with the highest value in column B. So this: Should turn into this: I'm guessing there's probably an easy way to do this (maybe as easy as sorting the DataFrame before dropping duplicates), but I don't…

Accepted Answer

This takes the last, not the maximum, though:

    In [10]: df.drop_duplicates(subset='A', keep="last")
    Out[10]:
       A   B
    1  1  20
    3  2  40
    4  3  10

You can also do something like:

    In [12]: df.groupby('A', group_keys=False).apply(lambda x: x.loc[x.B.idxmax()])
    Out[12]:
       A   B
    A
    1  1  20
    2  2  40
    3  3  10
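The sorting idea raised in the question can be made to work. Here is a minimal sketch, not necessarily what the answerer had in mind. The sample frame is hypothetical (the original example tables did not survive on this page), and it is built so that the maximum of B for A == 1 is not the last row of its group. That way `keep="last"` alone would give the wrong row.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not taken from the original post).
# For A == 1 the largest B sits in the first row, not the last one.
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [20, 10, 30, 40, 10]})

# Option 1: sort by B so the largest B of every A group ends up last,
# keep the last row per A, then restore ordering by A.
by_sort = (
    df.sort_values("B")
      .drop_duplicates(subset="A", keep="last")
      .sort_values("A")
)

# Option 2: find the index label of the maximum B within each A group
# and select those rows directly.
by_idxmax = df.loc[df.groupby("A")["B"].idxmax()]

print(by_sort)
print(by_idxmax)
```

Both options keep one row per value of A. If several rows in a group tie for the maximum B, the two options may pick different rows among the tied ones, so add a secondary sort key if that matters.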

Answer