I have a DataFrame with repeated values in column A. I want to drop the duplicates, keeping, for each value of A, the row with the highest value in column B.
So this:
A B
1 10
1 20
2 30
2 40
3 10
Should turn into this:
A B
1 20
2 40
3 10
I’m guessing there’s probably an easy way to do this—maybe as easy as sorting the DataFrame before dropping duplicates—but I don’t know groupby’s internal logic well enough to figure it out. Any suggestions?
Answer
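This keeps the last row for each value of A, which is not necessarily the row with the maximum B (it only matches here because the sample data happens to be ordered by B within each group):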
In [10]: df.drop_duplicates(subset='A', keep="last")
Out[10]:
   A   B
1  1  20
3  2  40
4  3  10
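The idea raised in the question, sorting before dropping duplicates, can be combined with the drop_duplicates call above. The following is a minimal sketch, assuming the sample data from the question is rebuilt as `df`. The trailing `sort_index()` is optional and is only there to restore the original row order.

```python
import pandas as pd

# Sample data from the question (reconstructed here as an assumption).
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [10, 20, 30, 40, 10]})

# Sort by B so that, within each value of A, the largest B comes last;
# keep="last" then retains exactly that row.
result = (
    df.sort_values("B")
      .drop_duplicates(subset="A", keep="last")
      .sort_index()  # optional: restore the original row order
)
print(result)
```

If several rows tie for the maximum B within a group, exactly one of them is kept. If it matters which one, a stable sort (for example `kind="mergesort"`) makes the choice deterministic.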
You can also do something like this, which selects the row with the maximum B in each group:
In [12]: df.groupby('A', group_keys=False).apply(lambda x: x.loc[x.B.idxmax()])
Out[12]:
   A   B
A
1  1  20
2  2  40
3  3  10
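A variant of the same idea avoids `apply` by calling `idxmax` on the grouped column and then indexing the original frame. This is a sketch rather than part of the original answer. It assumes the sample data from the question and a unique index on `df`, because duplicate index labels would make `.loc` return extra rows.

```python
import pandas as pd

# Sample data from the question (reconstructed here as an assumption).
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [10, 20, 30, 40, 10]})

# For each value of A, find the index label of the row with the largest B.
max_idx = df.groupby("A")["B"].idxmax()

# Select those rows from the original frame; the original index is kept
# and A stays an ordinary column instead of also becoming the index.
result = df.loc[max_idx]
print(result)
```

Unlike the `apply` version, the result keeps the original row labels (1, 3, 4 for this data), which matches the output of `drop_duplicates`.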