How to select the subset of rows where distance is lowest, grouping by date
and p
columns?
df v p distance date 0 14.6 sst 22454.1 2021-12-30 1 14.9 sst 24454.1 2021-12-30 2 14.8 sst 33687.4 2021-12-30 3 1.67 wvht 23141.8 2021-12-30 4 1.9 wvht 24454.1 2021-12-30 5 1.8 wvht 24454.1 2021-12-30 6 1.7 wvht 23141.4 2021-12-31 7 2.1 wvht 24454.1 2021-12-31
Ideally, the returned dataframe should contain:
df v p distance date 0 14.6 sst 22454.1 2021-12-30 3 1.67 wvht 23141.8 2021-12-30 6 1.7 wvht 23141.4 2021-12-31
Advertisement
Answer
One way is to use groupby
+ idxmin
to get the index of the smallest distance per group, then use loc
to get the desired output:
out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]
Output:
v p distance date 0 14.60 sst 22454.1 2021-12-30 3 1.67 wvht 23141.8 2021-12-30 6 1.70 wvht 23141.4 2021-12-31