How to select the subset of rows where distance is lowest, grouping by date and p columns?
df
    v       p       distance    date
0   14.6    sst     22454.1     2021-12-30
1   14.9    sst     24454.1     2021-12-30
2   14.8    sst     33687.4     2021-12-30
3   1.67    wvht    23141.8     2021-12-30
4   1.9     wvht    24454.1     2021-12-30
5   1.8     wvht    24454.1     2021-12-30
6   1.7     wvht    23141.4     2021-12-31
7   2.1     wvht    24454.1     2021-12-31
Ideally, the returned dataframe should contain:
df
    v       p       distance    date
0   14.6    sst     22454.1     2021-12-30
3   1.67    wvht    23141.8     2021-12-30
6   1.7     wvht    23141.4     2021-12-31
Advertisement
Answer
One way is to use groupby + idxmin to get the index of the smallest distance per group, then use loc to get the desired output:
out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]
Output:
v p distance date 0 14.60 sst 22454.1 2021-12-30 3 1.67 wvht 23141.8 2021-12-30 6 1.70 wvht 23141.4 2021-12-31