How to select the subset of rows where distance is lowest, grouping by date and p columns?
    df
           v     p  distance        date
    0   14.6   sst   22454.1  2021-12-30
    1   14.9   sst   24454.1  2021-12-30
    2   14.8   sst   33687.4  2021-12-30
    3   1.67  wvht   23141.8  2021-12-30
    4    1.9  wvht   24454.1  2021-12-30
    5    1.8  wvht   24454.1  2021-12-30
    6    1.7  wvht   23141.4  2021-12-31
    7    2.1  wvht   24454.1  2021-12-31
Ideally, the returned dataframe should contain:
    df
           v     p  distance        date
    0   14.6   sst   22454.1  2021-12-30
    3   1.67  wvht   23141.8  2021-12-30
    6    1.7  wvht   23141.4  2021-12-31
Answer
One way is to use groupby + idxmin to get the index of the smallest distance per group, then use loc to get the desired output:
    out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]
Output:
           v     p  distance        date
    0  14.60   sst   22454.1  2021-12-30
    3   1.67  wvht   23141.8  2021-12-30
    6   1.70  wvht   23141.4  2021-12-31
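For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the frame from the question by hand (the construction code is an assumption, since the question only shows the printed frame) and applies the groupby + idxmin approach. The `alt` variable is not part of the answer above; it shows a different technique, a stable sort followed by drop_duplicates, which should select the same rows on this data.

```python
import pandas as pd

# Sample data copied from the question; how the original frame was
# actually built is not shown there, so this constructor is an assumption.
df = pd.DataFrame({
    "v": [14.6, 14.9, 14.8, 1.67, 1.9, 1.8, 1.7, 2.1],
    "p": ["sst", "sst", "sst", "wvht", "wvht", "wvht", "wvht", "wvht"],
    "distance": [22454.1, 24454.1, 33687.4, 23141.8,
                 24454.1, 24454.1, 23141.4, 24454.1],
    "date": ["2021-12-30"] * 6 + ["2021-12-31"] * 2,
})

# idxmin gives, for each (date, p) group, the index label of the row
# with the smallest distance; loc then pulls those rows from df.
idx = df.groupby(["date", "p"])["distance"].idxmin()
out = df.loc[idx]

# Alternative technique (not from the answer): sort by distance with a
# stable sort, keep the first row per (date, p), restore index order.
alt = (
    df.sort_values("distance", kind="mergesort")
      .drop_duplicates(["date", "p"])
      .sort_index()
)

print(out)
```

Note that idxmin returns only the first occurrence of the minimum, so each group contributes exactly one row even when several rows tie for the lowest distance.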