So my dataframe looks something like this –
```
ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:01:35|30|10|CHH|2020-05-20
DEF123|2020-05-18 09:04:35|50|20|CHH|2020-06-19
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
PQR333|2020-05-18 09:13:12|50|10|SSS|2020-06-19
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19
```
I want to group by ORD_ID and keep the record with the latest TIME for each ORD_ID, without applying any aggregate function to the other columns. That is, the desired output is:
```
ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19
```
How can this be achieved, keeping only the latest record by TIME for each unique ORD_ID?
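To reproduce the example, here is a minimal sketch that builds the frame from the pipe-delimited sample with `pd.read_csv(..., sep="|")`. The name `raw` is just for illustration, and parsing `TIME` with `parse_dates` is my assumption; the question doesn't say how the frame was actually loaded.

```python
import io

import pandas as pd

# Sample data from the question, pipe-delimited (variable name is illustrative).
raw = """ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:01:35|30|10|CHH|2020-05-20
DEF123|2020-05-18 09:04:35|50|20|CHH|2020-06-19
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
PQR333|2020-05-18 09:13:12|50|10|SSS|2020-06-19
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19
"""

# Assumption: parse TIME as a real datetime so later sorts are chronological.
df = pd.read_csv(io.StringIO(raw), sep="|", parse_dates=["TIME"])
print(df.dtypes)
```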
Answer
You don't need `groupby`; `drop_duplicates` will do:
```python
df.sort_values('TIME').drop_duplicates('ORD_ID', keep='last')
```
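Here is a self-contained sketch of the `drop_duplicates` approach. To show why the sort matters, the rows below are my own shuffle of the question's data, and `TIME` is converted with `pd.to_datetime` so the sort is chronological rather than relying on the strings happening to be in ISO format. The trailing `sort_values("ORD_ID")` and `reset_index` are only for tidy output.

```python
import pandas as pd

# The question's rows, deliberately shuffled so they are NOT in time order.
df = pd.DataFrame(
    {
        "ORD_ID": ["DEF123", "ABC123", "PQR333", "ABC123", "DEF123", "PQR333"],
        "TIME": pd.to_datetime([
            "2020-05-18 09:24:35",
            "2020-05-18 09:01:35",
            "2020-05-18 09:26:23",
            "2020-05-18 09:06:45",
            "2020-05-18 09:04:35",
            "2020-05-18 09:13:12",
        ]),
        "VOL": [20, 30, 0, 20, 50, 50],
        "VOL_DSCL": [20, 10, 0, 10, 20, 10],
        "SMBL": ["CHH", "CHH", "SSS", "CHH", "CHH", "SSS"],
        "EXP": ["2020-06-19", "2020-05-20", "2020-06-19",
                "2020-05-20", "2020-06-19", "2020-06-19"],
    }
)

# Sort chronologically, then keep the last (i.e. latest) row for each ORD_ID.
latest = (
    df.sort_values("TIME")
      .drop_duplicates("ORD_ID", keep="last")
      .sort_values("ORD_ID")      # display order only
      .reset_index(drop=True)
)
print(latest)
```

If two rows for the same ORD_ID can share the same TIME, passing `kind="stable"` to the first `sort_values` makes the tie-break follow the original row order.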
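Or, if you really want `groupby` (sort first, because `tail(1)` takes the last row in the frame's current order, not the latest `TIME`):
```python
df.sort_values('TIME').groupby('ORD_ID').tail(1)
```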
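A `groupby` variant that doesn't depend on row order at all is `idxmax` on `TIME`; note this is a swapped-in technique, not the answer's `tail(1)`. The sketch below uses my own shuffled copy of the question's data and contrasts `idxmax` with `tail(1)` both with and without the prior sort.

```python
import pandas as pd

# The question's rows, shuffled so the input is not in time order.
df = pd.DataFrame(
    {
        "ORD_ID": ["DEF123", "ABC123", "PQR333", "ABC123", "DEF123", "PQR333"],
        "TIME": pd.to_datetime([
            "2020-05-18 09:24:35",
            "2020-05-18 09:01:35",
            "2020-05-18 09:26:23",
            "2020-05-18 09:06:45",
            "2020-05-18 09:04:35",
            "2020-05-18 09:13:12",
        ]),
        "VOL": [20, 30, 0, 20, 50, 50],
        "VOL_DSCL": [20, 10, 0, 10, 20, 10],
        "SMBL": ["CHH", "CHH", "SSS", "CHH", "CHH", "SSS"],
        "EXP": ["2020-06-19", "2020-05-20", "2020-06-19",
                "2020-05-20", "2020-06-19", "2020-06-19"],
    }
)

# For each ORD_ID, idxmax gives the index label of the row with the latest TIME.
latest_idx = df.groupby("ORD_ID")["TIME"].idxmax()
latest = df.loc[latest_idx.values].reset_index(drop=True)

# tail(1) matches only after sorting by TIME...
via_tail = df.sort_values("TIME").groupby("ORD_ID").tail(1)
# ...without the sort it returns the positionally last row, which is wrong here.
naive = df.groupby("ORD_ID").tail(1)

print(latest)
```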
Output:
```
   ORD_ID                 TIME  VOL  VOL_DSCL SMBL         EXP
2  ABC123  2020-05-18 09:06:45   20        10  CHH  2020-05-20
4  DEF123  2020-05-18 09:24:35   20        20  CHH  2020-06-19
5  PQR333  2020-05-18 09:26:23    0         0  SSS  2020-06-19
```