So my dataframe looks something like this –
ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:01:35|30|10|CHH|2020-05-20
DEF123|2020-05-18 09:04:35|50|20|CHH|2020-06-19
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
PQR333|2020-05-18 09:13:12|50|10|SSS|2020-06-19
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19
I want to group by ORD_ID and, for each ORD_ID, keep the record with the latest TIME (without applying any aggregate function to the other columns). The desired output is:
ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19
How can this be achieved, so that only the last record by TIME is kept for each unique ORD_ID?
Answer
You don’t need groupby; drop_duplicates would do:
df.sort_values('TIME').drop_duplicates('ORD_ID',keep='last')
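Or if you really want groupby (tail(1) takes the last row of each group in the current row order, so sort by TIME first unless the data is already ordered):
df.sort_values('TIME').groupby('ORD_ID').tail(1)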
Output:
   ORD_ID                 TIME  VOL  VOL_DSCL SMBL         EXP
2  ABC123  2020-05-18 09:06:45   20        10  CHH  2020-05-20
4  DEF123  2020-05-18 09:24:35   20        20  CHH  2020-06-19
5  PQR333  2020-05-18 09:26:23    0         0  SSS  2020-06-19
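For anyone who wants to reproduce this, here is a minimal end-to-end sketch. It assumes the sample data is loaded from the pipe-delimited text in the question and that TIME is parsed as a datetime; the names RAW, last_per_order, via_drop and via_group are made up for illustration and are not part of pandas.

```python
import io

import pandas as pd

# Sample data copied from the question (pipe-delimited).
RAW = """ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:01:35|30|10|CHH|2020-05-20
DEF123|2020-05-18 09:04:35|50|20|CHH|2020-06-19
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
PQR333|2020-05-18 09:13:12|50|10|SSS|2020-06-19
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19
"""


def last_per_order(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the row with the latest TIME per ORD_ID."""
    # Sort so the latest row of each ORD_ID comes last, then keep that row.
    return frame.sort_values("TIME").drop_duplicates("ORD_ID", keep="last")


# Assumption: TIME should be a real datetime so sorting is chronological.
df = pd.read_csv(io.StringIO(RAW), sep="|", parse_dates=["TIME"])

# Approach 1: sort + drop_duplicates.
via_drop = last_per_order(df)

# Approach 2: sort + groupby + tail(1); whole rows are kept, nothing is aggregated.
via_group = df.sort_values("TIME").groupby("ORD_ID").tail(1)

print(via_drop)
```

Both approaches return whole rows with their original index labels, so the other columns come through untouched. Sorting first is what makes "last" mean "latest in TIME" rather than "last in whatever order the rows happened to arrive".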