Group by in pandas for criteria on one column and getting records for other columns as-is

So my dataframe looks something like this –

ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:01:35|30|10|CHH|2020-05-20
DEF123|2020-05-18 09:04:35|50|20|CHH|2020-06-19
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
PQR333|2020-05-18 09:13:12|50|10|SSS|2020-06-19
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19

I want to group by ORD_ID and, for each ORD_ID, keep the record with the latest TIME, leaving the other columns as-is (no aggregate function applied to them). The desired output is:

ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19

How can this be achieved? I only want the row with the latest TIME for each unique ORD_ID.

Answer

You don’t need groupby; sorting by TIME and then calling drop_duplicates will do:

df.sort_values('TIME').drop_duplicates('ORD_ID',keep='last')

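Or, if you really want groupby, sort by TIME first, because tail(1) returns the last row of each group in the frame's current row order, not the row with the largest TIME:

df.sort_values('TIME').groupby('ORD_ID').tail(1)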

Output:

   ORD_ID                 TIME  VOL  VOL_DSCL SMBL         EXP
2  ABC123  2020-05-18 09:06:45   20        10  CHH  2020-05-20
4  DEF123  2020-05-18 09:24:35   20        20  CHH  2020-06-19
5  PQR333  2020-05-18 09:26:23    0         0  SSS  2020-06-19
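
For anyone who wants to reproduce this, here is a minimal, self-contained sketch using the sample data from the question. It assumes the data is pipe-delimited text and that TIME should be parsed as a datetime; the variable names (raw, last_by_dedup, and so on) are only illustrative. It also shows a third option based on idxmax, which is not part of the answer above but gives the same rows without sorting the whole frame.

```python
import io

import pandas as pd

# Sample data copied from the question (pipe-delimited).
raw = """ORD_ID|TIME|VOL|VOL_DSCL|SMBL|EXP
ABC123|2020-05-18 09:01:35|30|10|CHH|2020-05-20
DEF123|2020-05-18 09:04:35|50|20|CHH|2020-06-19
ABC123|2020-05-18 09:06:45|20|10|CHH|2020-05-20
PQR333|2020-05-18 09:13:12|50|10|SSS|2020-06-19
DEF123|2020-05-18 09:24:35|20|20|CHH|2020-06-19
PQR333|2020-05-18 09:26:23|0|0|SSS|2020-06-19
"""

# Parse TIME as datetime so sorting is chronological rather than lexical.
df = pd.read_csv(io.StringIO(raw), sep="|", parse_dates=["TIME"])

# Option 1: sort by TIME, then keep the last row seen for each ORD_ID.
last_by_dedup = df.sort_values("TIME").drop_duplicates("ORD_ID", keep="last")

# Option 2: sort by TIME, then take the last row of each group.
last_by_group = df.sort_values("TIME").groupby("ORD_ID").tail(1)

# Option 3 (alternative, not from the answer above): look up the row label
# of the maximum TIME in each group and select those rows directly.
last_by_idxmax = df.loc[df.groupby("ORD_ID")["TIME"].idxmax()]

print(last_by_dedup)
```

All three keep every column of the selected row untouched. If two rows of the same ORD_ID share the same TIME, which one is kept depends on the sort's tie-breaking, so add a secondary sort key if that matters for your data.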
User contributions licensed under: CC BY-SA