Python pandas group by check if value changed then previous value

Question

I have a problem with the groupby function of the pandas library. I have the following dataframe:

id	result	date
400001	N	2020-07-03
400001	N	2021-09-09
400001	P	2021-10-27
400002	N	2020-07-03
400003	N	2020-06-30
400003	N	2022-04-27
400004	P	2020-06-30
400004	N	2022-04-27

I need to group by column 'id' and extract the value of column 'date' where the value of column 'result'

Accepted Answer

You can compute a cumsum of the booleans that identify the changes. Each run of identical values then gets its own number, and idxmax returns the index of the first row of the last run:

    idx = (df.groupby('id')['result']
             .apply(lambda s: s.ne(s.shift())
                    .cumsum()
                    .idxmax()
                   )
           )
    df.loc[idx]

Output:

           id result        date
    1  400001      N  09/09/2021
    3  400002      N  03/07/2020
    4  400003      N  30/06/2020
    7  400004      P  30/06/2020

NB. The input provided as DataFrame is different from the one as table. The output matching the DataFrame is shown here.

If needed, sort the dates first:

    idx = (df.sort_values(by=['id', 'date'])
             .groupby('id')['result']
             .apply(lambda s: s.ne(s.shift())
                    .cumsum()
                    .idxmax()
                   )
           )
    df.loc[idx]

Output:

           id result        date
    0  400001      P  27/10/2021
    3  400002      N  03/07/2020
    5  400003      N  27/04/2022
    7  400004      P  30/06/2020
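To make the cumsum/idxmax step easier to follow, here is a minimal self-contained sketch. It assumes the data from the question's table (not the DataFrame the outputs above were produced from, so the rows differ), and the helper name first_row_of_last_run is made up for illustration.

```python
import pandas as pd

# Assumption: this is the table from the question, with ISO dates parsed.
df = pd.DataFrame({
    "id": [400001, 400001, 400001, 400002, 400003, 400003, 400004, 400004],
    "result": ["N", "N", "P", "N", "N", "N", "P", "N"],
    "date": pd.to_datetime([
        "2020-07-03", "2021-09-09", "2021-10-27", "2020-07-03",
        "2020-06-30", "2022-04-27", "2020-06-30", "2022-04-27",
    ]),
})


def first_row_of_last_run(s: pd.Series):
    """Hypothetical helper: index label of the first row of the last run in s."""
    changed = s.ne(s.shift())   # True where the value differs from the previous row
    run_id = changed.cumsum()   # 1, 1, 2, ... one number per run of equal values
    return run_id.idxmax()      # first label carrying the highest run number


# Intermediate values for a single id, to see what the cumsum looks like.
one_group = df.loc[df["id"] == 400001, "result"]
print(one_group.ne(one_group.shift()).cumsum().tolist())  # [1, 1, 2]

idx = (
    df.sort_values(by=["id", "date"])
      .groupby("id")["result"]
      .apply(first_row_of_last_run)
)
out = df.loc[idx]
print(out)
# Selected row labels: 2, 3, 4, 7
```

For an id whose result never changes (400002 and 400003 here), the whole group is a single run, so the first row of the group is returned.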


Answer