Skip to content
Advertisement

Python pandas group by check if value changed then previous value

I’ve a problem with groupby function of pandas’s library. I’ve the following dataframe.

d = {'id': [400001, 400001, 400001, 400002, 400003, 400003, 400004, 400004], 'result': ['P','N','N','N','N','N','N','P'], 'date':['27/10/2021','09/09/2021','03/07/2020','03/07/2020','30/06/2020','27/04/2022','27/04/2022','30/06/2020']}
df = pd.DataFrame(data=d)
df
id result date
400001 N 2020-07-03
400001 N 2021-09-09
400001 P 2021-10-27
400002 N 2020-07-03
400003 N 2020-06-30
400003 N 2022-04-27
400004 P 2020-06-30
400004 N 2022-04-27

I need to group by column ‘id’ and extract the value of column ‘date’ where the value of column ‘result’ change. If value in column ‘result’ doesn’t change, keep the first value of column ‘date’.

This an example:

id date
400001 2021-10-27
400002 2020-07-03
400003 2020-06-30
400004 2022-04-27

I’ve tried this:

df['change'] = np.where(df.groupby('id').esito.apply(lambda x:x!=x.iloc[0]),'Y','N')

but the function doesn’t works so well. The function verify the difference versus first element of column ‘id’ of groupby selection. I don’t need this.

Can you help me? Thanks

Advertisement

Answer

You can compute a cumsum of the booleans identifying the changes. Then get the max index:

idx = (df.groupby('id')['result']
         .apply(lambda s: s.ne(s.shift())
                .cumsum()
                .idxmax()
               )
       )
df.loc[idx]

Output:

       id result        date
1  400001      N  09/09/2021
3  400002      N  03/07/2020
4  400003      N  30/06/2020
7  400004      P  30/06/2020

NB. The input provided as DataFrame is different from the one as table. The output matching the DataFrame is shown here.

If needed, sort the dates first:

idx = (df.sort_values(by=['id', 'date'])
         .groupby('id')['result']
         .apply(lambda s: s.ne(s.shift())
                .cumsum()
                .idxmax()
               )
       )
df.loc[idx]

Output:

       id result        date
0  400001      P  27/10/2021
3  400002      N  03/07/2020
5  400003      N  27/04/2022
7  400004      P  30/06/2020
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement