Skip to content
Advertisement

How to get grouped cumulative duration in pandas?

I have the following data:

id encounter_key datetime
1 111 2019-04-14
1 111 2019-04-14
1 111 2019-07-18
1 122 2019-09-02
2 211 2019-10-03
2 211 2020-10-03

I want to find the cumulative duration, grouped by id and encounter_key to achieve the following:

id encounter_key datetime cum_duration_days
1 111 2019-04-14 0
1 111 2019-04-14 0
1 111 2019-07-18 95
1 122 2019-09-02 0
2 211 2019-10-03 0
2 211 2020-10-03 366

I’ve tried df.groupby(['datetime']).apply( ... ) etc. but nothing seems to work. Thanks in advance.

Advertisement

Answer

I think this should work

df['cum_duration_days']=df.groupby(['id','encounter_key'])['datetime'].diff()/ np.timedelta64(1, 'D')
df['cum_duration_days'].fillna(0)

But @enke is right, the output desired seems to have an error on the ’31’ unless you are not showing all the rows for encounter_key=211 /id=2 …

Advertisement