I have the following data:
| id | encounter_key | datetime | 
|---|---|---|
| 1 | 111 | 2019-04-14 | 
| 1 | 111 | 2019-04-14 | 
| 1 | 111 | 2019-07-18 | 
| 1 | 122 | 2019-09-02 | 
| 2 | 211 | 2019-10-03 | 
| 2 | 211 | 2020-10-03 | 
I want to find the cumulative duration, grouped by id and encounter_key to achieve the following:
| id | encounter_key | datetime | cum_duration_days | 
|---|---|---|---|
| 1 | 111 | 2019-04-14 | 0 | 
| 1 | 111 | 2019-04-14 | 0 | 
| 1 | 111 | 2019-07-18 | 95 | 
| 1 | 122 | 2019-09-02 | 0 | 
| 2 | 211 | 2019-10-03 | 0 | 
| 2 | 211 | 2020-10-03 | 366 | 
I’ve tried df.groupby(['datetime']).apply( ... ) etc. but nothing seems to work. Thanks in advance.
Advertisement
Answer
I think this should work
df['cum_duration_days']=df.groupby(['id','encounter_key'])['datetime'].diff()/ np.timedelta64(1, 'D') df['cum_duration_days'].fillna(0)
But @enke is right, the output desired seems to have an error on the ’31’ unless you are not showing all the rows for encounter_key=211 /id=2 …