How can I figure out the average consecutive duration of “True” values in pandas df, per group?

Question

With the following data, I think I want a column (DESIRED_DURATION_COL) that gives the duration (according to start_datetime) of consecutive True values:

project_id	start_datetime	diag_local_code	DESIRED_DURATION_COL
1	2017-01-18	False	0
1	2019-04-14	True	0
1	2019-04-17	True	3
1	2019-04-19	False	0
1	2019-04-23	True	0
1	2019-04-25	True	2
1	2019-04-30	True	7
1	2019-05-21	False	0

This is

Accepted Answer

Solution for calculating the duration:

```python
m = df['diag_local_code']
dt = df[m].groupby(['project_id', (~m).cumsum()])['start_datetime'].transform('first')
df['duration'] = df['start_datetime'].sub(dt).dt.days.fillna(0)
```

How does this work?

Use cumsum on the inverted diag_local_code to label each group of consecutive True values per project_id. Then filter the rows where diag_local_code is True, group the filtered dataframe, and transform start_datetime with first to broadcast the first date of each group across its rows. Finally, subtract the broadcast date from start_datetime to get the desired duration. Rows where diag_local_code is False have no broadcast date, so the subtraction yields a missing value there, which fillna(0) turns into 0.

Result:

```
   project_id start_datetime  diag_local_code  duration
0           1     2017-01-18            False       0.0
1           1     2019-04-14             True       0.0
2           1     2019-04-17             True       3.0
3           1     2019-04-19            False       0.0
4           1     2019-04-23             True       0.0
5           1     2019-04-25             True       2.0
6           1     2019-04-30             True       7.0
7           1     2019-05-21            False       0.0
```

Solution for calculating the average consecutive duration of True values:

```python
import numpy as np

m = df['diag_local_code']
(
    df[m].groupby(['project_id', (~m).cumsum()])['start_datetime']
         .agg(np.ptp).dt.days.groupby(level=0).mean().reset_index(name='avg_duration')
)
```

Here np.ptp ("peak to peak") returns the maximum minus the minimum, so each group of consecutive True values contributes the span between its first and last date. The spans are then averaged per project_id.

Result:

```
   project_id  avg_duration
0           1           5.0
```
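To try both solutions end to end, here is a minimal sketch that rebuilds the sample rows from the question and runs each step separately. The names run_id, first_date, bounds, span_days and avg are my own, chosen for illustration. I have also swapped np.ptp for an explicit "max minus min" on built-in aggregations, which should give the same spans; that substitution is my assumption, not part of the accepted answer.

```python
import pandas as pd

# Sample rows copied from the question (a single project).
df = pd.DataFrame({
    "project_id": [1] * 8,
    "start_datetime": pd.to_datetime([
        "2017-01-18", "2019-04-14", "2019-04-17", "2019-04-19",
        "2019-04-23", "2019-04-25", "2019-04-30", "2019-05-21",
    ]),
    "diag_local_code": [False, True, True, False, True, True, True, False],
})

m = df["diag_local_code"]

# Every False row bumps the counter, so all True rows in one unbroken
# run share the same label. "run_id" is an illustrative name.
run_id = (~m).cumsum().rename("run_id")

# First date of each run, broadcast to every True row of that run.
first_date = (
    df[m].groupby(["project_id", run_id[m]])["start_datetime"]
    .transform("first")
)

# False rows have no first_date, so the difference is missing there;
# fill those with 0 and keep whole days as integers.
df["duration"] = (
    (df["start_datetime"] - first_date).dt.days.fillna(0).astype(int)
)

# Span of each run as last date minus first date (stand-in for np.ptp),
# then the mean span per project.
bounds = (
    df[m].groupby(["project_id", run_id[m]])["start_datetime"]
    .agg(["min", "max"])
)
span_days = (bounds["max"] - bounds["min"]).dt.days
avg = (
    span_days.groupby(level="project_id").mean()
    .reset_index(name="avg_duration")
)

print(df)
print(avg)
```

On the sample rows the two runs span 3 and 7 days, so the average for project 1 comes out as 5.0, in line with the result reported in the answer.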

project_id	avg_duration
1	5
2	8
3	2
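The table above has one row per project, so it is worth checking that the approach keeps runs from different projects apart. The sketch below uses invented rows for two projects (the dates are hypothetical, not from the question) where one project ends on True and the next starts on True. The helper name avg_true_run_days and the up-front sort are my own assumptions: the sort presumes that "consecutive" means consecutive by start_datetime within a project.

```python
import pandas as pd


def avg_true_run_days(df):
    """Mean span in days of consecutive-True runs, per project_id.

    Hypothetical helper; column names follow the question.
    """
    # Assumption: runs are defined by date order within each project.
    df = df.sort_values(["project_id", "start_datetime"]).reset_index(drop=True)
    m = df["diag_local_code"]
    # The counter runs over the whole frame, not per project ...
    run_id = (~m).cumsum().rename("run_id")
    # ... but grouping on project_id as well keeps projects separate,
    # even when two neighbouring projects share a run label.
    bounds = (
        df[m].groupby(["project_id", run_id[m]])["start_datetime"]
        .agg(["min", "max"])
    )
    span_days = (bounds["max"] - bounds["min"]).dt.days
    return (
        span_days.groupby(level="project_id").mean()
        .reset_index(name="avg_duration")
    )


# Invented rows: project 2 ends on True and project 3 starts on True,
# so all four True rows get the same run label.
demo = pd.DataFrame({
    "project_id": [2, 2, 2, 3, 3, 3],
    "start_datetime": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-06",
        "2020-03-01", "2020-03-07", "2020-03-09",
    ]),
    "diag_local_code": [False, True, True, True, True, False],
})

result = avg_true_run_days(demo)
print(result)
```

With these made-up rows the run in project 2 spans 4 days and the run in project 3 spans 6 days, and the two are averaged separately even though they carry the same run label.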
