# How can I figure out the average consecutive duration of “True” values in pandas df, per group?

With the following data, I think I want a column (`DESIRED_DURATION_COL`) to work out the duration (according to `start_datetime`) of consecutive Truths:

project_id start_datetime diag_local_code DESIRED_DURATION_COL
1 2017-01-18 False 0
1 2019-04-14 True 0
1 2019-04-17 True 3
1 2019-04-19 False 0
1 2019-04-23 True 0
1 2019-04-25 True 2
1 2019-04-30 True 7
1 2019-05-21 False 0

This is so I can get the average truth duration (mean), per `project_id`, such that I get a df like:

project_id avg_duration
1 5
2 8
3 2

Can’t work out how to do this, thanks in advance!

Solution for calculating `duration`:

```m = df['diag_local_code']
dt = df[m].groupby(['project_id', (~m).cumsum()])['start_datetime'].transform('first')
df['duration'] = df['start_datetime'].sub(dt).dt.days.fillna(0)
```

How this works?

Use `cumsum` on inverted `diag_local_code` to identify groups of consecutive ones per `project_id`, then filter the rows where `diag_local_code` is `True` then group the filtered dataframe and transform `start_datetime` with `first` to broadcast first date value across each group, finally subtract the broadcasted date value from `start_datetime` to calculate the desired duration

Result

```   project_id start_datetime  diag_local_code  duration
0           1     2017-01-18            False       0.0
1           1     2019-04-14             True       0.0
2           1     2019-04-17             True       3.0
3           1     2019-04-19            False       0.0
4           1     2019-04-23             True       0.0
5           1     2019-04-25             True       2.0
6           1     2019-04-30             True       7.0
7           1     2019-05-21            False       0.0
```

Solution for calculating average consecutive duration of `True` values

```m = df['diag_local_code']
(
df[m].groupby(['project_id', (~m).cumsum()])['start_datetime']
.agg(np.ptp).dt.days.groupby(level=0).mean().reset_index(name='avg_duration')
)
```

Result:

```   project_id  avg_duration
0           1           5.0
```