
How can I figure out the average consecutive duration of “True” values in pandas df, per group?

With the following data, I think I want a column (DESIRED_DURATION_COL) that gives the duration in days (according to start_datetime) of each run of consecutive True values:

project_id  start_datetime  diag_local_code  DESIRED_DURATION_COL
1           2017-01-18      False            0
1           2019-04-14      True             0
1           2019-04-17      True             3
1           2019-04-19      False            0
1           2019-04-23      True             0
1           2019-04-25      True             2
1           2019-04-30      True             7
1           2019-05-21      False            0

This is so I can get the mean duration of the consecutive True runs per project_id, such that I get a df like:

project_id avg_duration
1 5
2 8
3 2

Can’t work out how to do this, thanks in advance!


Answer

Solution for calculating duration:


How does this work?

Use cumsum on the inverted diag_local_code to label each run of consecutive True values; grouping by project_id together with that label keeps runs from different projects separate. Next, filter the rows where diag_local_code is True, group the filtered dataframe by project_id and the run label, and transform start_datetime with first to broadcast the first date of each run across its rows. Finally, subtract the broadcast date from start_datetime to get the desired duration.
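The steps described above can be sketched as follows. This is a minimal illustration, not the answer's original code: it assumes pandas is installed, rebuilds the question's sample rows by hand, and uses the illustrative names run_id, run_start and duration.

```python
import pandas as pd

# Sample rows copied from the question (project 1 only).
df = pd.DataFrame({
    "project_id": [1] * 8,
    "start_datetime": pd.to_datetime([
        "2017-01-18", "2019-04-14", "2019-04-17", "2019-04-19",
        "2019-04-23", "2019-04-25", "2019-04-30", "2019-05-21",
    ]),
    "diag_local_code": [False, True, True, False, True, True, True, False],
})

# Assumption: rows must be in date order within each project.
df = df.sort_values(["project_id", "start_datetime"]).reset_index(drop=True)

m = df["diag_local_code"]

# The counter increments on every False row, so it stays constant
# across a run of consecutive True rows and labels that run.
run_id = (~m).cumsum()

# Keep only the True rows, then broadcast the first date of each
# (project_id, run) group to every row of that group.
run_start = (
    df[m]
    .groupby(["project_id", run_id[m]])["start_datetime"]
    .transform("first")
)

# Subtraction aligns on the index; False rows have no run start,
# so they come out as NaT and are filled with 0.
df["duration"] = (
    (df["start_datetime"] - run_start).dt.days.fillna(0).astype(int)
)

print(df["duration"].tolist())  # [0, 0, 3, 0, 0, 2, 7, 0]
```

On the sample data this reproduces DESIRED_DURATION_COL from the question.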

Result


Solution for calculating average consecutive duration of True values

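One possible way to get the per-project average is sketched here. It is a hedged illustration rather than the answer's original code: it assumes the duration of a run is the number of days between its first and last True row, that a lone True row counts as a run of 0 days, and it uses the illustrative names run_id, bounds, run_days and avg.

```python
import pandas as pd

# Sample rows copied from the question (project 1 only).
df = pd.DataFrame({
    "project_id": [1] * 8,
    "start_datetime": pd.to_datetime([
        "2017-01-18", "2019-04-14", "2019-04-17", "2019-04-19",
        "2019-04-23", "2019-04-25", "2019-04-30", "2019-05-21",
    ]),
    "diag_local_code": [False, True, True, False, True, True, True, False],
})
df = df.sort_values(["project_id", "start_datetime"]).reset_index(drop=True)

m = df["diag_local_code"]

# Label each run of consecutive True rows (the counter only moves on False).
run_id = (~m).cumsum().rename("run_id")

# First and last date of every (project_id, run) group of True rows.
bounds = (
    df[m]
    .groupby(["project_id", run_id[m]])["start_datetime"]
    .agg(["min", "max"])
)

# Length of each run in whole days.
run_days = (bounds["max"] - bounds["min"]).dt.days

# Mean run length per project.
avg = (
    run_days.groupby(level="project_id")
    .mean()
    .rename("avg_duration")
    .reset_index()
)

print(avg)  # one row: project_id 1, avg_duration 5.0
```

For project 1 the two runs last 3 and 7 days, so the mean is 5, matching the avg_duration the question expects.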

Result:

User contributions licensed under: CC BY-SA