How to calculate cumulative subtraction with a threshold and reset the subtraction after threshold within groups in pandas dataframe in python?

Question

This is a dataframe with 4 columns. The primary dataframe contains two columns, trip and timestamps. I calculated 'TimeDistance', which is the difference between consecutive rows of timestamps, and 'cum', which is the cumulative sum over the TimeDistance column. I tried to reach my goal, but I could not. This is the output: This output is not my desired output, I

Accepted Answer

You can use a mask to reset the cumsum:

df['TimeDistance'] = df.groupby('trip')['timestamps'].diff(1)

# get rows above threshold
m = df['TimeDistance'].gt(10).groupby(df['trip']).shift(fill_value=False)

df['cum'] = (df['TimeDistance']
             .mask(m, 0)
             .groupby([df['trip'], m.cumsum()])
             .cumsum()
            )

output:

    trip  timestamps  TimeDistance   cum
0      1  1235471761           NaN   NaN
1      1  1235471763           2.0   2.0
2      1  1235471765           2.0   4.0
3      1  1235471767           2.0   6.0
4      1  1235471778          11.0  17.0
5      1  1235471780           2.0   0.0
6      1  1235471782           2.0   2.0
7      2  1235471784           NaN   NaN
8      2  1235471786           2.0   2.0
9      2  1235471788           2.0   4.0
10     2  1235471820          32.0  36.0
11     2  1235471826           6.0   0.0
12     2  1235471829           3.0   3.0
13     3  1235471890           NaN   NaN
14     3  1235471893           3.0   3.0
15     4  1235471894           NaN   NaN
16     4  1235471896           2.0   2.0
17     5  1235471900           NaN   NaN
18     5  1235471910          10.0  10.0
19     5  1235471912           2.0  12.0
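For anyone who wants to try the answer end to end, here is a minimal self-contained sketch. The sample data is rebuilt from the trip and timestamps columns of the output table in the answer. The names `cumsum_with_reset`, `THRESHOLD`, `above` and `reset` are illustrative and not part of the original answer, and the `.astype(bool)` call is an extra safeguard added here on the assumption that some pandas versions may not keep the boolean dtype after a grouped shift.

```python
import pandas as pd

# Sample data rebuilt from the output table in the answer.
df = pd.DataFrame({
    "trip": [1] * 7 + [2] * 6 + [3] * 2 + [4] * 2 + [5] * 3,
    "timestamps": [
        1235471761, 1235471763, 1235471765, 1235471767,
        1235471778, 1235471780, 1235471782,
        1235471784, 1235471786, 1235471788,
        1235471820, 1235471826, 1235471829,
        1235471890, 1235471893,
        1235471894, 1235471896,
        1235471900, 1235471910, 1235471912,
    ],
})

THRESHOLD = 10  # illustrative name; the answer hard-codes 10


def cumsum_with_reset(frame, threshold=THRESHOLD):
    """Per-trip cumulative sum of time gaps, restarting after a large gap."""
    out = frame.copy()

    # Gap to the previous row within the same trip (NaN on each trip's first row).
    out["TimeDistance"] = out.groupby("trip")["timestamps"].diff()

    # Rows whose gap exceeds the threshold (NaN compares as False).
    above = out["TimeDistance"].gt(threshold)

    # Mark the row that FOLLOWS each large gap, without crossing trip boundaries.
    reset = above.groupby(out["trip"]).shift(fill_value=False).astype(bool)

    # Zero the gap on reset rows, then sum within (trip, reset block) groups.
    # reset.cumsum() increases by one at every reset row, so it labels the blocks.
    out["cum"] = (
        out["TimeDistance"]
        .mask(reset, 0)
        .groupby([out["trip"], reset.cumsum()])
        .cumsum()
    )
    return out


result = cumsum_with_reset(df)
print(result)
```

Note that the row exceeding the threshold still contributes to the running total (17.0 and 36.0 in the table above); the restart happens on the following row, which is set to 0. The comparison is strict, so a gap of exactly 10 does not trigger a reset.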

Answer