
How to calculate cumulative subtraction with a threshold and reset the subtraction after threshold within groups in pandas dataframe in python?

I have a dataframe with 4 columns. The original dataframe contains two columns, trip and timestamps. To try to reach my goal, I calculated ‘TimeDistance’, which is the difference between consecutive rows of timestamps, and ‘cum’, which is the cumulative sum over the TimeDistance column, but this did not give me what I need.

JavaScript

this is the output:

JavaScript

This is not my desired output. Within each trip, I want to subtract the timestamp of the first row from the timestamp of each row, store the result in a new column (cum), and whenever it reaches 10, do the following for the next rows:

  • reset the subtraction,
  • the next row after the row in which the threshold is reached will be considered as the origin and it must be equal to zero,
  • continue subtraction from this row (which is equal to zero) and subsequent rows again until we reach 10.
  • Whenever we reach the end of a trip, the subtraction will also reset for a new trip.
  • Repeat this procedure for all trips.

For example, in row 4 we have reached the threshold because the value in the ‘cum’ column is 17, so the next row in the ‘cum’ column must be 0 (but it is 19). For row 6, we have to calculate the difference between the timestamps in rows 5 and 6, which should be 2, not 19!

For more clarity, I have attached a screenshot of my desired output.


Answer

You can use a mask to reset the cumsum:

JavaScript

output:

JavaScript
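
For comparison, here is a minimal sketch of the reset rule from the question written as an explicit loop per trip, which is easier to check by hand than a masked cumsum. It is an illustration only: the sample data, the helper name `elapsed_with_reset`, the assumption that `timestamps` is numeric, and the reading of "reaches 10" as `>= 10` are assumptions, not taken from the original post.

```python
import pandas as pd


def elapsed_with_reset(timestamps: pd.Series, threshold: float = 10) -> pd.Series:
    """Elapsed time since the current origin row; the origin moves to the
    row that follows any row whose elapsed time reaches the threshold."""
    out = []
    origin = None
    start_new_origin = True  # the first row of a trip is always an origin
    for t in timestamps:
        if start_new_origin:
            origin = t
            start_new_origin = False
        elapsed = t - origin
        out.append(elapsed)
        if elapsed >= threshold:
            # Threshold reached: the next row becomes the new origin (cum = 0).
            start_new_origin = True
    return pd.Series(out, index=timestamps.index)


# Hypothetical sample data (numeric timestamps, two trips).
df = pd.DataFrame({
    "trip":       [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "timestamps": [0, 3, 8, 17, 19, 21, 5, 9, 16],
})

# groupby + transform runs the helper once per trip, so the state
# also resets at the start of every new trip.
df["cum"] = df.groupby("trip")["timestamps"].transform(elapsed_with_reset)
print(df)
```

With this sample data, the fourth row of trip 1 reaches 17, so the fifth row restarts at 0 and the sixth row is 2, which matches the behaviour described in the question. The loop is slower than a vectorised solution on large frames, but the reset depends on the previous result, so it is the simplest form to verify.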
User contributions licensed under: CC BY-SA