How do I reverse a cumulative count from a specific point based on a condition and then resume the count in a pandas data frame?

I am trying to count the number of days between dates cumulatively, grouped by a column named id, but I want to reset the counter whenever a condition is satisfied.

At the same time, I want to create a new column and add the values to that column for those particular rows. I also want to count back from the reset point, giving negative days.
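
To make the goal concrete, here is a minimal sketch of a forward count that resets on a condition plus a backward count toward the reset point. The sample data, the boolean column `reset`, and the column names `segment`, `tdelta` and `tdelta_reverse` are assumptions for illustration, as is the choice that the reset row itself counts as day 0; none of this is taken from the original code.

```python
import pandas as pd

# Hypothetical sample data: "reset" marks the rows where the condition holds.
df = pd.DataFrame({
    "id": ["a"] * 5 + ["b"] * 3,
    "date": pd.to_datetime([
        "2021-01-01", "2021-01-03", "2021-01-04", "2021-01-07", "2021-01-10",
        "2021-02-01", "2021-02-02", "2021-02-05",
    ]),
    "reset": [False, False, True, False, False, False, False, False],
})
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# Every reset row opens a new segment within its id.
df["segment"] = df["reset"].astype(int).groupby(df["id"]).cumsum()

# Forward count: days since the first date of the current segment.
segment_start = df.groupby(["id", "segment"])["date"].transform("min")
df["tdelta"] = (df["date"] - segment_start).dt.days

# Backward count: days relative to the next reset date within the id
# (negative before the reset, NaN when no reset lies ahead).
next_reset = df["date"].where(df["reset"]).groupby(df["id"]).bfill()
df["tdelta_reverse"] = (df["date"] - next_reset).dt.days

print(df)
```

In this sketch, ids without any reset point simply get NaN in `tdelta_reverse`, which matches the idea that such ids need no extra values.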

Currently, I have tried this:

JavaScript

which produces this:

JavaScript

Now that I have added “tdelta reverse”, here is a clearer example (with different data) of what I want the final data frame to look like:

JavaScript

Essentially, a new ‘tdelta#’ column should be created for each group, holding the ‘tdelta reverse’ values up to a reset point and the ‘tdelta’ values afterwards.

As a side note, if an id does not have several groups (reset points), it is fine to leave these additional ‘tdelta#’ columns unfilled.

At the moment, I am creating new columns and filling them with the ‘tdelta’ values:

JavaScript

However, I also need to add the ‘tdelta reverse’ values so it looks like my end example.

I’m thinking that I should perhaps use iloc with groupby and/or do some slicing?

Any suggestions on how I can tackle this?

Answer

I have solved it (albeit with what I consider an ad hoc method) by using the pandas combine_first method, which fills the NaN values of one column with the non-NaN values of another. It is applied in the try/except statement further down in the code below:

JavaScript

This is the output:

JavaScript
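
For readers unfamiliar with combine_first, the following is a minimal sketch of the combining step in isolation. The two Series and their values are made up for illustration and are not the answer's data; only `Series.combine_first` itself is the pandas call mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical columns: the forward count is only known from the reset point on,
# the reverse count only up to the reset point.
forward = pd.Series([np.nan, np.nan, 0.0, 3.0, 6.0])
reverse = pd.Series([-3.0, -1.0, 0.0, np.nan, np.nan])

# Keep the reverse values where present and fall back to the forward values.
combined = reverse.combine_first(forward)

print(combined.tolist())
```

The caller's values take priority, so where both Series hold a value (the reset row here), the one from `reverse` is kept.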
User contributions licensed under: CC BY-SA