How do I reverse a cumulative count from a specific point based on a condition and then resume the count in a pandas data frame?

Question

I am trying to cumulatively count the number of days between dates, grouped by a column denoted as id. However, I want to reset the counter whenever a condition is satisfied. At the same time, I want to create a new column and add the values to that column for those particular rows. Additionally, I want to also count back

Accepted Answer

I have solved it (albeit with what I consider an ad hoc method) by using the pandas combine_first function, which combines the non-NaN values from both columns, as seen in the try/except statement lower down in the code below:

    import numpy as np
    import pandas as pd

    # defined a new df for clearer output
    df = pd.DataFrame({
        'reset': ['N', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y', 'N', 'N'],
        'category': ['low', 'low', 'low', 'low', 'low', 'low', 'low', 'low', 'low', 'low', 'low'],
        'date': ['2019-09-04', '2020-11-06', '2020-11-06', '2019-09-07', '2019-11-08', '2021-05-21',
                 '2021-06-23', '2021-07-24', '2021-08-25', '2021-09-23', '2021-10-21'],
        'id': [16860, 16860, 16860, 16860, 16860, 16860, 16860, 16860, 16860, 16860, 16860],
    })
    # the format must match the 'YYYY-MM-DD' strings above
    df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
    df = df.sort_values(['id', 'date'])

    # create extra grouping column based on reset day (True on reset rows)
    df['group'] = df['reset'].eq('Y')
    df['group'] = df.groupby('id')['group'].cumsum()

    # forward count: cumulative days since the start of each group
    df['tdelta'] = df.groupby(['id', 'group'])['date'].diff() / np.timedelta64(1, 'D')
    df['tdelta'] = df.groupby(['id', 'group'])['tdelta'].cumsum().fillna(0)

    # reverse count: same calculation on the dates sorted in descending order
    df = df.sort_values(by='date', ascending=False)
    df['tdelta reverse'] = df.groupby(['id', 'group'])['date'].diff() / np.timedelta64(1, 'D')
    df['tdelta reverse'] = df.groupby(['id', 'group'])['tdelta reverse'].cumsum().fillna(0)

    # the problem solved via combine_first, which combines the non-NaN values from both columns
    df = df.sort_values(['id', 'date'])
    for group in df['group'].unique():
        group_minus_1 = group - 1.0
        try:
            # forward count for this group ...
            df[f'tdelta{int(group)}'] = df[(df['group'] == group)]['tdelta']
            # ... filled with the reverse count of the previous group
            df[f'tdelta{int(group)}'] = df[f'tdelta{int(group)}'].combine_first(
                df[(df['group'] == group_minus_1)]['tdelta reverse'])
        except Exception:
            continue
    # print(df)

This is the output:

       reset category       date     id  group  tdelta  tdelta reverse  tdelta0  tdelta1  tdelta2  tdelta3
    0      N      low 2019-09-04  16860    0.0     NaN           -65.0      0.0    -65.0      NaN      NaN
    3      N      low 2019-09-07  16860    0.0     NaN           -62.0      3.0    -62.0      NaN      NaN
    4      N      low 2019-11-08  16860    0.0     NaN             0.0     65.0      0.0      NaN      NaN
    1      Y      low 2020-11-06  16860    1.0   250.0             0.0      NaN      0.0      0.0      NaN
    2      N      low 2020-11-06  16860    1.0   250.0             0.0      NaN      0.0      0.0      NaN
    5      Y      low 2021-05-21  16860    2.0   250.0           -64.0      NaN      NaN      0.0    -64.0
    6      N      low 2021-06-23  16860    2.0     NaN           -31.0      NaN      NaN     33.0    -31.0
    7      N      low 2021-07-24  16860    2.0     NaN             0.0      NaN      NaN     64.0      0.0
    8      Y      low 2021-08-25  16860    3.0   250.0           -57.0      NaN      NaN      NaN      0.0
    9      N      low 2021-09-23  16860    3.0     NaN           -28.0      NaN      NaN      NaN     29.0
    10     N      low 2021-10-21  16860    3.0     NaN             0.0      NaN      NaN      NaN     57.0
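
For comparison, here is a minimal sketch of the same forward/reverse idea on a small made-up frame. It is not the code from the answer above: the toy data and the column names days_forward and days_reverse are assumptions for illustration. Instead of diff plus cumsum, it subtracts each segment's first and last date with groupby(...).transform, which should give the same counts whenever the dates are sorted within each id.

```python
import pandas as pd

# Hypothetical toy data: one id, a reset flag, and a date per row.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1],
    "reset": ["N", "N", "Y", "N", "N"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-04", "2021-02-01", "2021-02-11", "2021-03-03"]
    ),
}).sort_values(["id", "date"])

# Each "Y" row starts a new segment within its id.
df["group"] = df["reset"].eq("Y").groupby(df["id"]).cumsum()

seg_dates = df.groupby(["id", "group"])["date"]

# Forward count: days elapsed since the first date of the segment.
df["days_forward"] = (df["date"] - seg_dates.transform("min")).dt.days

# Reverse count: days until the last date of the segment (zero or negative).
df["days_reverse"] = (df["date"] - seg_dates.transform("max")).dt.days

# Same combination as in the answer's loop, written out for segment 1 only:
# forward count of segment 1, gaps filled with the reverse count of segment 0.
forward_1 = df.loc[df["group"] == 1, "days_forward"]
reverse_0 = df.loc[df["group"] == 0, "days_reverse"]
df["tdelta1"] = forward_1.combine_first(reverse_0)

print(df)
```

Because the two selections passed to combine_first cover disjoint rows, the result is simply the union of both, aligned on the original index. Rows belonging to neither segment would stay NaN.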

Answer