I am trying to calculate a running total per customer for the previous 365 days using pandas but my code isn’t working. My intended output would be something like this:
date | customer | daily_total_per_customer | rolling_total |
---|---|---|---|
2016-07-29 | 1 | 100 | 100 |
2016-08-01 | 1 | 50 | 150 |
2017-01-12 | 1 | 80 | 230 |
2017-10-23 | 1 | 180 | 260 |
2018-03-03 | 1 | 0 | 180 |
2018-03-06 | 1 | 40 | 220 |
2019-03-16 | 1 | 500 | 500 |
2017-04-07 | 2 | 50 | 50 |
2017-04-09 | 2 | 230 | 280 |
2018-02-11 | 2 | 80 | 360 |
2018-05-12 | 2 | 0 | 80 |
2019-05-10 | 2 | 0 | 0 |
I tried the following:
df_3 = df_3.set_index(['customer', 'date']).sort_values(by='date') rolling_sum = df_3.rolling('365d', on='date')["daily_total_per_customer"].sum() df_3["rolling_total"] = rolling_sum
And I get the following error
ValueError: invalid on specified as date, must be a column (of DataFrame), an Index or None To recreate the code:
dates = ['2016-07-29', '2016-08-01', '2017-01-12', '2017-10-23', '2018-03-03', '2018-03-06', '2019-03-16', '2017-04-07', '2017-04-09', '2018-02-11', '2018-05-12', '2019-05-10', ] customer = [1,1,1,1,1,1,1,2,2,2,2,2] daily_total = [100,50,80,180,0,40,500,50,230,80,0,0] df = pd.DataFrame({'date': dates, 'customer': customer, 'daily_total_per_customer':daily_total,})
Perhaps someone can point me in the right direction. Thanks!
Advertisement
Answer
Annotated code
# Parse the strings to datetime df['date'] = pd.to_datetime(df['date']) # Sort the dates in ASC order if not already sorted df = df.sort_values(['customer', 'date']) # Group the dataframe by customer then for each group # calculate rolling sum on 'daily_total_per_customer' s = df.groupby('customer').rolling('365d', on='date')['daily_total_per_customer'].sum() # Merge the result with original df df.merge(s.reset_index(name='rolling_total'))
date customer daily_total_per_customer rolling_total 0 2016-07-29 1 100 100 1 2016-08-01 1 50 150 2 2017-01-12 1 80 230 3 2017-10-23 1 180 260 4 2018-03-03 1 0 180 5 2018-03-06 1 40 220 6 2019-03-16 1 500 500 7 2017-04-07 2 50 50 8 2017-04-09 2 230 280 9 2018-02-11 2 80 360 10 2018-05-12 2 0 80 11 2019-05-10 2 0 0