I am trying to calculate a running total per customer for the previous 365 days using pandas but my code isn’t working. My intended output would be something like this:
date | customer | daily_total_per_customer | rolling_total |
---|---|---|---|
2016-07-29 | 1 | 100 | 100 |
2016-08-01 | 1 | 50 | 150 |
2017-01-12 | 1 | 80 | 230 |
2017-10-23 | 1 | 180 | 260 |
2018-03-03 | 1 | 0 | 180 |
2018-03-06 | 1 | 40 | 220 |
2019-03-16 | 1 | 500 | 500 |
2017-04-07 | 2 | 50 | 50 |
2017-04-09 | 2 | 230 | 280 |
2018-02-11 | 2 | 80 | 360 |
2018-05-12 | 2 | 0 | 80 |
2019-05-10 | 2 | 0 | 0 |
I tried the following:
JavaScript
x
5
1
df_3 = df_3.set_index(['customer', 'date']).sort_values(by='date')
2
3
rolling_sum = df_3.rolling('365d', on='date')["daily_total_per_customer"].sum()
4
df_3["rolling_total"] = rolling_sum
5
And I get the following error
ValueError: invalid on specified as date, must be a column (of DataFrame), an Index or None To recreate the code:
JavaScript
1
24
24
1
dates = ['2016-07-29',
2
'2016-08-01',
3
'2017-01-12',
4
'2017-10-23',
5
'2018-03-03',
6
'2018-03-06',
7
'2019-03-16',
8
'2017-04-07',
9
'2017-04-09',
10
'2018-02-11',
11
'2018-05-12',
12
'2019-05-10',
13
]
14
15
16
customer = [1,1,1,1,1,1,1,2,2,2,2,2]
17
18
daily_total = [100,50,80,180,0,40,500,50,230,80,0,0]
19
20
21
df = pd.DataFrame({'date': dates,
22
'customer': customer,
23
'daily_total_per_customer':daily_total,})
24
Perhaps someone can point me in the right direction. Thanks!
Advertisement
Answer
Annotated code
JavaScript
1
13
13
1
# Parse the strings to datetime
2
df['date'] = pd.to_datetime(df['date'])
3
4
# Sort the dates in ASC order if not already sorted
5
df = df.sort_values(['customer', 'date'])
6
7
# Group the dataframe by customer then for each group
8
# calculate rolling sum on 'daily_total_per_customer'
9
s = df.groupby('customer').rolling('365d', on='date')['daily_total_per_customer'].sum()
10
11
# Merge the result with original df
12
df.merge(s.reset_index(name='rolling_total'))
13
JavaScript
1
14
14
1
date customer daily_total_per_customer rolling_total
2
0 2016-07-29 1 100 100
3
1 2016-08-01 1 50 150
4
2 2017-01-12 1 80 230
5
3 2017-10-23 1 180 260
6
4 2018-03-03 1 0 180
7
5 2018-03-06 1 40 220
8
6 2019-03-16 1 500 500
9
7 2017-04-07 2 50 50
10
8 2017-04-09 2 230 280
11
9 2018-02-11 2 80 360
12
10 2018-05-12 2 0 80
13
11 2019-05-10 2 0 0
14