I am trying to calculate the 12 month rolling sum for the number of orders and revenue based on a person’s name using Python for the following dataframe:
df = pd.DataFrame( columns = ['Name','Month','Orders','Revenue']) df['Name'] = ['Ava','Ava','Ava','Baptiste','Baptiste','Baptiste','Baptiste','Cam','Cam'] df['Month'] =['2019-07','2020-04','2021-03','2020-03','2020-10','2021-03','2021-05','2020-01','2021-03'] df['Orders'] =[2,12,4,5,9,10,3,4,5] df['Revenue'] = [10,25,15,5,8,10,12,20,5]
In order to give the following output:
df['Rolling 12 Month Orders'] = [2,14,16,5,14,24,22,4,5] df['Rolling 12 Month Revenue'] = [10,35,40,5,13,23,30,20,5]
The rolling sum should add up all the totals in the past 12 months grouped by the name column.
I have tried the following:
df['Rolling 12 Month Orders'] = df.groupby(['Name','Month'])['Orders'].rolling(12).sum()
but it does not give me the expected result.
Advertisement
Answer
You’re passing the rolling
frequency as 12, pandas does not know that you want to specify a 12 months window, also you need to make sure that your Month
column is identified as a date type column, try this:
df['Month'] = pd.to_datetime(df['Month']) df['Rolling 12 Month Orders'] = df.groupby(['Name']).rolling('365d', on = 'Month')['Orders'].sum().reset_index( drop = True) df['Rolling 12 Month Revenue'] = df.groupby(['Name']).rolling('365d', on = 'Month')['Revenue'].sum().reset_index( drop = True)
Also, you need to use 365d
instead of 12m
, since using 12m
would throw an error : ValueError: <12 * MonthEnds> is a non-fixed frequency
.
A brief explanation for this is that a rolling window must have a fixed width – “a month” does not have a fixed number of days.