Skip to content
Advertisement

Getting Rolling Sum per Group

I have a dataframe like this:

Product_ID    Quantity    Year    Quarter   
  1             100       2021      1          
  1             100       2021      2         
  1              50       2021      3          
  1             100       2021      4          
  1             100       2022      1         
  2             100       2021      1          
  2             100       2021      2          
  3             100       2021      1          
  3             100       2021      2         

I would like to get the Sum of the last three months (excluding the current month), per Product_ID.

Therefore I tried this:

df['Qty_Sum_3qrts'] = (df.groupby('Product_ID'['Quantity'].shift(1,fill_value=0)
                         .rolling(3).sum().reset_index(0,drop=True)
                       )

# Shifting 1, because I want to exclude the current row. 
# Rolling 3, because I want to have the 3 'rows' before 
# Grouping by, because I want to have the calculation PER product 

My code is failing, because it does not only calculate it per product, but it will give me also numbers for other products (let’s say Product 2, quarter 1: gives me the 3 rows from product 1).

My proposed outcome:

Product_ID    Quantity    Year    Quarter   Qty_Sum_3qrts
  1             100       2021      1          0 # because we dont historical data for this id
  1             100       2021      2          100 # sum of last month of this product 
  1              50       2021      3          200 # sum of last 2 months of this product
  1             100       2021      4          250 # sum of last 3 months of this product
  1             100       2022      1          250 # sum of last 3 months of this product
  2             100       2021      1          0  # because we dont have hist data for this id
  2             100       2021      2          100 # sum of last month of this product
  3             100       2021      1          0   # etc
  3             100       2021      2          100  # etc 

Advertisement

Answer

You need to apply the rolling sum per group, you can use apply for this:

df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity']
                         .apply(lambda s: s.shift(1,fill_value=0)
                                           .rolling(3, min_periods=1).sum())
                       )

output:

   Product_ID  Quantity  Year  Quarter  Qty_Sum_3qrts
0           1       100  2021        1            0.0
1           1       100  2021        2          100.0
2           1        50  2021        3          200.0
3           1       100  2021        4          250.0
4           1       100  2022        1          250.0
5           2       100  2021        1            0.0
6           2       100  2021        2          100.0
7           3       100  2021        1            0.0
8           3       100  2021        2          100.0
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement