I have a dataframe like this:
Product_ID Quantity Year Quarter 1 100 2021 1 1 100 2021 2 1 50 2021 3 1 100 2021 4 1 100 2022 1 2 100 2021 1 2 100 2021 2 3 100 2021 1 3 100 2021 2
I would like to get the Sum of the last three months (excluding the current month), per Product_ID.
Therefore I tried this:
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID'['Quantity'].shift(1,fill_value=0) .rolling(3).sum().reset_index(0,drop=True) ) # Shifting 1, because I want to exclude the current row. # Rolling 3, because I want to have the 3 'rows' before # Grouping by, because I want to have the calculation PER product
My code is failing, because it does not only calculate it per product, but it will give me also numbers for other products (let’s say Product 2, quarter 1: gives me the 3 rows from product 1).
My proposed outcome:
Product_ID Quantity Year Quarter Qty_Sum_3qrts 1 100 2021 1 0 # because we dont historical data for this id 1 100 2021 2 100 # sum of last month of this product 1 50 2021 3 200 # sum of last 2 months of this product 1 100 2021 4 250 # sum of last 3 months of this product 1 100 2022 1 250 # sum of last 3 months of this product 2 100 2021 1 0 # because we dont have hist data for this id 2 100 2021 2 100 # sum of last month of this product 3 100 2021 1 0 # etc 3 100 2021 2 100 # etc
Advertisement
Answer
You need to apply the rolling sum per group, you can use apply
for this:
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity'] .apply(lambda s: s.shift(1,fill_value=0) .rolling(3, min_periods=1).sum()) )
output:
Product_ID Quantity Year Quarter Qty_Sum_3qrts 0 1 100 2021 1 0.0 1 1 100 2021 2 100.0 2 1 50 2021 3 200.0 3 1 100 2021 4 250.0 4 1 100 2022 1 250.0 5 2 100 2021 1 0.0 6 2 100 2021 2 100.0 7 3 100 2021 1 0.0 8 3 100 2021 2 100.0