I have a dataframe like this:
JavaScript
x
11
11
1
Product_ID Quantity Year Quarter
2
1 100 2021 1
3
1 100 2021 2
4
1 50 2021 3
5
1 100 2021 4
6
1 100 2022 1
7
2 100 2021 1
8
2 100 2021 2
9
3 100 2021 1
10
3 100 2021 2
11
I would like to get the Sum of the last three months (excluding the current month), per Product_ID.
Therefore I tried this:
JavaScript
1
8
1
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID'['Quantity'].shift(1,fill_value=0)
2
.rolling(3).sum().reset_index(0,drop=True)
3
)
4
5
# Shifting 1, because I want to exclude the current row.
6
# Rolling 3, because I want to have the 3 'rows' before
7
# Grouping by, because I want to have the calculation PER product
8
My code is failing, because it does not only calculate it per product, but it will give me also numbers for other products (let’s say Product 2, quarter 1: gives me the 3 rows from product 1).
My proposed outcome:
JavaScript
1
11
11
1
Product_ID Quantity Year Quarter Qty_Sum_3qrts
2
1 100 2021 1 0 # because we dont historical data for this id
3
1 100 2021 2 100 # sum of last month of this product
4
1 50 2021 3 200 # sum of last 2 months of this product
5
1 100 2021 4 250 # sum of last 3 months of this product
6
1 100 2022 1 250 # sum of last 3 months of this product
7
2 100 2021 1 0 # because we dont have hist data for this id
8
2 100 2021 2 100 # sum of last month of this product
9
3 100 2021 1 0 # etc
10
3 100 2021 2 100 # etc
11
Advertisement
Answer
You need to apply the rolling sum per group, you can use apply
for this:
JavaScript
1
5
1
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity']
2
.apply(lambda s: s.shift(1,fill_value=0)
3
.rolling(3, min_periods=1).sum())
4
)
5
output:
JavaScript
1
11
11
1
Product_ID Quantity Year Quarter Qty_Sum_3qrts
2
0 1 100 2021 1 0.0
3
1 1 100 2021 2 100.0
4
2 1 50 2021 3 200.0
5
3 1 100 2021 4 250.0
6
4 1 100 2022 1 250.0
7
5 2 100 2021 1 0.0
8
6 2 100 2021 2 100.0
9
7 3 100 2021 1 0.0
10
8 3 100 2021 2 100.0
11