Skip to content
Advertisement

Calculate 14-day rolling average on data with two hierarchies

I am trying to calculate the 14 day rolling average for retail data with multiple different hierarchies. The ‘Store’ dataframe looks like this:

Store  |  Inventory-Small  |  Inventory-Medium   |   Date  |   Purchases-Small  |   Purchases-Medium
-----------------------------------------------------------------------------------------------------
  A            12                   14              4/1/20         2                      4
  B            13                   16              4/1/20         4                      5
  A            15                   10              4/2/20         2                      6
  C            20                   15              4/1/20         4                      5 
  A            16                    8              4/3/20         2                      4 
  A            16                   10              4/4/20         4                      5
  A            15                   12              4/5/20         1                      3 
  C            18                   14              4/2/20         2                      3
  C            19                   12              4/3/20         6                      9
  B            14                   14              4/2/20         3                      8

What I am trying to do is create a rolling 14 day average for the purchases column for each store. The data extends well past 14 day (over 8 months), and I would like the first 14 days of each store to be a simple average. My issue is that while I can group by ‘Store’ and create a column, I don’t know how to also group by dates. I’ve tried:

Store.sort_values(['Store','Date'],ascending=(False,False))
Store['Rolling_Purchase_S'] = Store.groupby(['Store','Date'], as_index=False)['Purchases-Small'].transform(lambda x: x.rolling(14, 1).mean())

and also:

Store['Rolling_Purchase_S'] = Store.groupby('Store')['Purchases-Small'].transform(lambda x: x.rolling(14, 1).mean())

The first one doesn’t seem to have any effect while the second one doesn’t group by dates so I end up with a rolling average in the wrong order. Any advice would be much appreciated!

Edit: The following lines worked, thanks to all for the feedback.

Store.sort_values(['Store','Date'],ascending=(False,True),inplace=True)
Store['Rolling_Purchase_S'] = Store.groupby('Store')['Purchases-Small'].transform(lambda x: x.rolling(14, 1).mean())

Advertisement

Answer

I believe it’s working fine as long as you sort inplace and remove ‘Date’ from groupby:

Store.sort_values(['Store','Date'], ascending=(False,False), inplace=True)
Store['Rolling_Purchase_S'] = Store.groupby(['Store'])['Purchases-Small'].transform(lambda x: x.rolling(14, 1).mean())

Output:

print(Store[['Store', 'Date', 'Purchases-Small', 'Rolling_Purchase_S']])
  Store       Date  Purchases-Small  Rolling_Purchase_S
8     C 2020-04-03                6            6.000000
7     C 2020-04-02                2            4.000000
3     C 2020-04-01                4            4.000000
9     B 2020-04-02                3            3.000000
1     B 2020-04-01                4            3.500000
6     A 2020-04-05                1            1.000000
5     A 2020-04-04                4            2.500000
4     A 2020-04-03                2            2.333333
2     A 2020-04-02                2            2.250000
0     A 2020-04-01                2            2.200000
Advertisement