I'd like to calculate a lagged rolling average on a complicated time-series dataset. Consider the toy example as follows: This results in the following DataFrame: Now I'd like to add a column representing the average weight per fruit for the previous 7 days: wgt_per_frt_prev_7d. It should be defined as the sum of all the fruit weights divided by the sum

Pandas lagged rolling average on aggregate data with multiple groups and missing dates

I’d like to calculate a lagged rolling average on a complicated time-series dataset. Consider the toy example as follows:

import numpy as np
import pandas as pd

np.random.seed(101)

fruit = ['apples', 'apples', 'apples', 'oranges', 'apples', 'oranges', 'oranges',
         'oranges', 'apples', 'oranges', 'apples', 'apples']
people = ['alice']*6+['bob']*6
date = ['2022-01-01', '2022-01-03', '2022-01-04', '2022-01-04', '2022-01-11', '2022-01-11',
         '2022-01-04', '2022-01-05', '2022-01-05', '2022-01-20', '2022-01-20', '2022-01-25']
count = np.random.poisson(4,size=12)
weight_per = np.round(np.random.uniform(1,3,size=12),2)

df = pd.DataFrame({'date':date, 'people':people, 'fruit':fruit,
                   'count':count, 'weight':weight_per*count})
df['date'] = pd.to_datetime(df.date)

JavaScript
​x
 
import numpy as np
import pandas as pd
​
np.random.seed(101)
​
fruit = ['apples', 'apples', 'apples', 'oranges', 'apples', 'oranges', 'oranges',
         'oranges', 'apples', 'oranges', 'apples', 'apples']
people = ['alice']*6+['bob']*6
date = ['2022-01-01', '2022-01-03', '2022-01-04', '2022-01-04', '2022-01-11', '2022-01-11',
         '2022-01-04', '2022-01-05', '2022-01-05', '2022-01-20', '2022-01-20', '2022-01-25']
count = np.random.poisson(4,size=12)
weight_per = np.round(np.random.uniform(1,3,size=12),2)
​
df = pd.DataFrame({'date':date, 'people':people, 'fruit':fruit,
                   'count':count, 'weight':weight_per*count})
df['date'] = pd.to_datetime(df.date)
​
​

This results in the following DataFrame:

    date        people  fruit   count   weight
0   2022-01-01  alice   apples  2       2.72
1   2022-01-03  alice   apples  6       11.28
2   2022-01-04  alice   apples  5       13.80
3   2022-01-04  alice   oranges 3       8.70
4   2022-01-11  alice   apples  2       3.92
5   2022-01-11  alice   oranges 3       5.76
6   2022-01-04  bob     oranges 8       18.16
7   2022-01-05  bob     oranges 5       8.25
8   2022-01-05  bob     apples  5       6.20
9   2022-01-20  bob     oranges 4       4.40
10  2022-01-20  bob     apples  2       4.56
11  2022-01-25  bob     apples  2       5.24

JavaScript
 
    date        people  fruit   count   weight
 2022-01-01  alice   apples  2       2.72
 2022-01-03  alice   apples  6       11.28
 2022-01-04  alice   apples  5       13.80
 2022-01-04  alice   oranges 3       8.70
 2022-01-11  alice   apples  2       3.92
 2022-01-11  alice   oranges 3       5.76
 2022-01-04  bob     oranges 8       18.16
 2022-01-05  bob     oranges 5       8.25
 2022-01-05  bob     apples  5       6.20
 2022-01-20  bob     oranges 4       4.40
2022-01-20  bob     apples  2       4.56
2022-01-25  bob     apples  2       5.24
​

Now I’d like to add a column representing the average weight per fruit for the previous 7 days: wgt_per_frt_prev_7d. It should be defined as the sum of all the fruit weights divided by the sum of all the fruit counts for the past 7 days, not including the current day. While there are many ways to brute force this answer, I’m looking for something with relatively good time complexity. If I were to calculate this column by hand, these would be the calculations and expected results:

df['wgt_per_frt_prev_7d'] = np.nan

df.loc[1, 'wgt_per_frt_prev_7d'] = 2.72/2 # row 0

df.loc[2, 'wgt_per_frt_prev_7d'] = (2.72+11.28)/(2+6) # row 0 and 1
df.loc[3, 'wgt_per_frt_prev_7d'] = (2.72+11.28)/(2+6)

df.loc[4, 'wgt_per_frt_prev_7d'] = (8.70+13.80+6.20+8.25+18.16)/(3+5+5+5+8) # row 2,3,6,7,8
df.loc[5, 'wgt_per_frt_prev_7d'] = (8.70+13.80+6.20+8.25+18.16)/(3+5+5+5+8)

df.loc[6, 'wgt_per_frt_prev_7d'] = (2.72+11.28)/(2+6) # row 0,1

df.loc[7, 'wgt_per_frt_prev_7d'] = (8.70+13.80+2.72+11.28+18.16)/(3+5+6+2+8) # row 0,1,2,3,6
df.loc[8, 'wgt_per_frt_prev_7d'] = (8.70+13.80+2.72+11.28+18.16)/(3+5+6+2+8)

df.loc[11, 'wgt_per_frt_prev_7d'] = (4.40+4.56)/(2+4) # row 9,10

JavaScript
 
df['wgt_per_frt_prev_7d'] = np.nan
​
df.loc[1, 'wgt_per_frt_prev_7d'] = 2.72/2 # row 0
​
df.loc[2, 'wgt_per_frt_prev_7d'] = (2.72+11.28)/(2+6) # row 0 and 1
df.loc[3, 'wgt_per_frt_prev_7d'] = (2.72+11.28)/(2+6)
​
df.loc[4, 'wgt_per_frt_prev_7d'] = (8.70+13.80+6.20+8.25+18.16)/(3+5+5+5+8) # row 2,3,6,7,8
df.loc[5, 'wgt_per_frt_prev_7d'] = (8.70+13.80+6.20+8.25+18.16)/(3+5+5+5+8)
​
df.loc[6, 'wgt_per_frt_prev_7d'] = (2.72+11.28)/(2+6) # row 0,1
​
df.loc[7, 'wgt_per_frt_prev_7d'] = (8.70+13.80+2.72+11.28+18.16)/(3+5+6+2+8) # row 0,1,2,3,6
df.loc[8, 'wgt_per_frt_prev_7d'] = (8.70+13.80+2.72+11.28+18.16)/(3+5+6+2+8)
​
df.loc[11, 'wgt_per_frt_prev_7d'] = (4.40+4.56)/(2+4) # row 9,10
​

Final DF:

    date        people  fruit   count   weight  wgt_per_frt_prev_7d
0   2022-01-01  alice   apples  2       2.72    NaN
1   2022-01-03  alice   apples  6       11.28   1.360000
2   2022-01-04  alice   apples  5       13.80   1.750000
3   2022-01-04  alice   oranges 3       8.70    1.750000
4   2022-01-11  alice   apples  2       3.92    2.119615
5   2022-01-11  alice   oranges 3       5.76    2.119615
6   2022-01-04  bob     oranges 8       18.16   1.750000
7   2022-01-05  bob     oranges 5       8.25    2.277500
8   2022-01-05  bob     apples  5       6.20    2.277500
9   2022-01-20  bob     oranges 4       4.40    NaN
10  2022-01-20  bob     apples  2       4.56    NaN
11  2022-01-25  bob     apples  2       5.24    1.493333

JavaScript
 
    date        people  fruit   count   weight  wgt_per_frt_prev_7d
 2022-01-01  alice   apples  2       2.72    NaN
 2022-01-03  alice   apples  6       11.28   1.360000
 2022-01-04  alice   apples  5       13.80   1.750000
 2022-01-04  alice   oranges 3       8.70    1.750000
 2022-01-11  alice   apples  2       3.92    2.119615
 2022-01-11  alice   oranges 3       5.76    2.119615
 2022-01-04  bob     oranges 8       18.16   1.750000
 2022-01-05  bob     oranges 5       8.25    2.277500
 2022-01-05  bob     apples  5       6.20    2.277500
 2022-01-20  bob     oranges 4       4.40    NaN
2022-01-20  bob     apples  2       4.56    NaN
2022-01-25  bob     apples  2       5.24    1.493333
​

EDIT

The final column I’d like to add is wgt_per_apl_prev_7d, which only considers the apple weights when calculating this field, but still applies to all rows, even rows with just oranges. The output of this calculation should be as follows:

    date        people  fruit   count   weight  wgt_per_frt_prev_7d wgt_per_apl_prev_7d
0   2022-01-01  alice   apples  2       2.72    NaN                NaN
1   2022-01-03  alice   apples  6       11.28   1.360000      1.360000
2   2022-01-04  alice   apples  5       13.80   1.750000      1.750000
3   2022-01-04  alice   oranges 3       8.70    1.750000      1.750000
4   2022-01-11  alice   apples  2       3.92    2.119615      2.000000
5   2022-01-11  alice   oranges 3       5.76    2.119615      2.000000
6   2022-01-04  bob     oranges 8       18.16   1.750000      1.750000
7   2022-01-05  bob     oranges 5       8.25    2.277500      2.138462
8   2022-01-05  bob     apples  5       6.20    2.277500      2.138462
9   2022-01-20  bob     oranges 4       4.40    NaN                NaN
10  2022-01-20  bob     apples  2       4.56    NaN                NaN
11  2022-01-25  bob     apples  2       5.24    1.493333      2.280000

JavaScript
 
    date        people  fruit   count   weight  wgt_per_frt_prev_7d wgt_per_apl_prev_7d
 2022-01-01  alice   apples  2       2.72    NaN                NaN
 2022-01-03  alice   apples  6       11.28   1.360000      1.360000
 2022-01-04  alice   apples  5       13.80   1.750000      1.750000
 2022-01-04  alice   oranges 3       8.70    1.750000      1.750000
 2022-01-11  alice   apples  2       3.92    2.119615      2.000000
 2022-01-11  alice   oranges 3       5.76    2.119615      2.000000
 2022-01-04  bob     oranges 8       18.16   1.750000      1.750000
 2022-01-05  bob     oranges 5       8.25    2.277500      2.138462
 2022-01-05  bob     apples  5       6.20    2.277500      2.138462
 2022-01-20  bob     oranges 4       4.40    NaN                NaN
2022-01-20  bob     apples  2       4.56    NaN                NaN
2022-01-25  bob     apples  2       5.24    1.493333      2.280000
​

Answer

Try this

df2 = df[['date', 'count', 'weight']].groupby('date').sum()
df2 = df2.rolling('8D').apply(np.sum, raw=True) - df2
df = df.merge((df2['weight']/df2['count']).rename('avg').to_frame().reset_index(), on='date', how='left')

df2 = df[df['fruit'] == 'apples'][['date', 'count', 'weight']].groupby('date').sum()
df2 = df2.rolling('8D').apply(np.sum, raw=True) - df2
df = df.merge((df2['weight']/df2['count']).rename('avg_apple').to_frame().reset_index(), on='date', how='left')

JavaScript
 
df2 = df[['date', 'count', 'weight']].groupby('date').sum()
df2 = df2.rolling('8D').apply(np.sum, raw=True) - df2
df = df.merge((df2['weight']/df2['count']).rename('avg').to_frame().reset_index(), on='date', how='left')
​
df2 = df[df['fruit'] == 'apples'][['date', 'count', 'weight']].groupby('date').sum()
df2 = df2.rolling('8D').apply(np.sum, raw=True) - df2
df = df.merge((df2['weight']/df2['count']).rename('avg_apple').to_frame().reset_index(), on='date', how='left')
​

Output

    date        people  fruit   count   weight  avg        avg_apple
0   2022-01-01  alice   apples  2       2.72    NaN        NaN
1   2022-01-03  alice   apples  6       11.28   1.360000   1.360000
2   2022-01-04  alice   apples  5       13.80   1.750000   1.750000
3   2022-01-04  alice   oranges 3       8.70    1.750000   1.750000
4   2022-01-11  alice   apples  2       3.92    2.119615   2.000000
5   2022-01-11  alice   oranges 3       5.76    2.119615   2.000000
6   2022-01-04  bob     oranges 8       18.16   1.750000   1.750000
7   2022-01-05  bob     oranges 5       8.25    2.277500   2.138462
8   2022-01-05  bob     apples  5       6.20    2.277500   2.138462
9   2022-01-20  bob     oranges 4       4.40    NaN        NaN
10  2022-01-20  bob     apples  2       4.56    NaN        NaN
11  2022-01-25  bob     apples  2       5.24    1.493333   2.280000

JavaScript
 
    date        people  fruit   count   weight  avg        avg_apple
 2022-01-01  alice   apples  2       2.72    NaN        NaN
 2022-01-03  alice   apples  6       11.28   1.360000   1.360000
 2022-01-04  alice   apples  5       13.80   1.750000   1.750000
 2022-01-04  alice   oranges 3       8.70    1.750000   1.750000
 2022-01-11  alice   apples  2       3.92    2.119615   2.000000
 2022-01-11  alice   oranges 3       5.76    2.119615   2.000000
 2022-01-04  bob     oranges 8       18.16   1.750000   1.750000
 2022-01-05  bob     oranges 5       8.25    2.277500   2.138462
 2022-01-05  bob     apples  5       6.20    2.277500   2.138462
 2022-01-20  bob     oranges 4       4.40    NaN        NaN
2022-01-20  bob     apples  2       4.56    NaN        NaN
2022-01-25  bob     apples  2       5.24    1.493333   2.280000
​

Advertisement

Answer