Skip to content
Advertisement

Create rolling average pandas

I have a dataset of esports data like this:

(done using pd.to_clipboard()

    Week    Team    Vs  Team Points Vs Points
0   1   Team1   Team2   94  67
1   1   Team3   Team4   51  83
2   1   Team5   Team6   74  96
3   1   Team2   Team1   67  94
4   1   Team4   Team3   83  51
5   1   Team6   Team5   96  74
6   2   Team2   Team6   63  76
7   2   Team1   Team3   90  84
8   2   Team4   Team5   68  56
9   2   Team6   Team2   76  63
10  2   Team3   Team1   84  90
11  2   Team5   Team4   56  68
12  3   Team5   Team2   63  74
13  3   Team1   Team6   63  99
14  3   Team3   Team4   50  75
15  3   Team2   Team5   74  63
16  3   Team6   Team1   99  63
17  3   Team4   Team3   75  50
18  4   Team6   Team4   78  91
19  4   Team5   Team1   70  71
20  4   Team2   Team3   99  74
21  4   Team4   Team6   91  78
22  4   Team1   Team5   71  70
23  4   Team6   Team4   78  91

I want to create a dataframe that essentially, for each team, every week, creates a rolling X game average of their points scored. (X could be 2, 3, 4, etc). A few notes:

  • This example only shows points, the actual data has about 10 features that need rolling averages
  • Ideally, it would be a function, so I could swap X for any given number
  • Not every team plays every week!

I tried to create this functionality using below code:

import pandas as pd
x = pd.read_clipboard()
teams = list(set(x['Team'].to_list() + (x['Vs'].to_list())))
res = []
for team in teams:
  temp = x.loc[x['Team']==team]
  temp.sort_values(by=['Week'], inplace=True)
  temp.groupby(['Week']).rolling(2).mean()
  res.append(temp)
res_df = pd.concat(res)

However, it yields results like:

Week    Team    Vs  Team Points Vs Points
1   1   Team3   Team4   51  83
10  2   Team3   Team1   84  90
14  3   Team3   Team4   50  75
4   1   Team4   Team3   83  51
8   2   Team4   Team5   68  56
17  3   Team4   Team3   75  50
21  4   Team4   Team6   91  78
2   1   Team5   Team6   74  96
11  2   Team5   Team4   56  68
12  3   Team5   Team2   63  74
19  4   Team5   Team1   70  71
5   1   Team6   Team5   96  74
9   2   Team6   Team2   76  63
16  3   Team6   Team1   99  63
18  4   Team6   Team4   78  91
23  4   Team6   Team4   78  91
0   1   Team1   Team2   94  67
7   2   Team1   Team3   90  84
13  3   Team1   Team6   63  99
22  4   Team1   Team5   71  70
3   1   Team2   Team1   67  94
6   2   Team2   Team6   63  76
15  3   Team2   Team5   74  63
20  4   Team2   Team3   99  74

While I would like for it to look like:

    Week    Team    2Game_Average_Points
0   1   Team1   
1   1   Team2   
2   1   Team3   
3   1   Team4   
4   1   Team5   
5   1   Team6   
6   2   Team1   94.0
7   2   Team2   67.0
8   2   Team3   51.0
9   2   Team4   83.0
10  2   Team5   74.0
11  2   Team6   96.0
12  3   Team1   92.0
13  3   Team2   65.0
14  3   Team3   67.5
15  3   Team4   78.5
16  3   Team5   65.0
17  3   Team6   86.0
18  4   Team1   76.5
19  4   Team2   68.5
20  4   Team3   67.5
21  4   Team4   78.5
22  4   Team5   59.5
23  4   Team6   87.5

What am I doing wrong? How can I achieve my desired result?

Advertisement

Answer

>>> roll = df.groupby('Team')[['Team Points']].apply(lambda x: x.rolling(2).mean())
>>> df[['Week', 'Team']].join(roll.rename('2Game_Average_Points')
    Week   Team  2Game_Average_Points
0      1  Team1          NaN
1      1  Team3          NaN
2      1  Team5          NaN
3      1  Team2          NaN
4      1  Team4          NaN
5      1  Team6          NaN
6      2  Team2         65.0
7      2  Team1         92.0
8      2  Team4         75.5
9      2  Team6         86.0
10     2  Team3         67.5
11     2  Team5         65.0
12     3  Team5         59.5
13     3  Team1         76.5
14     3  Team3         67.0
15     3  Team2         68.5
16     3  Team6         87.5
17     3  Team4         71.5
18     4  Team6         88.5
19     4  Team5         66.5
20     4  Team2         86.5
21     4  Team4         83.0
22     4  Team1         67.0
23     4  Team6         78.0
  • grouping by team
  • selecting columns to be averaged (here Team Points, not using Vs Points since every game appears twice, but you could add your other features to the list)
  • use df.rolling() with a window of 2. The number only appears once so easy to change or use as parameter in a function.

As underlined in another answer this could use the slightly denser notation groupby().rolling(), thus in this case:

>>> roll = df.groupby('Team', as_index=False)[['Team Points']].rolling(2).mean()
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement