I have a dataset of esports data like this:
(done using pd.to_clipboard()
Week Team Vs Team Points Vs Points 0 1 Team1 Team2 94 67 1 1 Team3 Team4 51 83 2 1 Team5 Team6 74 96 3 1 Team2 Team1 67 94 4 1 Team4 Team3 83 51 5 1 Team6 Team5 96 74 6 2 Team2 Team6 63 76 7 2 Team1 Team3 90 84 8 2 Team4 Team5 68 56 9 2 Team6 Team2 76 63 10 2 Team3 Team1 84 90 11 2 Team5 Team4 56 68 12 3 Team5 Team2 63 74 13 3 Team1 Team6 63 99 14 3 Team3 Team4 50 75 15 3 Team2 Team5 74 63 16 3 Team6 Team1 99 63 17 3 Team4 Team3 75 50 18 4 Team6 Team4 78 91 19 4 Team5 Team1 70 71 20 4 Team2 Team3 99 74 21 4 Team4 Team6 91 78 22 4 Team1 Team5 71 70 23 4 Team6 Team4 78 91
I want to create a dataframe that essentially, for each team, every week, creates a rolling X game average of their points scored. (X could be 2, 3, 4, etc). A few notes:
- This example only shows points, the actual data has about 10 features that need rolling averages
- Ideally, it would be a function, so I could swap X for any given number
- Not every team plays every week!
I tried to create this functionality using below code:
import pandas as pd x = pd.read_clipboard() teams = list(set(x['Team'].to_list() + (x['Vs'].to_list()))) res = [] for team in teams: temp = x.loc[x['Team']==team] temp.sort_values(by=['Week'], inplace=True) temp.groupby(['Week']).rolling(2).mean() res.append(temp) res_df = pd.concat(res)
However, it yields results like:
Week Team Vs Team Points Vs Points 1 1 Team3 Team4 51 83 10 2 Team3 Team1 84 90 14 3 Team3 Team4 50 75 4 1 Team4 Team3 83 51 8 2 Team4 Team5 68 56 17 3 Team4 Team3 75 50 21 4 Team4 Team6 91 78 2 1 Team5 Team6 74 96 11 2 Team5 Team4 56 68 12 3 Team5 Team2 63 74 19 4 Team5 Team1 70 71 5 1 Team6 Team5 96 74 9 2 Team6 Team2 76 63 16 3 Team6 Team1 99 63 18 4 Team6 Team4 78 91 23 4 Team6 Team4 78 91 0 1 Team1 Team2 94 67 7 2 Team1 Team3 90 84 13 3 Team1 Team6 63 99 22 4 Team1 Team5 71 70 3 1 Team2 Team1 67 94 6 2 Team2 Team6 63 76 15 3 Team2 Team5 74 63 20 4 Team2 Team3 99 74
While I would like for it to look like:
Week Team 2Game_Average_Points 0 1 Team1 1 1 Team2 2 1 Team3 3 1 Team4 4 1 Team5 5 1 Team6 6 2 Team1 94.0 7 2 Team2 67.0 8 2 Team3 51.0 9 2 Team4 83.0 10 2 Team5 74.0 11 2 Team6 96.0 12 3 Team1 92.0 13 3 Team2 65.0 14 3 Team3 67.5 15 3 Team4 78.5 16 3 Team5 65.0 17 3 Team6 86.0 18 4 Team1 76.5 19 4 Team2 68.5 20 4 Team3 67.5 21 4 Team4 78.5 22 4 Team5 59.5 23 4 Team6 87.5
What am I doing wrong? How can I achieve my desired result?
Advertisement
Answer
>>> roll = df.groupby('Team')[['Team Points']].apply(lambda x: x.rolling(2).mean()) >>> df[['Week', 'Team']].join(roll.rename('2Game_Average_Points') Week Team 2Game_Average_Points 0 1 Team1 NaN 1 1 Team3 NaN 2 1 Team5 NaN 3 1 Team2 NaN 4 1 Team4 NaN 5 1 Team6 NaN 6 2 Team2 65.0 7 2 Team1 92.0 8 2 Team4 75.5 9 2 Team6 86.0 10 2 Team3 67.5 11 2 Team5 65.0 12 3 Team5 59.5 13 3 Team1 76.5 14 3 Team3 67.0 15 3 Team2 68.5 16 3 Team6 87.5 17 3 Team4 71.5 18 4 Team6 88.5 19 4 Team5 66.5 20 4 Team2 86.5 21 4 Team4 83.0 22 4 Team1 67.0 23 4 Team6 78.0
- grouping by team
- selecting columns to be averaged (here
Team Points
, not usingVs Points
since every game appears twice, but you could add your other features to the list) - use df.rolling() with a window of 2. The number only appears once so easy to change or use as parameter in a function.
As underlined in another answer this could use the slightly denser notation groupby().rolling()
, thus in this case:
>>> roll = df.groupby('Team', as_index=False)[['Team Points']].rolling(2).mean()