I have a dataset of esports data like this:
(done using pd.to_clipboard()
JavaScript
x
26
26
1
Week Team Vs Team Points Vs Points
2
0 1 Team1 Team2 94 67
3
1 1 Team3 Team4 51 83
4
2 1 Team5 Team6 74 96
5
3 1 Team2 Team1 67 94
6
4 1 Team4 Team3 83 51
7
5 1 Team6 Team5 96 74
8
6 2 Team2 Team6 63 76
9
7 2 Team1 Team3 90 84
10
8 2 Team4 Team5 68 56
11
9 2 Team6 Team2 76 63
12
10 2 Team3 Team1 84 90
13
11 2 Team5 Team4 56 68
14
12 3 Team5 Team2 63 74
15
13 3 Team1 Team6 63 99
16
14 3 Team3 Team4 50 75
17
15 3 Team2 Team5 74 63
18
16 3 Team6 Team1 99 63
19
17 3 Team4 Team3 75 50
20
18 4 Team6 Team4 78 91
21
19 4 Team5 Team1 70 71
22
20 4 Team2 Team3 99 74
23
21 4 Team4 Team6 91 78
24
22 4 Team1 Team5 71 70
25
23 4 Team6 Team4 78 91
26
I want to create a dataframe that essentially, for each team, every week, creates a rolling X game average of their points scored. (X could be 2, 3, 4, etc). A few notes:
- This example only shows points, the actual data has about 10 features that need rolling averages
- Ideally, it would be a function, so I could swap X for any given number
- Not every team plays every week!
I tried to create this functionality using below code:
JavaScript
1
11
11
1
import pandas as pd
2
x = pd.read_clipboard()
3
teams = list(set(x['Team'].to_list() + (x['Vs'].to_list())))
4
res = []
5
for team in teams:
6
temp = x.loc[x['Team']==team]
7
temp.sort_values(by=['Week'], inplace=True)
8
temp.groupby(['Week']).rolling(2).mean()
9
res.append(temp)
10
res_df = pd.concat(res)
11
However, it yields results like:
JavaScript
1
26
26
1
Week Team Vs Team Points Vs Points
2
1 1 Team3 Team4 51 83
3
10 2 Team3 Team1 84 90
4
14 3 Team3 Team4 50 75
5
4 1 Team4 Team3 83 51
6
8 2 Team4 Team5 68 56
7
17 3 Team4 Team3 75 50
8
21 4 Team4 Team6 91 78
9
2 1 Team5 Team6 74 96
10
11 2 Team5 Team4 56 68
11
12 3 Team5 Team2 63 74
12
19 4 Team5 Team1 70 71
13
5 1 Team6 Team5 96 74
14
9 2 Team6 Team2 76 63
15
16 3 Team6 Team1 99 63
16
18 4 Team6 Team4 78 91
17
23 4 Team6 Team4 78 91
18
0 1 Team1 Team2 94 67
19
7 2 Team1 Team3 90 84
20
13 3 Team1 Team6 63 99
21
22 4 Team1 Team5 71 70
22
3 1 Team2 Team1 67 94
23
6 2 Team2 Team6 63 76
24
15 3 Team2 Team5 74 63
25
20 4 Team2 Team3 99 74
26
While I would like for it to look like:
JavaScript
1
26
26
1
Week Team 2Game_Average_Points
2
0 1 Team1
3
1 1 Team2
4
2 1 Team3
5
3 1 Team4
6
4 1 Team5
7
5 1 Team6
8
6 2 Team1 94.0
9
7 2 Team2 67.0
10
8 2 Team3 51.0
11
9 2 Team4 83.0
12
10 2 Team5 74.0
13
11 2 Team6 96.0
14
12 3 Team1 92.0
15
13 3 Team2 65.0
16
14 3 Team3 67.5
17
15 3 Team4 78.5
18
16 3 Team5 65.0
19
17 3 Team6 86.0
20
18 4 Team1 76.5
21
19 4 Team2 68.5
22
20 4 Team3 67.5
23
21 4 Team4 78.5
24
22 4 Team5 59.5
25
23 4 Team6 87.5
26
What am I doing wrong? How can I achieve my desired result?
Advertisement
Answer
JavaScript
1
28
28
1
>>> roll = df.groupby('Team')[['Team Points']].apply(lambda x: x.rolling(2).mean())
2
>>> df[['Week', 'Team']].join(roll.rename('2Game_Average_Points')
3
Week Team 2Game_Average_Points
4
0 1 Team1 NaN
5
1 1 Team3 NaN
6
2 1 Team5 NaN
7
3 1 Team2 NaN
8
4 1 Team4 NaN
9
5 1 Team6 NaN
10
6 2 Team2 65.0
11
7 2 Team1 92.0
12
8 2 Team4 75.5
13
9 2 Team6 86.0
14
10 2 Team3 67.5
15
11 2 Team5 65.0
16
12 3 Team5 59.5
17
13 3 Team1 76.5
18
14 3 Team3 67.0
19
15 3 Team2 68.5
20
16 3 Team6 87.5
21
17 3 Team4 71.5
22
18 4 Team6 88.5
23
19 4 Team5 66.5
24
20 4 Team2 86.5
25
21 4 Team4 83.0
26
22 4 Team1 67.0
27
23 4 Team6 78.0
28
- grouping by team
- selecting columns to be averaged (here
Team Points
, not usingVs Points
since every game appears twice, but you could add your other features to the list) - use df.rolling() with a window of 2. The number only appears once so easy to change or use as parameter in a function.
As underlined in another answer this could use the slightly denser notation groupby().rolling()
, thus in this case:
JavaScript
1
2
1
>>> roll = df.groupby('Team', as_index=False)[['Team Points']].rolling(2).mean()
2