The question is as follows. Suppose I have a data frame like this:
| item | event | sales |
|---|---|---|
| 1 | A | 130 |
| 1 | B | 156 |
| 1 | C | 108 |
| 2 | B | 150 |
| 2 | D | 118 |
| … | … | … |
In this data frame, event A is first in time, then B, then C and so forth.
For each item-event combination, I want the average of that item's sales up to and including that event, i.e. a running average through time.
This means that for item 1, event A, the average is simply 130. For item 1, event B, it should be (130+156)/2 = 143. For item 2, event B, the average is 150, and for item 2, event D, it is (150+118)/2 = 134.
So the outcome should look like this:
| item | event | sales |
|---|---|---|
| 1 | A | 130 |
| 1 | B | 143 |
| 1 | C | 131.33 |
| 2 | B | 150 |
| 2 | D | 134 |
| … | … | … |
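Put formally (the symbols here are introduced only for illustration): if $s_{i,1}, s_{i,2}, \dots$ are the sales of item $i$ in time order, the value wanted at its $k$-th event is the running mean

$$\bar{s}_{i,k} = \frac{1}{k}\sum_{j=1}^{k} s_{i,j}$$

For item 1, event C this gives $(130 + 156 + 108)/3 = 394/3 \approx 131.33$, matching the table above.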
Is this possible without a loop? Can we do this with a group by somehow?
Thanks in advance!
Answer
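Use `Expanding.mean` on the grouped series, then `Series.reset_index(level=0, drop=True)` to remove the first (`item`) level of the resulting MultiIndex, so the result aligns with the original rows when assigned to a new column: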
```python
df['new'] = df.groupby('item')['sales'].expanding().mean().reset_index(level=0, drop=True)
print(df)
```

```
   item event  sales         new
0     1     A    130  130.000000
1     1     B    156  143.000000
2     1     C    108  131.333333
3     2     B    150  150.000000
4     2     D    118  134.000000
```
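A self-contained sketch that rebuilds the sample rows from the question (the "…" rows are left out) and applies the approach above. It assumes the event labels sort alphabetically in time order, which holds for A, B, C, D here; real data may need an explicit timestamp column to sort on instead. It also shows a second vectorized option, a running sum divided by a running count via `cumsum` and `cumcount`. That alternative is swapped in for comparison and is not part of the original answer.

```python
import pandas as pd

# Sample rows from the question (the "…" rows are omitted).
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2],
    "event": ["A", "B", "C", "B", "D"],
    "sales": [130, 156, 108, 150, 118],
})

# Assumption: alphabetical event order is time order (A before B before C ...).
df = df.sort_values(["item", "event"])

# Approach from the answer: expanding mean per item, then drop the 'item'
# level of the MultiIndex so the result aligns with df's own index.
df["new"] = (
    df.groupby("item")["sales"]
      .expanding()
      .mean()
      .reset_index(level=0, drop=True)
)

# Alternative: running sum divided by running count within each item.
# cumcount() starts at 0, so add 1 to get the number of rows seen so far.
g = df.groupby("item")["sales"]
df["new_alt"] = g.cumsum() / (g.cumcount() + 1)

print(df)
```

Both columns should agree; the `cumsum`/`cumcount` form avoids the MultiIndex entirely, since both results already share the original index.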