The question is as follows. Suppose I have a data frame like this:
item | event | sales |
---|---|---|
1 | A | 130 |
1 | B | 156 |
1 | C | 108 |
2 | B | 150 |
2 | D | 118 |
… | … | … |
In this data frame, event A
is first in time, then B, then C and so forth.
I now want an average per item-id combination through time.
This means that for item 1 event A, the average is simply 130. For item 1 and event B, the average should be (130+156)/2 = 143. But for item 2, event B, the average is 150 and for item 2 and event D, the average is (130+118)/2 = 124.
So the outcome should look like this:
item | event | sales |
---|---|---|
1 | A | 130 |
1 | B | 143 |
1 | C | 131.33 |
2 | B | 150 |
2 | D | 124 |
… | … | … |
Is this possible without a loop? Can we do this with a group by somehow?
Thanks in advance!
Advertisement
Answer
Use Expanding.mean
with Series.reset_index
for remove first level of MultiIndex
for correct align to new column:
df['new'] = df.groupby('item')['sales'].expanding().mean().reset_index(level=0, drop=True) print (df) item event sales new 0 1 A 130 130.000000 1 1 B 156 143.000000 2 1 C 108 131.333333 3 2 B 150 150.000000 4 2 D 118 134.000000