Skip to content
Advertisement

Finding an average per item and ID through time (Python)

The question is as follows. Suppose I have a data frame like this:

item event sales
1 A 130
1 B 156
1 C 108
2 B 150
2 D 118

In this data frame, event A is first in time, then B, then C and so forth. I now want an average per item-id combination through time. This means that for item 1 event A, the average is simply 130. For item 1 and event B, the average should be (130+156)/2 = 143. But for item 2, event B, the average is 150 and for item 2 and event D, the average is (130+118)/2 = 124.

So the outcome should look like this:

item event sales
1 A 130
1 B 143
1 C 131.33
2 B 150
2 D 124

Is this possible without a loop? Can we do this with a group by somehow?

Thanks in advance!

Advertisement

Answer

Use Expanding.mean with Series.reset_index for remove first level of MultiIndex for correct align to new column:

df['new'] = df.groupby('item')['sales'].expanding().mean().reset_index(level=0, drop=True)
print (df)
   item event  sales         new
0     1     A    130  130.000000
1     1     B    156  143.000000
2     1     C    108  131.333333
3     2     B    150  150.000000
4     2     D    118  134.000000
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement