Is there a faster method to do a Pandas groupby cumulative mean?

I am trying to create a lookup reference table in Python that calculates the cumulative mean of a player’s previous (by datetime) game scores, grouped by venue. For my specific need, however, a player must have previously played at least 2 times at the relevant venue before a 'Venue Preference' cumulative mean is calculated.

The DataFrame (df) looks like the following:

DateTime             Player  Venue      Score
2021-09-25 17:15:00  Tim     Stadium A  20
2021-09-27 10:00:00  Blake   Stadium B  30

My existing code, which works correctly but is unfortunately very slow, is as follows:

[code block missing from source]

I am sure there is a way to calculate the cumulative mean in one step, without first calculating the cumulative sum and the cumulative count, but unfortunately I couldn’t get that to work.
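
For reference, the calculation described above (cumulative sum and cumulative count of previous games, then their ratio) might look roughly like the sketch below. The asker's code is not shown, so this is not that code: the sample rows beyond the two in the table, the variable names `prev_sum` and `prev_count`, and the choice to subtract the current score from an inclusive cumulative sum are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data in the shape shown in the question.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2021-09-25 17:15:00", "2021-09-26 12:00:00",
        "2021-09-27 10:00:00", "2021-09-28 09:30:00",
        "2021-09-29 18:00:00",
    ]),
    "Player": ["Tim", "Tim", "Blake", "Tim", "Tim"],
    "Venue": ["Stadium A", "Stadium A", "Stadium B", "Stadium A", "Stadium A"],
    "Score": [20, 30, 30, 40, 10],
})

# Order each (Player, Venue) group by time so "previous" is well defined.
df = df.sort_values(["Player", "Venue", "DateTime"]).reset_index(drop=True)
grouped = df.groupby(["Player", "Venue"])["Score"]

# Number of earlier games in the same (Player, Venue) group: 0, 1, 2, ...
prev_count = grouped.cumcount()
# Total of earlier scores: inclusive cumulative sum minus the current score.
prev_sum = grouped.cumsum() - df["Score"]

# Mean of previous scores, kept only where at least 2 previous games exist.
df["Venue Preference"] = (prev_sum / prev_count).where(prev_count >= 2)
```

Rows with fewer than two previous games at the venue are left as NaN, which matches the minimum-of-2 requirement stated in the question.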

Answer

IIUC, you can remove two of the groupby calls by aggregating with sum and size first, and then taking the cumulative sum of both columns:

[code block missing from source]
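
One possible reading of this suggestion is sketched below; it is an interpretation, not the answerer's code. It assumes the aggregation is done per (Player, Venue, DateTime), that "previous" totals are obtained by subtracting the current row from the cumulative sums, and that the sample data and the names `per_game`, `cum`, and `lookup` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data in the shape shown in the question.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2021-09-25 17:15:00", "2021-09-26 12:00:00",
        "2021-09-27 10:00:00", "2021-09-28 09:30:00",
        "2021-09-29 18:00:00",
    ]),
    "Player": ["Tim", "Tim", "Blake", "Tim", "Tim"],
    "Venue": ["Stadium A", "Stadium A", "Stadium B", "Stadium A", "Stadium A"],
    "Score": [20, 30, 30, 40, 10],
})

keys = ["Player", "Venue"]

# Step 1: a single aggregation giving the score total ("sum") and the row
# count ("size") per (Player, Venue, DateTime). groupby sorts by these keys,
# so rows within each (Player, Venue) end up in time order.
per_game = (
    df.groupby(keys + ["DateTime"])["Score"]
      .agg(["sum", "size"])
      .reset_index()
)

# Step 2: one cumulative sum over both columns within each (Player, Venue).
cum = per_game.groupby(keys)[["sum", "size"]].cumsum()

# Subtract the current row so that only earlier games are counted.
prev_sum = cum["sum"] - per_game["sum"]
prev_count = cum["size"] - per_game["size"]

# Cumulative mean of previous scores, only with at least 2 previous games.
per_game["Venue Preference"] = (prev_sum / prev_count).where(prev_count >= 2)
lookup = per_game[keys + ["DateTime", "Venue Preference"]]
```

Because the sum and the count travel through the same two groupby passes, no per-group Python lambda is needed, which is usually where the time goes in slow groupby code.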
User contributions licensed under: CC BY-SA