
Pandas lagged rolling average on aggregate data with multiple groups and missing dates

I’d like to calculate a lagged rolling average on a complicated time-series dataset. Consider the following toy example:

JavaScript

This results in the following DataFrame:

JavaScript

Now I’d like to add a column, wgt_per_frt_prev_7d, representing the average weight per fruit over the previous 7 days. It should be defined as the sum of all fruit weights divided by the sum of all fruit counts over the past 7 days, not including the current day. While there are many ways to brute-force this, I’m looking for something with relatively good time complexity. If I were to calculate this column by hand, these would be the calculations and expected results:

JavaScript

Final DF:

JavaScript
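For readers following along, here is a minimal sketch of one way the definition above could be computed: aggregate to one row per calendar day, fill missing dates with zeros, take a 7-day rolling sum, and shift it by one day so the current day is excluded. The column names (date, fruit, count, weight) and the data values are assumptions made for illustration, not the original toy dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-05", "2021-01-10"]
    ),
    "fruit": ["apple", "orange", "apple", "orange", "apple"],
    "count": [2, 1, 3, 4, 1],
    "weight": [4.0, 3.0, 9.0, 8.0, 2.0],
})

# 1. Collapse all groups to one row per day (total weight and total count).
daily = df.groupby("date")[["weight", "count"]].sum()

# 2. Insert the missing calendar days with zero totals so that a
#    7-row window always corresponds to exactly 7 days.
daily = daily.asfreq("D", fill_value=0)

# 3. Rolling 7-day sums, shifted by one day so the current day is excluded.
prev = daily.rolling(7, min_periods=1).sum().shift(1)

# 4. Ratio of sums; windows with no fruit at all give 0/0, which is NaN.
ratio = prev["weight"] / prev["count"]

# 5. Map the per-day result back onto every original row.
df["wgt_per_frt_prev_7d"] = df["date"].map(ratio)
print(df)
```

Each step is a single vectorized pass, so the cost grows roughly linearly with the number of rows plus the number of calendar days spanned, rather than with rows times window size.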

EDIT

The final column I’d like to add is wgt_per_apl_prev_7d, which considers only the apple weights in the calculation but still applies to all rows, even rows with just oranges. The output of this calculation should be as follows:

JavaScript
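A sketch of how the apple-only column might be built under the same idea: filter to apple rows before aggregating, but reindex over the date span of all rows so that orange-only dates still receive a value. As before, the column names (date, fruit, count, weight) and the data are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-05", "2021-01-10"]
    ),
    "fruit": ["apple", "orange", "apple", "orange", "apple"],
    "count": [2, 1, 3, 4, 1],
    "weight": [4.0, 3.0, 9.0, 8.0, 2.0],
})

# Keep only apple rows, then total their weight and count per day.
apples = df[df["fruit"] == "apple"].groupby("date")[["weight", "count"]].sum()

# Reindex over the full date span of ALL rows (not just apple rows),
# filling days without apples with zeros.
all_days = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
apples = apples.reindex(all_days, fill_value=0)

# 7-day rolling sums, shifted one day to exclude the current day.
prev = apples.rolling(7, min_periods=1).sum().shift(1)
apple_ratio = prev["weight"] / prev["count"]

# Every row gets the apple-only value for its date, including orange rows.
df["wgt_per_apl_prev_7d"] = df["date"].map(apple_ratio)
print(df)
```

Rows whose previous 7 days contain no apples come out as NaN, since the ratio is 0/0 there.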


Answer

Try this:

JavaScript

Output:

JavaScript
User contributions licensed under: CC BY-SA