I have a dataframe with more than 500 cities that looks like this:

| city | value | datetime |
|---|---|---|
| london | 23 | 2022-03-25 17:59:18 |
| dubai | 12 | 2022-03-25 17:59:36 |
| berlin | 5 | 2022-03-25 17:59:42 |
| london | 25 | 2022-03-25 18:01:18 |
| dubai | 12 | 2022-03-25 18:02:18 |
| berlin | 5 | 2022-03-25 18:03:18 |

I have a function called rolling_mean that adds a new column, 'rolling_mean', containing the rolling average of value over the last hour.
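
    def rolling_mean(df):
        # Index by the datetime column so a time-based '1h' window can be used,
        # then restore the original index so the result aligns with df.
        df['rolling_mean'] = (df.set_axis(df['datetime'])
                                .rolling('1h')['value']
                                .mean()
                                .set_axis(df.index))
        return df
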
However, I would like to apply this function to each city separately, so that the rolling average for one city is not mixed with values from other cities. Since there are almost 500 cities in the dataframe, I am not sure how to do this.
Answer
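You can do this with groupby: apply calls rolling_mean once per city, so each rolling window only sees that city's rows. Note that apply collects whatever the function returns, so rolling_mean needs to end with `return df`, and the result has to be assigned. Passing group_keys=False keeps the original row index instead of adding city as an extra index level.

    result = df.groupby('city', group_keys=False).apply(rolling_mean)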
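
For comparison, here is a minimal sketch of the same per-city rolling average written with `transform` instead of `apply`. It rebuilds the sample data from the question and assumes the `datetime` column can be parsed with `pd.to_datetime`. The helper name `add_rolling_mean` and its `window` parameter are made up for this illustration and are not part of the original question or answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "city": ["london", "dubai", "berlin", "london", "dubai", "berlin"],
    "value": [23, 12, 5, 25, 12, 5],
    "datetime": pd.to_datetime([
        "2022-03-25 17:59:18", "2022-03-25 17:59:36", "2022-03-25 17:59:42",
        "2022-03-25 18:01:18", "2022-03-25 18:02:18", "2022-03-25 18:03:18",
    ]),
})


def add_rolling_mean(frame, window="1h"):
    """Hypothetical helper: per-city time-based rolling mean of 'value'."""
    # Time-based windows need a sorted DatetimeIndex, so sort and index by time.
    # Both calls return new objects, so the input frame is left unchanged.
    out = frame.sort_values("datetime").set_index("datetime")
    # transform runs the lambda once per city and puts each result back
    # on the rows it came from, so cities never share a window.
    out["rolling_mean"] = (
        out.groupby("city")["value"]
        .transform(lambda s: s.rolling(window).mean())
    )
    # Turn 'datetime' back into a regular column.
    return out.reset_index()


result = add_rolling_mean(df)
print(result)
```

With the sample rows, the second london row averages only the two london values (23 and 25), which is the separation between cities the question asks for.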