Apply function to each unique value of column separately

I have a dataframe with more than 500 cities that looks like this:

city	value	datetime
london	23	2022-03-25 17:59:18
dubai	12	2022-03-25 17:59:36
berlin	5	2022-03-25 17:59:42
london	25	2022-03-25 18:01:18
dubai	12	2022-03-25 18:02:18
berlin	5	2022-03-25 18:03:18

I have a function called rolling_mean that creates a new column 'rolling_mean' containing the rolling average of value over the last hour.

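def rolling_mean(df):
    # Use the timestamps as the index so a time-based window ('1h') works;
    # the 'datetime' column must have a datetime dtype and be sorted.
    df['rolling_mean'] = (df.set_axis(df['datetime'])
                        .rolling('1h')['value']
                        .mean()
                        .set_axis(df.index)  # restore the original index
                      )
    return df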

However, I would like to apply this function to each city separately, so that when the new rolling_mean column is created, the rolling average for one city is not affected by values from other cities. Since there are almost 500 cities in the dataframe, I am not sure how to do this.

Answer

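You can do this with groupby, which calls the function once per city on that city's rows only. For apply to collect the results, rolling_mean has to return the modified dataframe, and the result has to be assigned back:

df = df.groupby('city', group_keys=False).apply(rolling_mean)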
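
For comparison, here is a minimal, self-contained sketch of the same per-city calculation using groupby with transform instead of apply. It is one possible approach, not the answer's method. The helper name add_rolling_mean and its window parameter are made up for illustration, and the sample rows are the ones from the question. It assumes the datetime column can be parsed with pd.to_datetime.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "city": ["london", "dubai", "berlin", "london", "dubai", "berlin"],
    "value": [23, 12, 5, 25, 12, 5],
    "datetime": pd.to_datetime([
        "2022-03-25 17:59:18", "2022-03-25 17:59:36", "2022-03-25 17:59:42",
        "2022-03-25 18:01:18", "2022-03-25 18:02:18", "2022-03-25 18:03:18",
    ]),
})


def add_rolling_mean(df, window="1h"):
    """Hypothetical helper: per-city rolling mean of 'value' over a time window."""
    # Time-based windows need a sorted datetime index.
    ordered = df.sort_values("datetime", kind="stable")
    indexed = ordered.set_index("datetime")

    # transform runs the lambda once per city and returns one value per row,
    # so each city's window only ever sees that city's values.
    means = indexed.groupby("city")["value"].transform(
        lambda s: s.rolling(window).mean()
    )

    out = ordered.copy()
    out["rolling_mean"] = means.to_numpy()  # same row order as `ordered`
    return out.sort_index()  # back to the original row order


result = add_rolling_mean(df)
print(result)
```

With the sample data, the second london row gets the mean of 23 and 25, while the dubai and berlin rows are unaffected by london's values.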
