How to get business weekly averages per method in pandas

Question

I have the following problem. Suppose I have a wide DataFrame consisting of three columns (mock example follows below). Essentially, it consists of three factors, A, B and C, for which I have certain values for each business day within a time range. I would like to plot the business weekly averages of the…

Accepted Answer

Try as follows:

    import pandas as pd

    # df is the long-format frame: columns x (datetime), factor, y
    res = (df.groupby([pd.Grouper(key='x', freq='W-FRI'), df.factor])['y'].mean()
             .reset_index(drop=False))
    res = res.rename(columns={'x':'time','factor':'factors','y':'values'})

    # relabel each weekly bin with the last date actually present in that week
    res['time'] = res.time.map(pd.merge_asof(df.x, res.time, left_on='x',
                                             right_on='time', direction='forward')
                               .groupby('time').last()['x']).astype(str)
    print(res)

              time factors    values
    0   2022-10-07       A  0.171228
    1   2022-10-07       B -0.250432
    2   2022-10-07       C -0.126960
    3   2022-10-14       A  0.455972
    4   2022-10-14       B  0.582900
    5   2022-10-14       C  0.104652
    6   2022-10-21       A -0.526221
    7   2022-10-21       B  0.371007
    8   2022-10-21       C  0.012099
    9   2022-10-27       A -0.123510
    10  2022-10-27       B -0.566441
    11  2022-10-27       C -0.652455

Plot data:

    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set_theme()
    fig, ax = plt.subplots(figsize=(8,5))
    ax = sns.lineplot(data=res, x='time', y='values', hue='factors')
    sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
    plt.show()

Result: [figure: line plot of the weekly mean values over time, one line per factor]

Explanation

First, apply df.groupby. Grouping by factor is of course easy; for the dates we can use pd.Grouper with the freq parameter set to W-FRI (each week through to Friday), and then we take the mean of column y (NaN values will just be ignored).

In the next step, let's use df.rename to rename the columns.

We are basically done now, except for the fact that pd.Grouper labels each bin with its Friday (even if that Friday isn't present in the actual set). E.g.:

    print(res.time.unique())

    ['2022-10-07T00:00:00.000000000' '2022-10-14T00:00:00.000000000'
     '2022-10-21T00:00:00.000000000' '2022-10-28T00:00:00.000000000']

If you are OK with this, you can just start plotting (but see below). If you would like to get '2022-10-27' instead of '2022-10-28', we can combine Series.map applied to column time with pd.merge_asof, and perform another groupby to get the last value in column x. I.e. this will get us the closest match to each Friday within each week (so, in fact, just the Friday in all cases except the last: 2022-10-27).

In either scenario, before plotting, make sure to turn the datetime values into strings: res['time'] = res['time'].astype(str)
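For anyone who wants to try the weekly binning without the original data, here is a minimal sketch on made-up numbers. The mock frame, the seed, and the names wide, last_x and last_seen are assumptions for illustration only and do not come from the question. The last step is a simpler alternative to the merge_asof relabelling in the answer, not the same method: it takes the latest date observed in each weekly bin.

```python
import numpy as np
import pandas as pd

# Hypothetical mock data (not from the question): business days from
# Mon 2022-10-03 through Thu 2022-10-27, three factors in wide format.
dates = pd.bdate_range("2022-10-03", "2022-10-27")
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.normal(size=(len(dates), 3)),
                    index=dates, columns=["A", "B", "C"])

# Reshape to the long layout the answer works with: columns x, factor, y.
df = (wide.rename_axis("x")
          .reset_index()
          .melt(id_vars="x", var_name="factor", value_name="y")
          .sort_values(["x", "factor"], ignore_index=True))

# Weekly mean per factor; W-FRI labels each bin with the Friday that closes it.
res = (df.groupby([pd.Grouper(key="x", freq="W-FRI"), "factor"])["y"]
         .mean()
         .reset_index())

# Alternative to the merge_asof step (an assumption, not the answer's code):
# look up the latest date actually present in each weekly bin.
last_seen = (df.assign(last_x=df["x"])
               .groupby(pd.Grouper(key="x", freq="W-FRI"))["last_x"]
               .max())
res["time"] = res["x"].map(last_seen)

print(res[["x", "time", "factor", "y"]])
```

With this mock range the final bin is labelled Friday 2022-10-28 in column x, while column time holds Thursday 2022-10-27, the last business day that actually has data. For the complete weeks both columns agree.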

Answer