I have a pandas data-frame of the form
date N 0 2022-06-14 15:00:00 54 1 2022-06-14 15:03:00 55 2 2022-06-14 15:09:00 56 3 2022-06-14 15:13:00 54 4 2022-06-14 15:19:00 56 ... ... ... 2793 2022-07-03 16:09:00 1266 2794 2022-07-03 16:14:00 1249 2795 2022-07-03 16:19:00 1244 2796 2022-07-03 16:25:00 1241 2797 2022-07-03 16:29:00 1250
and I want to plot the last 7 days together with the average over the weekdays.
I can create / plot the average by using
# Round to 10 min and create weekday avg's
avg_week = df.groupby(
[
df["date"].dt.weekday,
10 * round( (df["date"].dt.minute + 60*df["date"].dt.hour) / 10, 0 )
] ).mean()
avg_week.plot(legend=True, ax=ax, color='blue')
N
date date
0 390.0 8.666667
400.0 12.250000
410.0 23.000000
420.0 20.666667
430.0 29.333333
... ... ...
6 1210.0 2.000000
1220.0 3.000000
1230.0 2.000000
1250.0 1.000000
1260.0 0.000000
and I can create / plot the last 7 days by using
# Plot last 7 days (actually 6 days + minutes since midnight to start at a fresh day) df[df['date'] > df["date"].max() - pd.Timedelta(days=6, minutes=60*datetime.now().hour + datetime.now().minute) ].plot(x='date', y='N', ax=ax)
but I fail to combine them to a single plot since the average uses weekday and minutes after midnight (actually idx of avg_week) instead of a datetime obj.
What else I worked out so far is setting the weekdays manually
ax.set_xticklabels( ax.get_xticklabels(),
rotation=45,
ha='right',
rotation_mode='anchor')
ax.set_xticks(ticks=[0] + [ sum( [ avg_week.N[i].size for i in range(n+1) ] ) for n in range(7) ][:-1],
labels = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
but the problem of course persists.
Any ideas? Thanks a lot!
Advertisement
Answer
Consider assigning the weekday average as a separate new column with groupby.transform. Then plot both columns on same x-axis in a time series plot.
# ASSIGN COLUMNS
df = (
df.assign(
weekday = lambda x: x["date"].dt.weekday,
ten_minute_group = lambda x: x["date"].dt.floor('10min'),
weekday_average = lambda x: x.groupby(["weekday", "ten_minute_group"]).transform("mean")
)
)
# FILTER DATA
sub = df[df['date'] > df["date"].max() - pd.Timedelta(
days=6, minutes=60*datetime.now().hour + datetime.now().minute
)
# PLOT SUBSET OF COLUMNS
(
sub.set_index("date")
.reindex(["N", "weekday_average"], axis="columns")
.plot(kind="line", ax=ax)
)


