I am working with a dataframe containing one week of data.
                              y
ds
2017-08-31 10:15:00    1.000000
2017-08-31 10:20:00    1.049107
2017-08-31 10:25:00    1.098214
...
2017-09-07 10:05:00   99.901786
2017-09-07 10:10:00   99.950893
2017-09-07 10:15:00  100.000000
I create a new index by combining the weekday and the time, i.e.:
                    y
dayIndex
4 - 10:15    1.000000
4 - 10:20    1.049107
4 - 10:25    1.098214
...
4 - 10:05   99.901786
4 - 10:10   99.950893
4 - 10:15  100.000000
In the plot of the full data, the labels are correct: they reflect the data in the dataframe.
However, when zooming in, the labels are no longer correct: they do not correspond to their original values.
What is causing this behavior?
Here is the code to reproduce this:
import datetime
import numpy as np
import pandas as pd

dtnow = datetime.datetime.now()
dindex = pd.date_range(dtnow, dtnow + datetime.timedelta(7), freq='5T')
data = np.linspace(1, 100, num=len(dindex))
df = pd.DataFrame({'ds': dindex, 'y': data})
df = df.set_index('ds')
df = df.resample('5T').mean()
df['dayIndex'] = df.index.strftime('%w - %H:%M')
df = df.set_index('dayIndex')
df.plot()
Answer
“What is causing this behavior?”
The x-axis formatter of a pandas dates plot is a matplotlib.ticker.FixedFormatter (see e.g. print(plt.gca().xaxis.get_major_formatter())). "Fixed" means that it formats the i-th tick (if shown) with some constant string.
When zooming or panning, you shift the tick locations, but not the format strings.
In short: A pandas date plot may not be the best choice for interactive plots.
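The behavior described above can be reproduced without any plot. The following is a minimal sketch, assuming the three labels from the question's dayIndex column; the tick positions 0 and 1500 are made-up values standing in for "before zooming" and "after zooming". It calls a FixedFormatter by hand the way the axis would, to show that the returned label depends only on the tick's position in the list, not on its data value.

```python
from matplotlib.ticker import FixedFormatter

# Labels taken from the question's dayIndex column.
labels = ["4 - 10:15", "4 - 10:20", "4 - 10:25"]
formatter = FixedFormatter(labels)

# A formatter is called as formatter(x, pos): x is the tick's data
# coordinate, pos is the tick's index among the ticks currently shown.
before_zoom = formatter(0, pos=0)     # first tick sits at x = 0 (illustrative value)
after_zoom = formatter(1500, pos=0)   # after zooming, first tick sits at x = 1500 (illustrative value)

# The label is looked up by pos only, so both calls return the same string.
print(before_zoom, "|", after_zoom)

# Positions beyond the end of the label list yield an empty string.
print(repr(formatter(0, pos=10)))
```

Because the lookup ignores x, moving the ticks by zooming or panning leaves the first visible tick labelled with the first string, whatever data it now sits on.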
Solution
A solution is usually to use matplotlib formatters directly. This requires the dates to be datetime objects (which can be ensured using df.index.to_pydatetime()).
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates

dtnow = datetime.datetime.now()
dindex = pd.date_range(dtnow, dtnow + datetime.timedelta(7), freq='110T')
data = np.linspace(1, 100, num=len(dindex))
df = pd.DataFrame({'ds': dindex, 'y': data})
df = df.set_index('ds')
df.index.to_pydatetime()
df.plot(marker="o")
plt.gca().xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%w - %H:%M'))
plt.show()
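A variation on the same idea, as a sketch rather than the answer's own method: here the datetime objects returned by df.index.to_pydatetime() are passed straight to matplotlib's ax.plot, so pandas' plotting layer is not involved at all. The fixed start date and the '5min' frequency are assumptions chosen to match the data shown in the question; the names sample and label are only there to illustrate what the formatter returns.

```python
import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Assumed start time, chosen to match the first row shown in the question.
start = datetime.datetime(2017, 8, 31, 10, 15)
dindex = pd.date_range(start, start + datetime.timedelta(7), freq="5min")
df = pd.DataFrame({"y": np.linspace(1, 100, num=len(dindex))}, index=dindex)

fig, ax = plt.subplots()
# Plot with matplotlib directly, using real datetime objects on the x-axis.
ax.plot(df.index.to_pydatetime(), df["y"].to_numpy())

# DateFormatter computes each label from the tick's date value,
# so labels stay consistent with the data when zooming or panning.
formatter = mdates.DateFormatter("%w - %H:%M")
ax.xaxis.set_major_formatter(formatter)

# Illustration: format one known date the same way the axis would.
sample = mdates.date2num(datetime.datetime(2017, 8, 31, 10, 15))
label = formatter(sample)
print(label)
```

Calling plt.show() afterwards opens the interactive window. Since every label is derived from the tick's actual date, zooming changes which ticks are shown but not the relationship between a label and the data under it.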