I’m trying to use the convenience of the plot method of a pandas dataframe while adjusting the size of the figure produced. (I’m saving the figures to file as well as displaying them inline in a Jupyter notebook). I found the method below successful most of the time, except when I plot two lines on the same chart – then the figure goes back to the default size.
I suspect this might be due to the differences between plot on a series and plot on a dataframe.
Setup example code:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    data = {'A': 90 + np.random.randn(366), 'B': 85 + np.random.randn(366)}
    date_range = pd.date_range('2016-01-01', '2016-12-31')
    index = pd.Index(date_range, name='Date')
    df = pd.DataFrame(data=data, index=index)
Control – this code produces the expected result (a wide plot):
    fig = plt.figure(figsize=(10, 4))
    df['A'].plot()
    plt.savefig("plot1.png")
    plt.show()
Plotting two lines – figure size is not (10,4)
    fig = plt.figure(figsize=(10, 4))
    df[['A', 'B']].plot()
    plt.savefig("plot2.png")
    plt.show()
What’s the right way to do this so that the figure size is consistently set regardless of the number of series selected?
Answer
The reason for the difference between the two cases is somewhat hidden inside the logic of pandas.DataFrame.plot(). As the documentation shows, this method accepts a large number of arguments so that it can handle many different kinds of plots.
In the first case, you create a matplotlib figure via fig = plt.figure(figsize=(10,4)) and then plot a single column, df['A'], which is a Series rather than a DataFrame. The internal logic of the Series plot method is to check whether a figure already exists in the matplotlib state machine and, if so, to plot the column's values onto that figure's current axes. This works as expected.
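One way to confirm this reuse is to inspect the axes that Series.plot returns. The sketch below assumes a recent pandas and matplotlib, and uses the non-interactive Agg backend so it runs as a plain script (drop the matplotlib.use line in a notebook). ax_series is just an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the question: one year of daily values for A and B
index = pd.date_range("2016-01-01", "2016-12-31", name="Date")
df = pd.DataFrame({"A": 90 + np.random.randn(len(index)),
                   "B": 85 + np.random.randn(len(index))}, index=index)

plt.close("all")
fig = plt.figure(figsize=(10, 4))
ax_series = df["A"].plot()  # Series.plot draws onto the current axes of fig

print(ax_series.figure is fig)                      # True
print(ax_series.figure.get_size_inches().tolist())  # [10.0, 4.0]
print(len(plt.get_fignums()))                       # 1
```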
In the second case, however, the data consists of two columns. There are several ways to handle such a plot, including separate subplots with shared or non-shared axes. So that it can apply any of those options, pandas.DataFrame.plot by default creates a new figure and adds to it the axes it plots to. The new figure knows nothing about the figure you already created or its size; it gets the default size unless you pass the figsize argument.
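Running the two-column case the same way should show a second, default-sized figure appearing. This is a sketch under the same assumptions (recent pandas/matplotlib, Agg backend); the default size comes from matplotlib's rcParams["figure.figsize"], which is 6.4 x 4.8 inches with stock settings but may differ in a customized or notebook environment.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = pd.date_range("2016-01-01", "2016-12-31", name="Date")
df = pd.DataFrame({"A": 90 + np.random.randn(len(index)),
                   "B": 85 + np.random.randn(len(index))}, index=index)

plt.close("all")
fig = plt.figure(figsize=(10, 4))
ax_frame = df[["A", "B"]].plot()  # no ax given: DataFrame.plot opens a new figure

print(ax_frame.figure is fig)   # False
print(len(plt.get_fignums()))   # 2 (your empty figure plus the one pandas made)
# Size of the pandas-created figure, taken from rcParams["figure.figsize"]
print(ax_frame.figure.get_size_inches().tolist())
```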
In the comments, you say that a possible solution is to use df[['A', 'B']].plot(figsize=(10,4)). This is correct, but you then need to omit the creation of your initial figure. Otherwise two figures are produced, which is probably undesired. In a notebook this will not be visible, but if you run this as a regular Python script with plt.show() at the end, two figure windows will open.
So the solution that lets pandas take care of figure creation is:
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame({"A": [2, 3, 1], "B": [1, 2, 2]})
    df[['A', 'B']].plot(figsize=(10, 4))
    plt.show()
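Since the question also saves each figure to a file, it can help to grab the figure pandas created from the returned axes and save that explicitly. A minimal sketch, assuming the Agg backend and writing to the system temp directory (the output path is illustrative only):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 1], "B": [1, 2, 2]})

plt.close("all")
# Let pandas create the one and only figure, already at the requested size
ax = df[["A", "B"]].plot(figsize=(10, 4))
fig = ax.figure  # the Figure pandas created for this plot

out_path = os.path.join(tempfile.gettempdir(), "plot2.png")
fig.savefig(out_path)  # save before plt.show(), as in the question
```

Saving through fig rather than plt.savefig avoids depending on which figure happens to be "current" at that moment.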
A way to avoid the creation of a new figure is to supply the ax argument to the pandas.DataFrame.plot(ax=ax) function, where ax is an externally created Axes object. This can be the standard current axes you obtain via plt.gca().
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame({"A": [2, 3, 1], "B": [1, 2, 2]})
    plt.figure(figsize=(10, 4))
    df[['A', 'B']].plot(ax=plt.gca())
    plt.show()
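To check that passing ax gives a consistent size regardless of how many columns are selected, the sketch below runs the question's data through both a one-column and a two-column selection. Note that df[["A"]] (double brackets) is a one-column DataFrame, not a Series, so without ax it would also open a new figure. Assumes recent pandas/matplotlib and the Agg backend; sizes is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = pd.date_range("2016-01-01", "2016-12-31", name="Date")
df = pd.DataFrame({"A": 90 + np.random.randn(len(index)),
                   "B": 85 + np.random.randn(len(index))}, index=index)

sizes = {}
for cols in (["A"], ["A", "B"]):
    plt.close("all")
    fig = plt.figure(figsize=(10, 4))
    ax = df[cols].plot(ax=plt.gca())  # pass the current axes explicitly
    # Record: same figure?, how many figures exist?, figure size in inches
    sizes[tuple(cols)] = (ax.figure is fig,
                          len(plt.get_fignums()),
                          fig.get_size_inches().tolist())

print(sizes)
```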
Alternatively, use the more object-oriented approach shown in the answer from PaulH.
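PaulH's answer is not reproduced here. Assuming it follows the usual object-oriented pattern, a sketch would create the figure and axes together with plt.subplots and hand the axes to pandas. This is an illustration of that general pattern, not a copy of that answer; the Agg backend line is only there so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 1], "B": [1, 2, 2]})

plt.close("all")
# Create the figure and its axes explicitly, at the size we want
fig, ax = plt.subplots(figsize=(10, 4))
# Give pandas the axes so it draws there instead of opening a new figure
df[["A", "B"]].plot(ax=ax)
ax.set_ylabel("value")
```

Because fig and ax are explicit names, saving (fig.savefig(...)) and further styling do not depend on pyplot's notion of the current figure.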