Inconsistency when setting figure size using pandas plot method

I’m trying to use the convenience of the plot method of a pandas dataframe while adjusting the size of the figure produced. (I’m saving the figures to file as well as displaying them inline in a Jupyter notebook). I found the method below successful most of the time, except when I plot two lines on the same chart – then the figure goes back to the default size.

I suspect this might be due to the differences between plot on a series and plot on a dataframe.

Setup example code:

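import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Two noisy daily series covering 2016 (a leap year, hence 366 values)
data = {
    'A': 90 + np.random.randn(366),
    'B': 85 + np.random.randn(366)
}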

date_range = pd.date_range('2016-01-01', '2016-12-31')

index = pd.Index(date_range, name='Date')

df = pd.DataFrame(data=data, index=index)

Control – this code produces the expected result (a wide plot):

fig = plt.figure(figsize=(10,4))

df['A'].plot()
plt.savefig("plot1.png")
plt.show()

Result:

plot1.png

Plotting two lines – figure size is not (10,4)

fig = plt.figure(figsize=(10,4))

df[['A', 'B']].plot()
plt.savefig("plot2.png")
plt.show()

Result:

plot2.png

What’s the right way to do this so that the figure size is consistently set regardless of the number of series selected?

Answer

The reason for the difference between the two cases is a bit hidden inside the logic of pandas.DataFrame.plot(). As the documentation shows, this method accepts a large number of arguments so that it can handle many different cases.

In the first case, you create a matplotlib figure via fig = plt.figure(figsize=(10,4)) and then plot a single column, df['A'], which is a Series. When plotting a Series, pandas checks whether a figure is already present in the matplotlib state machine and, if so, uses its current axes to plot the column's values. This works as expected.

In the second case, however, df[['A', 'B']] is a DataFrame with two columns. There are several ways to handle such a plot, including separate subplots with shared or non-shared axes. To be able to accommodate any of those options, pandas by default creates a new figure and adds the axes it plots to. The new figure does not know about the existing figure or its size; it gets the default size unless you pass the figsize argument.
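The difference described above can be checked directly by asking each returned axes which figure it belongs to. The following is a minimal sketch, assuming a reasonably recent pandas and matplotlib; the Agg backend line and the variable names are illustrative additions, not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 1], "B": [1, 2, 2]})

# Figure created by hand, as in the question
fig = plt.figure(figsize=(10, 4))

# Series.plot() picks up the current figure and draws on its axes
ax_series = df["A"].plot()

# DataFrame.plot() without ax= creates a figure of its own
ax_frame = df[["A", "B"]].plot()

series_on_fig = ax_series.figure is fig
frame_on_fig = ax_frame.figure is fig
n_figures = len(plt.get_fignums())

print("Series drawn on the prepared figure:", series_on_fig)
print("DataFrame drawn on the prepared figure:", frame_on_fig)
print("Number of open figures:", n_figures)
```

If the second check reports that the DataFrame is not on the prepared figure, the figsize given to plt.figure() never reaches the plot that gets saved.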

In the comments, you say that a possible solution is to use df[['A', 'B']].plot(figsize=(10,4)). This is correct, but you then need to omit the creation of your initial figure. Otherwise two figures are produced, which is probably undesired. In a notebook this will not be visible, but if you run the code as an ordinary Python script with plt.show() at the end, two figure windows will open.

So the solution which lets pandas take care of figure creation is

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"A":[2,3,1], "B":[1,2,2]})
df[['A', 'B']].plot(figsize=(10,4))

plt.show()

A way to avoid the creation of a new figure is to pass an externally created axes to pandas.DataFrame.plot() via the ax argument, i.e. df.plot(ax=ax). This axes can be the current axes you obtain via plt.gca().

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"A":[2,3,1], "B":[1,2,2]})
plt.figure(figsize=(10,4))
df[['A', 'B']].plot(ax = plt.gca())

plt.show()

Alternatively, use the more object-oriented way seen in the answer from PaulH.
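That other answer is not reproduced here. As a rough sketch of what an object-oriented variant could look like, the figure and axes are created explicitly with plt.subplots() and the axes is handed to pandas, so the size is the same for a Series and a DataFrame. The helper name plot_wide and its parameters are hypothetical, chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_wide(data, figsize=(10, 4), filename=None):
    """Hypothetical helper: plot a Series or DataFrame on a figure of fixed size."""
    # Create the figure and axes explicitly instead of relying on pyplot state
    fig, ax = plt.subplots(figsize=figsize)
    # Passing ax= stops pandas from creating a figure of its own
    data.plot(ax=ax)
    if filename is not None:
        fig.savefig(filename)
    return fig, ax


df = pd.DataFrame({"A": [2, 3, 1], "B": [1, 2, 2]})

fig_one, ax_one = plot_wide(df["A"])         # one line
fig_two, ax_two = plot_wide(df[["A", "B"]])  # two lines, same figure size

print(fig_one.get_size_inches(), fig_two.get_size_inches())
```

Saving through fig.savefig() rather than plt.savefig() also removes any ambiguity about which figure ends up in the file.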