Boxplots in matplotlib: Markers and outliers

I have some questions about boxplots in matplotlib:

Question A. What do the markers that I highlighted below with Q1, Q2, and Q3 represent? I believe Q1 is the maximum and Q3 marks the outliers, but what is Q2?

[image: boxplot with markers labeled Q1, Q2, and Q3]

Question B. How does matplotlib identify outliers? (i.e. how does it know that they are not the true max and min values?)

Answer

Here’s a graphic that illustrates the components of the box from a stats.stackexchange answer. Note that k=1.5 if you don’t supply the whis keyword in Pandas.

[image: annotated box in a boxplot]

The boxplot function in Pandas is a wrapper for matplotlib.pyplot.boxplot. The matplotlib docs explain the components of the boxes in detail:

Question A:

The box extends from the lower to upper quartile values of the data, with a line at the median.

i.e. a quarter of the input data values is below the box, a quarter of the data lies in each part of the box, and the remaining quarter lies above the box.
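To make the quartile split concrete, here is a minimal sketch that computes the box edges with NumPy and checks what fraction of the data falls in each region. The sample data (the integers 1 to 100) is made up for illustration, and np.percentile's default linear interpolation is assumed to be what the boxplot uses.

```python
import numpy as np

# Hypothetical sample: the integers 1..100
data = np.arange(1, 101)

# Box edges (Q1, Q3) and the median line
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Fraction of the data in each of the four regions
below_box = np.mean(data < q1)
lower_half_of_box = np.mean((data >= q1) & (data < median))
upper_half_of_box = np.mean((data >= median) & (data <= q3))
above_box = np.mean(data > q3)

print(q1, median, q3)
print(below_box, lower_half_of_box, upper_half_of_box, above_box)
```

With small or tied samples the four fractions will only be approximately a quarter each.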

Question B:

whis : float, sequence, or string (default = 1.5)

As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3 - Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points.
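The rule quoted above can be sketched by hand. The helper below, whisker_stats, is a hypothetical name for illustration and not part of matplotlib; it applies the Q1/Q3 ± whis*IQR fences to a made-up sample and does not handle edge cases such as no data falling between the fences.

```python
import numpy as np

def whisker_stats(data, whis=1.5):
    """Return (lower whisker, upper whisker, outliers) under the whis*IQR rule."""
    x = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1

    # Fences: the furthest the whiskers are allowed to reach
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr

    # Whiskers stop at the most extreme data points still inside the fences
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    outliers = x[(x < lo_fence) | (x > hi_fence)]
    return inside.min(), inside.max(), outliers

# Hypothetical sample with one obviously extreme value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
lo, hi, out = whisker_stats(data)
print(lo, hi, out)
```

Here Q1 = 3.25 and Q3 = 7.75, so the fences sit at -3.5 and 14.5. The whiskers end at 1 and 9 (actual data points, not the fences themselves), and 30 is drawn as an individual outlier point. That is how the plot distinguishes "outliers" from the true min and max: the whisker ends are the min and max only among the points inside the fences.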

Matplotlib (and Pandas) also give you several options for changing this default definition of the whiskers:

Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
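As a rough illustration of these options, the sketch below draws the same made-up sample with three whis settings and counts the points plotted as outliers. It passes the percentile pair (0, 100) for the min/max case, since, as far as I know, recent matplotlib releases no longer accept the string 'range'; the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample with one obviously extreme value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

settings = {
    "default (1.5)": 1.5,      # whiskers reach at most 1.5*IQR beyond the box
    "5th/95th pct": (5, 95),   # whiskers set from the 5th and 95th percentiles
    "min/max": (0, 100),       # whiskers cover the full range of the data
}

fig, axes = plt.subplots(1, len(settings), sharey=True)
n_fliers = {}
for ax, (label, whis) in zip(axes, settings.items()):
    result = ax.boxplot(data, whis=whis)
    ax.set_title(label)
    # result["fliers"] holds one Line2D per box; its y-data are the outlier values
    n_fliers[label] = len(result["fliers"][0].get_ydata())

print(n_fliers)
```

Call plt.show() or fig.savefig(...) to view the figure. The same whis value can be passed through the Pandas boxplot wrapper, which forwards it to matplotlib.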

User contributions licensed under: CC BY-SA