The graphs.
The first graph is taken from https://towardsdatascience.com/violin-plots-explained-fb1d115e023d, and I created the second one myself.
I ran into this requirement while working on a Python matplotlib assignment: the professor asked us to highlight any outliers on the violin plot.
The violin plot I created has no y values. Even when I run the script from the Python command line instead of a Jupyter notebook and hover the mouse over the graph, only the x value is shown, and the y value appears as “y = ”. Since there are only x values, I don't see how to plot circles to highlight the outliers.
I created the violin plot with the seaborn library.
Is there a solution?
Answer
It is a bit unclear how exactly you created the violinplot. In general, a non-numeric axis is categorical, and its categories are numbered internally as 0, 1, 2, .... So, y would be 0 here.
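Outliers can be defined in many ways. For a standard boxplot, the whiskers extend at most 1.5 times the interquartile range (the distance between the first and third quartile) beyond those quartiles, and points outside the whiskers are treated as outliers.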
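To make the two points above concrete, here is a minimal sketch, assuming the data is grouped by category in a plain dict. The helper names `iqr_fences` and `outlier_coordinates` and the sample values are made up for illustration. The sketch applies the 1.5 times IQR rule per group and pairs each outlier with the internal position (0, 1, 2, ...) of its category, which is what a scatter call on the violinplot's axes would need.

```python
import numpy as np


def iqr_fences(values, k=1.5):
    """Return the (low, high) limits of the k * IQR rule."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_coordinates(groups, k=1.5):
    """Pair every outlier with the categorical position of its group.

    `groups` maps a category label to a sequence of numbers. Categories
    are numbered 0, 1, 2, ... in insertion order, which is assumed to
    match the order in which the violins are drawn.
    """
    positions, outliers = [], []
    for pos, values in enumerate(groups.values()):
        values = np.asarray(values, dtype=float)
        low, high = iqr_fences(values, k)
        found = values[(values < low) | (values > high)]
        positions.extend([pos] * len(found))
        outliers.extend(found.tolist())
    return positions, outliers


# Hypothetical sample data: group "b" contains one obvious outlier.
groups = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "b": [10, 11, 12, 13, 14, 15, 16, 17, 60],
}
positions, outliers = outlier_coordinates(groups)
print(positions, outliers)  # [1] [60.0]
```

For a vertical violinplot, a call such as `ax.scatter(positions, outliers, color='crimson', zorder=3)` should then draw the markers on top of the violins; for a horizontal one, swap the two arguments.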
Note that a boxplot shows the data “as is”, while a violinplot smooths out the data. Depending on the distribution it could give the impression of data being at places where they would be impossible in practice (e.g. negative values for a height). Which one to prefer in a given situation depends on many factors, but it is important to understand the limitations of each.
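The smoothing effect can be illustrated with a hand-rolled Gaussian kernel density estimate. This is a simplified sketch, not seaborn's actual implementation: the function name `gaussian_kde_at`, the bandwidth rule (Scott's rule of thumb for one dimension) and the sample heights are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kde_at(data, x, bandwidth=None):
    """Evaluate a simple Gaussian KDE of `data` at the point(s) `x`."""
    data = np.asarray(data, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data (an assumption here)
        bandwidth = data.std(ddof=1) * len(data) ** (-1 / 5)
    # One Gaussian bump per data point, averaged at every evaluation point
    z = (x[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


# Made-up heights in metres; every value is strictly positive.
heights = np.array([0.1, 0.2, 0.2, 0.3, 0.5, 0.9, 1.6])
density = gaussian_kde_at(heights, [-0.1, 0.0, 0.3])
print(density)  # the first entry is positive although no height is negative
```

In seaborn, `violinplot` has a `cut` parameter; setting `cut=0` limits each violin to the range of the observed data, which avoids suggesting values outside that range.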
Seaborn also has a boxenplot, similar to a boxplot, but showing many more of the quantiles. And there is swarmplot, which draws all the points, but pushes them apart to avoid overlap. If there are too many points, you might want to limit the swarmplot to a subset. A swarmplot can also be combined with e.g. a boxplot to show extra information, or be used instead of the scatterplot to show outliers.
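One possible way to combine these ideas, sketched under assumptions: the helper names `random_subset` and `box_with_swarm` and the default of 300 points are hypothetical. The boxplot summarizes all values, while the swarmplot on top draws only a random subset so that it stays readable. The plotting libraries are imported inside the plotting function so that the subsampling helper can be used on its own.

```python
import numpy as np


def random_subset(values, max_points=300, seed=0):
    """Return at most `max_points` values, drawn without replacement."""
    values = np.asarray(values)
    if len(values) <= max_points:
        return values
    rng = np.random.default_rng(seed)
    return rng.choice(values, size=max_points, replace=False)


def box_with_swarm(values, max_points=300):
    """Draw a boxplot of all values with a swarmplot of a subset on top."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 3))
    # The boxplot uses every value, so its quartiles and fliers are exact
    sns.boxplot(x=values, color='CornflowerBlue', ax=ax)
    # The swarm only shows a random subset to keep the points legible
    sns.swarmplot(x=random_subset(values, max_points), color='black', size=2, ax=ax)
    return fig, ax
```

Calling `fig, ax = box_with_swarm(data)` followed by `plt.show()` would display the combined plot. Because the swarm is only a sample, it may not include every outlier; the boxplot's own flier markers still show them all.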
The following code creates a plot that compares a default boxplot with a boxenplot, a violinplot with the outliers marked in red, and a swarmplot:
```python
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np

np.random.seed(553)
data = np.random.randn(6, 500).cumsum(axis=1).ravel()

# Whisker limits of a standard boxplot: 1.5 times the IQR beyond the quartiles
q1, q3 = np.percentile(data, [25, 75])
whisker_low = q1 - (q3 - q1) * 1.5
whisker_high = q3 + (q3 - q1) * 1.5

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 6), sharex=True)

sns.boxplot(x=data, color='CornflowerBlue', ax=ax1)

sns.violinplot(x=data, color='CornflowerBlue', ax=ax3)
# The single violin sits at categorical position 0, so the outliers are drawn at y=0
outliers = data[(data > whisker_high) | (data < whisker_low)]
sns.scatterplot(x=outliers, y=0, marker='D', color='crimson', ax=ax3)

sns.boxenplot(x=data, color='CornflowerBlue', ax=ax2)

sns.swarmplot(x=data, color='CornflowerBlue', size=1.5, ax=ax4)

# Hide the meaningless categorical axis and show x tick labels on the top row too
plt.setp((ax1, ax2, ax3, ax4), "yticks", [])
sns.despine(fig, top=True, left=True, right=True)
ax1.tick_params(labelbottom=True)
ax2.tick_params(labelbottom=True)

ax1.set_title('boxplot')
ax2.set_title('boxenplot')
ax3.set_title('violinplot with outliers')
ax4.set_title('swarmplot')
plt.tight_layout()
plt.show()
```