I want to connect the box plot means with a line. I can do the basic part, but the line does not pass through the box plot means, and the box plots are offset from their ticks on the x axis. There is a similar post, but it does not connect the means: "Python: seaborn pointplot and boxplot in one plot but shifted on the x-axis".
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

raw_data = {
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy',
                   'Jason', 'Molly', 'Tina', 'Jake', 'Amy',
                   'Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'pre_score': [4, 24, 31, 2, 3, 25, 94, 57, 62, 70, 5, 43, 23, 23, 51]
}
data = pd.DataFrame(raw_data, columns=['first_name', 'pre_score'])

   first_name  pre_score
0       Jason          4
1       Molly         24
2        Tina         31
3        Jake          2
4         Amy          3
5       Jason         25
6       Molly         94
7        Tina         57
8        Jake         62
9         Amy         70
10      Jason          5
11      Molly         43
12       Tina         23
13       Jake         23
14        Amy         51

sns.set_style("ticks")
ax = sns.stripplot(x='first_name', y='pre_score', hue='first_name',
                   jitter=True, dodge=True, size=6, zorder=0, alpha=0.5,
                   linewidth=1, data=data)
ax = sns.boxplot(x='first_name', y='pre_score', hue='first_name',
                 dodge=True, showfliers=True, linewidth=0.8,
                 showmeans=True, data=data)
ax = sns.lineplot(x='first_name', y='pre_score', color='k',
                  data=data.groupby(['first_name'], as_index=False).mean())

fig_size = [18.0, 10.0]
plt.rcParams["figure.figsize"] = fig_size
handles, labels = ax.get_legend_handles_labels()
legend_len = labels.__len__()
ax.legend(handles[int(legend_len/2):legend_len],
          labels[int(legend_len/2):legend_len],
          bbox_to_anchor=(1.01, 1), loc=2, borderaxespad=0.1);
As we can see, the line drawn by sns.lineplot does not follow the box plot means, and the boxes are offset from the names on the x axis.
How can I fix this?
Answer
When working with seaborn plots, I strongly recommend always passing order= (and hue_order= if applicable) to avoid nasty surprises where the categories show up in a different order from one call to the next.
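To illustrate how the order can differ between calls, here is a minimal pandas-only sketch using the data from the question. It compares the order in which the names first appear in the raw data with the order of the frame produced by groupby(), which sorts its keys by default. The variable names appearance_order and grouped_order are made up for this illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'] * 3,
    'pre_score': [4, 24, 31, 2, 3, 25, 94, 57, 62, 70, 5, 43, 23, 23, 51],
})

# Order in which the categories first appear in the raw data
appearance_order = list(data['first_name'].unique())

# groupby() sorts its keys by default, so the aggregated frame is alphabetical
grouped = data.groupby('first_name', as_index=False)['pre_score'].mean()
grouped_order = list(grouped['first_name'])

print(appearance_order)
print(grouped_order)
```

The two lists contain the same names in different orders, so a plot fed the raw data and a plot fed the aggregated data can disagree on category order unless both receive the same explicit order.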
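For the purposes of your question, you can replace the lineplot with a pointplot, which automatically aggregates the values by category and connects the results with a line. The code below also drops hue= and dodge=True, since dodging on a hue that duplicates the x variable is most likely what shifts each box away from its tick: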
import numpy as np
import pandas as pd
import seaborn as sns

raw_data = {
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy',
                   'Jason', 'Molly', 'Tina', 'Jake', 'Amy',
                   'Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'pre_score': [4, 24, 31, 2, 3, 25, 94, 57, 62, 70, 5, 43, 23, 23, 51]
}
data = pd.DataFrame(raw_data, columns=['first_name', 'pre_score'])

# define the order in which the categories will be plotted on the x-axis
# (you could also create a list by hand if you want a specific order)
order = np.sort(data['first_name'].unique())

sns.set_style("ticks")
ax = sns.stripplot(x='first_name', y='pre_score', order=order,
                   jitter=True, size=6, zorder=0, alpha=0.5,
                   linewidth=1, data=data)
ax = sns.boxplot(x='first_name', y='pre_score', order=order,
                 showfliers=True, linewidth=0.8, showmeans=True, data=data)
# note: seaborn 0.12 and later deprecate ci=None in favor of errorbar=None
ax = sns.pointplot(x='first_name', y='pre_score', order=order,
                   data=data, ci=None, color='black')
If for some reason you don't want to, or cannot, use a seaborn function that takes an order argument, aggregate by hand in pandas and reindex() with your order, so that the values are in the right order in the dataframe before you plot them with the tool of your choice.
For instance, you could replace the call to pointplot() above with:
# calculate the means and ensure they are
# displayed in the same order as the boxplots
means = data.groupby('first_name')['pre_score'].mean().reindex(order)
ax.plot(means.index, means.values, 'ko-', lw=3)
and get exactly the same result.
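As a rough check of the manual route, the sketch below aggregates with pandas and reindexes to a hand-written order, which is an assumption chosen for illustration and not the sorted order used above. It also builds integer positions 0 to n-1, which is where seaborn's categorical plots place the categories, in case you would rather plot against numeric x values than against the names.

```python
import pandas as pd

data = pd.DataFrame({
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'] * 3,
    'pre_score': [4, 24, 31, 2, 3, 25, 94, 57, 62, 70, 5, 43, 23, 23, 51],
})

# Hand-written category order (hypothetical, for illustration only)
order = ['Jason', 'Molly', 'Tina', 'Jake', 'Amy']

# Aggregate by hand, then force the result into the chosen order
means = data.groupby('first_name')['pre_score'].mean().reindex(order)

# Integer x positions matching the slots of a categorical axis
positions = list(range(len(order)))

for pos, name, value in zip(positions, means.index, means.values):
    print(pos, name, round(value, 2))
```

With these in hand, a call such as ax.plot(positions, means.values, 'ko-') should draw the line through the same slots as the boxes, provided the boxplot was given the same order.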