I have a dataset which looks like this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

data = {
    "country": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "IND", "IND", "IND"],
    "sector": ["Others", "Sec1", "Sec2", "Others", "Sec2", "Sec1", "Others", "Sec1", "Sec3"],
    "counts": [8763, 8121, 7822, 580, 481, 460, 332, 193, 154],
}
df = pd.DataFrame.from_dict(data)
df['counts_log'] = np.log10(df['counts'])
When I plot this data using the following code:
plt.figure(figsize=(18, 6))
sns.barplot(x='country', y='counts_log', hue='sector', data=df, palette='tab10')
plt.legend([], [], frameon=False)
plt.show()
I get the following issue (there is always some space between the bars of IND):
Whatever I try, the gap does not go away. How can I fix this?
Answer
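This happens because some country/sector combinations are missing from your DataFrame. With hue='sector', seaborn reserves a slot for every sector under each country, so a missing combination leaves an empty slot.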
You can see them clearly by pivoting the DataFrame:
pivot = df.pivot(index=['country'], columns=['sector'], values='counts_log')
print(pivot)
that gives
sector     Others      Sec1      Sec2      Sec3
country
GBR      2.763428  2.662758  2.682145       NaN
IND      2.521138  2.285557       NaN  2.187521
USA      3.942653  3.909610  3.893318       NaN
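So there is a “space” at IND Sec2 because you have no data for it. The same goes for GBR Sec3 and USA Sec3, but there the empty slot is the last one in the group, so it is much less noticeable than the gap between the IND bars.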
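If you want the missing combinations as a list rather than reading them off the printed table, something like the following should work. It is only a small sketch that rebuilds the DataFrame from the question; the variable name missing is my own.

```python
import numpy as np
import pandas as pd

data = {
    "country": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "IND", "IND", "IND"],
    "sector": ["Others", "Sec1", "Sec2", "Others", "Sec2", "Sec1", "Others", "Sec1", "Sec3"],
    "counts": [8763, 8121, 7822, 580, 481, 460, 332, 193, 154],
}
df = pd.DataFrame(data)
df["counts_log"] = np.log10(df["counts"])

# One row per country, one column per sector; absent combinations become NaN.
pivot = df.pivot(index="country", columns="sector", values="counts_log")

# Collect every (country, sector) cell that has no value.
missing = [
    (country, sector)
    for country in pivot.index
    for sector in pivot.columns
    if pd.isna(pivot.loc[country, sector])
]
print(missing)
# → [('GBR', 'Sec3'), ('IND', 'Sec2'), ('USA', 'Sec3')]
```

Each tuple corresponds to one empty slot in the grouped bar plot.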
The only workaround I can suggest is to plot each country in its own subplot, like this:
color_map = {
    'Others': 'C0',
    'Sec1': 'C1',
    'Sec2': 'C2',
    'Sec3': 'C3',
}
df['color'] = df.sector.map(color_map)

fig, ax = plt.subplots(1, 3, figsize=(15, 5), sharey=True)
for i, country in enumerate(df.country.unique()):
    _df = df[df.country == country].sort_values(by='sector')
    sns.barplot(
        ax=ax[i],
        data=_df,
        x='sector',
        y='counts_log',
        palette=_df.color
    )
    ax[i].set(title=country)
This may not be exactly what you were looking for, but I hope it helps.
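Another option, if you would rather keep everything on a single axis, is to skip seaborn's hue handling and place the bars yourself with plain matplotlib, so that each country only gets as many slots as it has sectors. This is a rough sketch rather than anything seaborn does for you: bar_width, offsets and the colour mapping are my own choices, and it rebuilds the DataFrame from the question so it runs on its own.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = {
    "country": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "IND", "IND", "IND"],
    "sector": ["Others", "Sec1", "Sec2", "Others", "Sec2", "Sec1", "Others", "Sec1", "Sec3"],
    "counts": [8763, 8121, 7822, 580, 481, 460, 332, 193, 154],
}
df = pd.DataFrame(data)
df["counts_log"] = np.log10(df["counts"])

# Assumed colour per sector, so a sector looks the same in every country.
color_map = {"Others": "C0", "Sec1": "C1", "Sec2": "C2", "Sec3": "C3"}

bar_width = 0.25  # assumed width; adjacent bars in a group touch each other
countries = list(df["country"].unique())

fig, ax = plt.subplots(figsize=(18, 6))
for group_index, country in enumerate(countries):
    rows = df[df["country"] == country].sort_values(by="sector")
    # Centre this country's bars around x = group_index, with no empty slots.
    offsets = (np.arange(len(rows)) - (len(rows) - 1) / 2) * bar_width
    ax.bar(
        group_index + offsets,
        rows["counts_log"].to_numpy(),
        width=bar_width,
        color=rows["sector"].map(color_map).tolist(),
    )

ax.set_xticks(range(len(countries)))
ax.set_xticklabels(countries)
ax.set_xlabel("country")
ax.set_ylabel("counts_log")
```

Call plt.show() afterwards to display the figure. Because the offsets are computed per country, only the sectors that actually exist get a bar, and the IND bars sit next to each other.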