
How to remove the empty space between bars for a specific group in a seaborn bar plot of a pandas DataFrame

I have a dataset which looks like this:

import pandas as pd, seaborn as sns, matplotlib.pyplot as plt, numpy as np

data = {"country":  ["USA", "USA",  "USA",  "GBR",  "GBR",  "GBR",  "IND",  "IND",  "IND"],
"sector":   ["Others", "Sec1", "Sec2",  "Others",   "Sec2", "Sec1", "Others",   "Sec1", "Sec3"],
"counts":   [8763,  8121,   7822,   580,    481,    460,    332,    193,    154]}

df = pd.DataFrame.from_dict(data)

df['counts_log'] = np.log10(df['counts'])

When I plot this data with the following code:

plt.figure(figsize=(18, 6))
sns.barplot(x='country', y='counts_log', hue='sector', data=df, palette='tab10')
plt.legend([],[], frameon=False)
plt.show()

I get the following result (there is always some empty space between the bars for IND):

Bar plot of the data

Nothing I have tried makes it go away. How can I fix this?


Answer

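This happens because some country/sector combinations are missing from your DataFrame. With `hue`, seaborn reserves a slot for every sector in every country group, and a slot stays empty when that combination has no data.

You can see the missing combinations clearly by pivoting the DataFrame: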

pivot = df.pivot(index=['country'], columns=['sector'], values='counts_log')
print(pivot)

which gives:

sector     Others      Sec1      Sec2      Sec3
country                                        
GBR      2.763428  2.662758  2.682145       NaN
IND      2.521138  2.285557       NaN  2.187521
USA      3.942653  3.909610  3.893318       NaN

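So the empty space in IND is the Sec2 slot, for which you have no data. The same is true for GBR/Sec3 and USA/Sec3, but because Sec3 is the last hue level, those empty slots fall at the right edge of their groups instead of between bars, which is why the gap is only noticeable for IND.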
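If you want to list the empty slots programmatically, here is a minimal sketch. It assumes, as I understand seaborn's default behaviour, that without an explicit `hue_order` string hue levels are ordered by first appearance in the data; `hue_levels`, `full_index` and `missing` are just illustrative names.

```python
import numpy as np
import pandas as pd

# Same data as in the question
data = {"country": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "IND", "IND", "IND"],
        "sector": ["Others", "Sec1", "Sec2", "Others", "Sec2", "Sec1", "Others", "Sec1", "Sec3"],
        "counts": [8763, 8121, 7822, 580, 481, 460, 332, 193, 154]}
df = pd.DataFrame(data)
df["counts_log"] = np.log10(df["counts"])

# Assumption: seaborn's default hue order for strings is order of first appearance
hue_levels = list(df["sector"].unique())  # -> ['Others', 'Sec1', 'Sec2', 'Sec3']

# Every (country, sector) slot a dodged bar plot would reserve
full_index = pd.MultiIndex.from_product(
    [df["country"].unique(), hue_levels], names=["country", "sector"])
# Slots that actually have data
present = df.set_index(["country", "sector"]).index
# Slots that will be drawn as empty space
missing = full_index.difference(present)

for country, sector in missing:
    pos = hue_levels.index(sector)
    if pos == 0:
        where = "at the start of the group"
    elif pos == len(hue_levels) - 1:
        where = "at the end of the group"
    else:
        where = "between bars"
    print(f"{country}: empty {sector} slot {where}")
```

Passing a `hue_order` to `sns.barplot` changes where the empty slots land, but it cannot remove them.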

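The only workaround I can suggest is to draw one subplot per country, so that each subplot contains only the sectors that country actually has:

# Fixed colour per sector, so a sector keeps the same colour in every subplot
color_map = {
    'Others': 'C0',
    'Sec1': 'C1',
    'Sec2': 'C2',
    'Sec3': 'C3',
}
df['color'] = df.sector.map(color_map)

# One subplot per country, sharing the y axis so bar heights stay comparable
fig, ax = plt.subplots(1, 3, figsize=(15, 5), sharey=True)
for i, country in enumerate(df.country.unique()):
    # Only this country's rows, so no slots are reserved for missing sectors
    _df = df[df.country == country].sort_values(by='sector')
    sns.barplot(
        ax=ax[i],
        data=_df,
        x='sector', y='counts_log',
        # Colours listed in the same order as the bars; seaborn >= 0.13 warns about
        # palette without hue and suggests hue='sector', legend=False instead
        palette=list(_df.color)
    )
    ax[i].set(
        title=country
    )
plt.show()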

Bar plots of each country in separate subplots

This may not be exactly what you were looking for, but I hope it helps.
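If you would rather keep everything on a single axes, another option (not part of the answer above, and bypassing seaborn entirely) is to position the bars yourself with matplotlib's `Axes.bar`, packing only the sectors each country actually has. This is a minimal sketch assuming the same data as in the question; `bar_width`, the 0.5 gap between country groups and the `C0` to `C3` colours are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

data = {"country": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "IND", "IND", "IND"],
        "sector": ["Others", "Sec1", "Sec2", "Others", "Sec2", "Sec1", "Others", "Sec1", "Sec3"],
        "counts": [8763, 8121, 7822, 580, 481, 460, 332, 193, 154]}
df = pd.DataFrame(data)
df["counts_log"] = np.log10(df["counts"])

# One fixed colour per sector so colours stay consistent across countries
sectors = sorted(df["sector"].unique())
colors = {s: f"C{i}" for i, s in enumerate(sectors)}

bar_width = 0.25   # arbitrary bar width
group_gap = 0.5    # arbitrary space between country groups
fig, ax = plt.subplots(figsize=(10, 5))
tick_positions, tick_labels = [], []
x = 0.0
for country, group in df.groupby("country", sort=False):
    group = group.sort_values("sector")
    # Only the bars this country actually has, side by side: no empty slots
    positions = x + bar_width * np.arange(len(group))
    ax.bar(positions, group["counts_log"].to_numpy(), width=bar_width,
           color=[colors[s] for s in group["sector"]])
    # Centre the country label under its own bars
    tick_positions.append(positions.mean())
    tick_labels.append(country)
    x += len(group) * bar_width + group_gap

ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels)
ax.set_ylabel("counts_log")
# Build the legend once from the sector -> colour mapping
ax.legend(handles=[Patch(color=colors[s], label=s) for s in sectors], title="sector")
plt.show()
```

The trade-off is that you lose seaborn's automatic error bars and aggregation, which do not matter here because each country/sector pair has a single value.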

User contributions licensed under: CC BY-SA