I have a dataset which looks like this:
```python
import pandas as pd, seaborn as sns, matplotlib.pyplot as plt, numpy as np

data = {"country": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "IND", "IND", "IND"],
        "sector": ["Others", "Sec1", "Sec2", "Others", "Sec2", "Sec1", "Others", "Sec1", "Sec3"],
        "counts": [8763, 8121, 7822, 580, 481, 460, 332, 193, 154]}

df = pd.DataFrame.from_dict(data)

df['counts_log'] = df['counts'].apply(lambda x: np.log10(x))
```
When I plot this data with the following code:
```python
plt.figure(figsize=(18, 6))
sns.barplot(x='country', y='counts_log', hue='sector', data=df, palette='tab10')
plt.legend([],[], frameon=False)
plt.show()
```
I get a plot in which there is always a gap between the bars for IND. Whatever I have tried, the gap does not go away. How can I fix this?
Answer
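This happens because some country/sector combinations are missing from your DataFrame. With `hue`, seaborn reserves a slot for every hue level inside each x category, so a combination with no data leaves an empty slot.
You can see the missing combinations clearly by pivoting the df: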
```python
pivot = df.pivot(index=['country'], columns=['sector'], values='counts_log')
print(pivot)
```

which gives:

```
sector     Others      Sec1      Sec2      Sec3
country
GBR      2.763428  2.662758  2.682145       NaN
IND      2.521138  2.285557       NaN  2.187521
USA      3.942653  3.909610  3.893318       NaN
```
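So the "space" in IND is the empty slot for `Sec2`, for which you have no data. The same happens for GBR `Sec3` and USA `Sec3`; those gaps just don't stand out, because `Sec3` is the last hue level, so its empty slot falls at the right edge of the group.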
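To list these missing combinations programmatically instead of reading them off the pivot, one possible sketch is to build every country/sector pair with `pd.MultiIndex.from_product` and subtract the pairs that actually occur in `df`. The data is repeated so that the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question
data = {"country": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "IND", "IND", "IND"],
        "sector": ["Others", "Sec1", "Sec2", "Others", "Sec2", "Sec1", "Others", "Sec1", "Sec3"],
        "counts": [8763, 8121, 7822, 580, 481, 460, 332, 193, 154]}
df = pd.DataFrame(data)

# Every (country, sector) pair, i.e. every bar slot seaborn reserves with hue='sector'
all_pairs = pd.MultiIndex.from_product(
    [sorted(df["country"].unique()), sorted(df["sector"].unique())],
    names=["country", "sector"],
)
# Pairs that actually have a row in the data
present = pd.MultiIndex.from_frame(df[["country", "sector"]])

# Slots with no data: these are the gaps in the plot
missing = all_pairs.difference(present).tolist()
print(missing)  # [('GBR', 'Sec3'), ('IND', 'Sec2'), ('USA', 'Sec3')]
```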
The only workaround I can suggest is to plot each country in its own subplot, so that each panel only has bars for the sectors that exist for that country:
```python
# Fixed colour per sector, so a sector has the same colour in every subplot
color_map = {
    'Others': 'C0',
    'Sec1': 'C1',
    'Sec2': 'C2',
    'Sec3': 'C3',
}
df['color'] = df.sector.map(color_map)

countries = df.country.unique()
fig, ax = plt.subplots(1, len(countries), figsize=(15, 5), sharey=True)
for i, country in enumerate(countries):
    # Only the sectors that exist for this country, in a stable order
    _df = df[df.country == country].sort_values(by='sector')
    sns.barplot(
        ax=ax[i],
        data=_df,
        x='sector', y='counts_log',
        # One colour per bar, in the same order as the sorted sectors
        palette=_df.color.tolist()
    )
    ax[i].set(title=country)
plt.show()
```
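If you would rather keep all countries on a single axes, another option is to skip seaborn's `hue` dodging and place the bars yourself with matplotlib's `ax.bar`, giving each country only as many slots as it has sectors. This is a sketch, not a seaborn feature: the helper `grouped_bars_without_gaps` and its `bar_width` default are hypothetical names and values chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Same data as in the question
data = {"country": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "IND", "IND", "IND"],
        "sector": ["Others", "Sec1", "Sec2", "Others", "Sec2", "Sec1", "Others", "Sec1", "Sec3"],
        "counts": [8763, 8121, 7822, 580, 481, 460, 332, 193, 154]}
df = pd.DataFrame(data)
df["counts_log"] = np.log10(df["counts"])

# One fixed colour per sector, so a sector looks the same in every group
color_map = {"Others": "C0", "Sec1": "C1", "Sec2": "C2", "Sec3": "C3"}


def grouped_bars_without_gaps(df, ax, bar_width=0.25):
    """Hypothetical helper (not part of seaborn or matplotlib).

    Draws one group per country with bars only for the sectors present in df,
    and returns a dict mapping each country to the x positions of its bars.
    """
    positions = {}
    countries = []
    # sort=False keeps countries in order of first appearance
    for i, (country, group) in enumerate(df.groupby("country", sort=False)):
        group = group.sort_values("sector")
        n = len(group)
        # Centre this group's n bars on x = i, with no empty slots between them
        xs = i + (np.arange(n) - (n - 1) / 2) * bar_width
        ax.bar(xs, group["counts_log"].to_numpy(), width=bar_width,
               color=group["sector"].map(color_map).tolist())
        positions[country] = xs.tolist()
        countries.append(country)
    ax.set_xticks(range(len(countries)))
    ax.set_xticklabels(countries)
    ax.set_xlabel("country")
    ax.set_ylabel("counts_log")
    # Build the legend by hand, since each ax.bar call mixes several sectors
    ax.legend(handles=[Patch(color=c, label=s) for s, c in color_map.items()])
    return positions


fig, ax = plt.subplots(figsize=(18, 6))
positions = grouped_bars_without_gaps(df, ax)
plt.show()
```

Because the bars in each group are placed exactly `bar_width` apart, the IND group gets three adjacent bars just like USA and GBR. The trade-off is that you manage the colours and the legend yourself instead of letting seaborn do it.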
Maybe this is not exactly what you were looking for, but I hope it helps.