After specifying grouping by column `a` and restricting the histogram to columns `f` and `g`, I still have column `a` showing up in green. Is there a way to remove it without going into matplotlib or writing a for loop?
    axes = dfs.hist(column=['f', 'g'], by='a', layout=(1, 3), legend=True,
                    bins=np.linspace(0, 8, 10), sharex=True, sharey=True)
Answer
This is clearly a bug in the pandas library. The problem seems to arise when `by` is a numeric dtype column: pandas probably subsets the DataFrame to the labels in `column` and `by` and then plots that subset, so a numeric `by` column ends up drawn as if it were one more data column.
You can either create non-numeric labels for the column that defines your `by` groups or, if you don't want to change your data, simply cast that column to `object` just before plotting.
Sample Data
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'length': np.random.normal(0, 1, 1000),
                       'width': np.random.normal(0, 1, 1000),
                       'a': np.random.randint(0, 2, 1000)})

    # Problem with a numeric dtype for `by` column
    df.hist(column=['length', 'width'], by='a', figsize=(4, 2))

    # Works fine when column type is object
    (df.assign(a=df['a'].astype('object'))
       .hist(column=['length', 'width'], by='a', figsize=(4, 2)))
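The other option mentioned above, giving the `by` column non-numeric labels, could look roughly like the sketch below. The label strings (`'group 0'`, `'group 1'`) and the variable names `labels` and `labeled` are made up for illustration, and the sample data is rebuilt with a seeded generator so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild sample data like the frame above (seeded for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame({'length': rng.normal(0, 1, 1000),
                   'width': rng.normal(0, 1, 1000),
                   'a': rng.integers(0, 2, 1000)})

# Hypothetical label names; any non-numeric labels should serve the same purpose.
labels = {0: 'group 0', 1: 'group 1'}

# assign() returns a new frame, so the original numeric column in df is untouched.
labeled = df.assign(a=df['a'].map(labels))

# Same call as before, but `by` now refers to a non-numeric column.
axes = labeled.hist(column=['length', 'width'], by='a', figsize=(4, 2))
```

Compared with `astype('object')`, this also gives the subplot titles readable names, at the cost of maintaining a mapping for every group value.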