Remove by column in pandas.DataFrame.hist

Question

After specifying grouping by column a and restricting the histogram to columns f and g, I still have column a showing up in green. Is there a way to remove it without going into matplotlib or writing a for loop?

Accepted Answer

This is clearly a bug in the pandas library. The problem seems to arise when `by` is a numeric dtype column: it probably subsets the DataFrame to the labels in `column` and `by` and then plots that, which is problematic when `by` is numeric.

You can either create non-numeric labels for the column that defines your `by`, or, if you don't want to change your data, it suffices to re-assign the type to object just before the plot.

Sample Data

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'length': np.random.normal(0, 1, 1000),
                       'width': np.random.normal(0, 1, 1000),
                       'a': np.random.randint(0, 2, 1000)})

    # Problem with a numeric dtype for the `by` column
    df.hist(column=['length', 'width'], by='a', figsize=(4, 2))

    # Works fine when the column type is object
    (df.assign(a=df['a'].astype('object'))
       .hist(column=['length', 'width'], by='a', figsize=(4, 2)))
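The first option, creating non-numeric labels for the `by` column, is not shown in the sample above. A minimal sketch of it follows, assuming the same sample data; the label names `'group 0'` and `'group 1'` and the helper `plot_grouped` are hypothetical and only there for illustration. Whether the extra column shows up at all may depend on the installed pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'length': np.random.normal(0, 1, 1000),
                   'width': np.random.normal(0, 1, 1000),
                   'a': np.random.randint(0, 2, 1000)})

# Hypothetical string labels for the two values of `a`
labels = {0: 'group 0', 1: 'group 1'}

# Build a copy whose grouping column is non-numeric; `df` itself is unchanged
labelled = df.assign(a=df['a'].map(labels))


def plot_grouped(frame):
    """Plot histograms of `length` and `width`, one subplot per value of `a`."""
    return frame.hist(column=['length', 'width'], by='a', figsize=(4, 2))
```

Calling `plot_grouped(labelled)` then draws one subplot per label, and, per the explanation above, the grouping column should no longer be plotted because it is no longer numeric.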

