Ordering a stacked histplot based on total counts

I have a dataframe which results from:

Then, df_grouped is something like:

A    B    count
A_1  B_1  10
A_1  B_2  51
A_1  B_3  25
A_1  B_4  12
A_1  B_5  2
A_2  B_1  19
A_2  B_3  5
A_3  B_5  18
A_3  B_4  33
A_3  B_5  44
A_4  B_1  29
A_5  B_2  32
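
For reference, a frame with this A / B / count layout can be produced by grouping on both columns and counting rows. This is only a sketch: the raw df below is made-up sample data, and size().reset_index(name='count') is one common way to get such counts, not necessarily the exact call used for the frame above.

```python
import pandas as pd

# Hypothetical raw data: one row per observation (not the real dataset)
df = pd.DataFrame({
    "A": ["A_1", "A_1", "A_1", "A_2", "A_2", "A_3"],
    "B": ["B_1", "B_2", "B_2", "B_1", "B_3", "B_5"],
})

# Count the rows in each (A, B) pair and expose the result as a 'count' column
df_grouped = df.groupby(["A", "B"]).size().reset_index(name="count")
print(df_grouped)
```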

I have plotted a seaborn.histplot using the following code:

which results in the following image:

enter image description here

What I would like is to order the plot based on the total counts of each value of A. I have tried different methods, but I am not able to get a successful result.

Edit

I found a way to do what I wanted.

What I did was calculate the total counts for each value of df['A'] and order the dataframe by those totals.

Then, by using the same plot code from above, I got the desired result.
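
A minimal sketch of that idea, assuming the df_grouped layout shown earlier: add a per-A total with groupby(...).transform('sum'), then sort by it so the rows for the largest A come first. The column name total is hypothetical, and the sample rows are a subset of the table above. For string categories the bar order appears to follow the order in which the values first occur in the data, which would explain why sorting the frame is enough.

```python
import pandas as pd

# A subset of the rows from the table in the question
df_grouped = pd.DataFrame({
    "A": ["A_1", "A_1", "A_2", "A_2", "A_4"],
    "B": ["B_1", "B_2", "B_1", "B_3", "B_1"],
    "count": [10, 51, 19, 5, 29],
})

# Total count per value of A, repeated on every row of that group
# ('total' is a hypothetical column name)
df_grouped["total"] = df_grouped.groupby("A")["count"].transform("sum")

# Largest totals first; mergesort is stable, so the B order within each A is kept
df_grouped = df_grouped.sort_values("total", ascending=False, kind="mergesort")
print(df_grouped)
```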

The answer is similar to what Redox proposed.

In any case, I will try the other options proposed.

Answer

  • To be clear, the visualization is a stacked bar chart, not a histogram: a histogram represents the distribution of continuous values, while this shows the counts of discrete categorical values.
  • This answer starts with the raw dataframe, not the dataframe created with .groupby.
  1. The easiest way to do this is to create a frequency table from the raw dataframe using pd.crosstab, not .groupby.
  2. Add a column with the sum along axis=1.
  3. Use the new column to sort the dataframe.
  4. Plot directly with pandas.DataFrame.plot using kind='bar' and stacked=True.
    • seaborn.histplot is not needed; seaborn is just a high-level API for matplotlib.
    • pandas uses matplotlib by default for plotting.
  • This reduces the code to 4 lines.
  • Tested in Python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
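
The four steps above can be sketched roughly as follows. The raw df here is made-up sample data, and tot is a hypothetical name for the helper column; treat this as an illustration of the approach rather than the exact code from the answer.

```python
import pandas as pd

# Hypothetical raw data, one row per observation
df = pd.DataFrame({
    "A": ["A_1", "A_2", "A_2", "A_2", "A_3", "A_3"],
    "B": ["B_1", "B_1", "B_2", "B_2", "B_1", "B_3"],
})

# 1. Frequency table: rows are A values, columns are B values
dfc = pd.crosstab(df["A"], df["B"])

# 2. Row totals ('tot' is a hypothetical column name)
dfc["tot"] = dfc.sum(axis=1)

# 3. Sort by the totals, largest first
dfc = dfc.sort_values("tot", ascending=False)

# 4. Stacked bars; drop the helper column so it is not drawn as a segment
ax = dfc.drop(columns="tot").plot(kind="bar", stacked=True, rot=0)
```

Dropping tot before plotting keeps the totals out of the stack; selecting the B columns explicitly would work just as well.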

enter image description here

Data Views

df

dfc
