Ordering a stacked histplot based on total counts

Question

I have a dataframe which results from:

Then, `df_grouped` is something like:

A	B	count
A_1	B_1	10
A_1	B_2	51
A_1	B_3	25
A_1	B_4	12
A_1	B_5	2
A_2	B_1	19
A_2	B_3	5
A_3	B_5	18
A_3	B_4	33
A_3	B_5	44
A_4	B_1	29
A_5	B_2	32

I have plotted a `seaborn.histplot` using the following code:
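Because the data above is already aggregated, here is a minimal sketch of ordering `df_grouped` directly, assuming it has exactly the columns `A`, `B`, and `count` shown in the table. It rebuilds the sample rows as printed. It uses `pivot_table` with `aggfunc='sum'` rather than `DataFrame.pivot`, because the sample lists the `A_3` / `B_5` pair twice and `pivot` would raise a `ValueError` on that duplicate. The variable name `wide` is only illustrative.

```python
import pandas as pd

# The sample df_grouped from the question, copied row by row
# (the question lists the A_3 / B_5 pair twice; pivot_table sums them).
df_grouped = pd.DataFrame(
    [('A_1', 'B_1', 10), ('A_1', 'B_2', 51), ('A_1', 'B_3', 25), ('A_1', 'B_4', 12),
     ('A_1', 'B_5', 2), ('A_2', 'B_1', 19), ('A_2', 'B_3', 5), ('A_3', 'B_5', 18),
     ('A_3', 'B_4', 33), ('A_3', 'B_5', 44), ('A_4', 'B_1', 29), ('A_5', 'B_2', 32)],
    columns=['A', 'B', 'count'],
)

# Wide table: one row per A value (one bar each), one column per B value
# (one stacked segment each); missing (A, B) combinations become 0.
wide = df_grouped.pivot_table(index='A', columns='B', values='count',
                              aggfunc='sum', fill_value=0)

# Order the rows by their total count, largest first.
wide = wide.loc[wide.sum(axis=1).sort_values(ascending=False).index]
print(wide)

# Plotting (requires matplotlib):
# wide.plot(kind='bar', stacked=True, rot=0)
```

With these rows the bar totals are 100 (`A_1`), 95 (`A_3`), 32 (`A_5`), 29 (`A_4`), and 24 (`A_2`), so the bars come out in that order.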

Accepted Answer

To be clear, the visualization is a stacked bar chart, not a histogram. A histogram represents the distribution of continuous values, while this plot shows the counts of discrete categorical values.

- This answer starts with the raw dataframe, not the dataframe created with `.groupby`.
- The easiest way to do this is to create a frequency table of the raw dataframe with `pd.crosstab`, rather than with `.groupby`.
- Add a column with the sum along `axis=1`.
- Use the new column to sort the dataframe.
- Plot directly with `pandas.DataFrame.plot` using `kind='bar'` and `stacked=True`.
- `seaborn.histplot` is not needed. seaborn is just a high-level API for matplotlib, and pandas uses matplotlib by default for plotting.
- This reduces the code to 4 lines.
- Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2

```python
import numpy as np  # used for creating sample data
import pandas as pd

# sample dataframe representing raw data
np.random.seed(365)
rows = 1100
data = {'A': np.random.choice([f'A_{v}' for v in range(1, 6)], size=rows, p=[.35, .05, .25, .15, .2]),
        'B': np.random.choice([f'B_{v}' for v in range(1, 6)], size=rows, p=[.2, .35, .05, .15, .25])}
df = pd.DataFrame(data)

# 1. frequency counts
dfc = pd.crosstab(df.A, df.B)

# 2. add total column
dfc['tot_A'] = dfc.sum(axis=1)

# 3. sort
dfc = dfc.sort_values('tot_A', axis=0, ascending=False)

# 4. plot the columns except `tot_A`
dfc.iloc[:, :-1].plot(kind='bar', stacked=True, figsize=(10, 5), rot=0, width=1, ec='k')
```

Data Views

`df` (first five rows)

```
     A    B
0  A_5  B_5
1  A_3  B_1
2  A_4  B_5
3  A_3  B_4
4  A_3  B_5
```

`dfc`

```
B    B_1  B_2  B_3  B_4  B_5  tot_A
A
A_1   86  131   15   55   90    377
A_3   47   90    9   33   61    240
A_5   37   83   13   33   56    222
A_4   43   65    9   27   50    194
A_2   16   21    1    5   24     67
```
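As a variation on steps 2 to 4 of the code, the rows can be ordered by a temporary Series of row totals instead of a helper column, so there is no `tot_A` column to slice off before plotting. This is a minimal sketch reusing the answer's sample data. The function name `crosstab_sorted` is hypothetical, and the plotting call is commented out so the block runs without matplotlib.

```python
import numpy as np
import pandas as pd

# Same sample raw data as the answer (seed 365, 1100 rows).
np.random.seed(365)
rows = 1100
data = {'A': np.random.choice([f'A_{v}' for v in range(1, 6)], size=rows, p=[.35, .05, .25, .15, .2]),
        'B': np.random.choice([f'B_{v}' for v in range(1, 6)], size=rows, p=[.2, .35, .05, .15, .25])}
df = pd.DataFrame(data)


def crosstab_sorted(frame, row, col, ascending=False):
    """Frequency table of `row` vs `col`, rows ordered by their total count.

    The totals are computed as a separate Series and only used to reorder
    the index, so the returned table contains just the category columns.
    """
    counts = pd.crosstab(frame[row], frame[col])
    order = counts.sum(axis=1).sort_values(ascending=ascending).index
    return counts.loc[order]


dfc = crosstab_sorted(df, 'A', 'B')
print(dfc)

# Plotting (requires matplotlib); every column is a B category, so no slicing:
# dfc.plot(kind='bar', stacked=True, figsize=(10, 5), rot=0, width=1, ec='k')
```

Passing `ascending=True` puts the smallest bar first instead.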
