How to create grouped and stacked bars

Question

I have a very large dataset with a lot of subsidiaries serving three customer groups in various countries, something like this (in reality there are many more subsidiaries and dates):

I'd like to make an analysis per subsidiary by producing a stacked bar chart. To do this, I started by defining the x-axi…

Accepted Answer

As an FYI, stacked bars are not the best option, because they can make it difficult to compare bar values and can easily be misinterpreted. The purpose of a visualization is to present data in an easily understood format; make sure the message is clear. Side-by-side bars are often a better option.

Side-by-side stacked bars are difficult to construct manually; it's better to use a figure-level method like seaborn.catplot, which will create a single, easy-to-read data visualization.

Bar plot ticks are located at a 0-indexed range (not at datetimes). The dates are just labels, so it is not necessary to convert them to a datetime dtype.

Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2

seaborn

    import seaborn as sns

    sns.catplot(kind='bar', data=df, col='subsidiary', x='date', y='value', hue='business')

Create grouped and stacked bars

See Stacked Bar Chart and Grouped bar chart with labels.

The issue with the creation of the stacked bars in the OP is that bottom is being set on the entire dataframe for that group, instead of only on the values that make up the bar height.

"do I really need to create three sub-dfs per subsidiary."

Yes, a DataFrame is needed for every group, so 6 in this case.

Creating the data subsets can be automated by using a dict comprehension to unpack the .groupby object into a dict of DataFrames:

    data = {''.join(k): v for k, v in df.groupby(['subsidiary', 'business'])}

Access the values like: data['EUCORP'].value

Automating the plot creation is more arduous: x depends on how many groups of bars there are for each tick, and bottom depends on the values of the bars already drawn in that stack.

    import numpy as np
    import matplotlib.pyplot as plt

    labels = df['date'].drop_duplicates()  # set the dates as labels
    x0 = np.arange(len(labels))  # tick positions as an array, so arithmetic with the width (w) works

    # create the data groups with a dict comprehension and groupby
    data = {''.join(k): v for k, v in df.groupby(['subsidiary', 'business'])}

    # build the plots
    subs = df.subsidiary.unique()
    stacks = len(subs)  # how many stacks in each group for a tick location
    business = df.business.unique()

    # set the width
    w = 0.35

    # this needs to be adjusted based on the number of stacks;
    # each tick location must be split into the proper number of positions
    x1 = [x0 - w/stacks, x0 + w/stacks]

    fig, ax = plt.subplots()
    for x, sub in zip(x1, subs):
        bottom = 0  # each stack starts at zero
        for bus in business:
            height = data[f'{sub}{bus}'].value.to_numpy()
            ax.bar(x=x, height=height, width=w, bottom=bottom)
            bottom += height  # the next segment sits on top of this one

    # set the ticks and labels once, after all bars are drawn
    ax.set_xticks(x0)
    _ = ax.set_xticklabels(labels)

As you can see, small values are difficult to discern, and using ax.set_yscale('log') does not work as expected with stacked bars (e.g. it does not make small values more readable).

Create only stacked bars

As mentioned by @r-beginners, use .pivot, or .pivot_table, to reshape the dataframe to a wide form to create stacked bars where the x-axis is a tuple ('date', 'subsidiary').

- Use .pivot if there are no repeat values for each category.
- Use .pivot_table if there are repeat values that must be combined with aggfunc (e.g. 'sum', 'mean', etc.).

    # reshape the dataframe
    dfp = df.pivot(index=['date', 'subsidiary'], columns=['business'], values='value')

    # plot stacked bars
    dfp.plot(kind='bar', stacked=True, rot=0, figsize=(10, 4))
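
The grouped-and-stacked approach above hard-codes two bar positions per tick, and only .pivot is shown for the reshape. As a rough sketch of how both could be generalized, the block below reshapes with .pivot_table (so repeated rows are summed) and computes one offset per subsidiary for any number of stacks. This swaps the dict of DataFrames for a wide table, so it is a variation on the answer rather than the answer's own method. The sample data, the function name grouped_stacked_bars, and the group_width parameter are hypothetical and exist only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical long-format sample data; names and numbers are invented.
rows = [
    ("2021-01", "EU", "CORP", 10), ("2021-01", "EU", "RETAIL", 4),
    ("2021-01", "US", "CORP", 7), ("2021-01", "US", "RETAIL", 3),
    ("2021-02", "EU", "CORP", 12), ("2021-02", "EU", "RETAIL", 5),
    ("2021-02", "US", "CORP", 8), ("2021-02", "US", "RETAIL", 2),
    ("2021-02", "US", "RETAIL", 1),  # repeated key: summed by pivot_table
]
df = pd.DataFrame(rows, columns=["date", "subsidiary", "business", "value"])


def grouped_stacked_bars(df, ax, group_width=0.8):
    """Draw one stack per subsidiary at each date tick (hypothetical helper)."""
    # Wide form: one row per date, one column per (subsidiary, business).
    wide = df.pivot_table(index="date", columns=["subsidiary", "business"],
                          values="value", aggfunc="sum", fill_value=0)
    subs = wide.columns.get_level_values("subsidiary").unique()
    x0 = np.arange(len(wide.index))  # 0-indexed tick locations
    w = group_width / len(subs)      # width of a single stack
    # Centre offsets of the stacks, symmetric around each tick.
    offsets = (np.arange(len(subs)) - (len(subs) - 1) / 2) * w
    for offset, sub in zip(offsets, subs):
        bottom = np.zeros(len(x0))   # each stack starts at zero
        for bus in wide[sub].columns:
            height = wide[sub][bus].to_numpy(dtype=float)
            ax.bar(x0 + offset, height, width=w, bottom=bottom,
                   label=f"{sub} {bus}")
            bottom += height         # next segment sits on top
    ax.set_xticks(x0)
    ax.set_xticklabels(wide.index)   # the dates are only labels
    ax.legend()
    return wide


fig, ax = plt.subplots(figsize=(8, 4))
wide = grouped_stacked_bars(df, ax)
```

Call plt.show() or fig.savefig(...) afterwards to view the figure. Because the offsets are derived from the number of subsidiaries, adding a third subsidiary to the data should not require changing the positioning code, though the legend and colors may need attention as the number of segments grows.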
