How to have clusters of stacked bars

Question

I have two dataframes with the same index, and I want a stacked bar plot for each of them, so I'd like to have 2 stacked bars per index. I tried plotting both on the same axes, but the bars overlap. Then I tried to concat the two datasets first, but here...

Accepted Answer

I eventually found a trick (edit: see below for a version using seaborn and a long-form dataframe).

Solution with pandas and matplotlib

Here it is with a more complete example:

    import pandas as pd
    import matplotlib.cm as cm
    import numpy as np
    import matplotlib.pyplot as plt

    def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot", H="/", **kwargs):
        """Given a list of dataframes with identical columns and index, create a clustered stacked bar plot.

        labels is a list of the names of the dataframes, used for the legend
        title is a string for the title of the plot
        H is the hatch used to identify the different dataframes
        """
        n_df = len(dfall)
        n_col = len(dfall[0].columns)
        n_ind = len(dfall[0].index)
        axe = plt.subplot(111)

        for df in dfall:  # for each dataframe
            axe = df.plot(kind="bar",
                          linewidth=0,
                          stacked=True,
                          ax=axe,
                          legend=False,
                          grid=False,
                          **kwargs)  # make bar plots

        h, l = axe.get_legend_handles_labels()  # get the handles we want to modify
        for i in range(0, n_df * n_col, n_col):  # len(h) = n_col * n_df
            for j, pa in enumerate(h[i:i+n_col]):
                for rect in pa.patches:  # for each index
                    # shift each dataframe's bars sideways and narrow them
                    rect.set_x(rect.get_x() + 1 / float(n_df + 1) * i / float(n_col))
                    rect.set_hatch(H * int(i / n_col))  # one more hatch per dataframe
                    rect.set_width(1 / float(n_df + 1))

        axe.set_xticks((np.arange(0, 2 * n_ind, 2) + 1 / float(n_df + 1)) / 2.)
        axe.set_xticklabels(df.index, rotation=0)
        axe.set_title(title)

        # Add invisible data to add another legend
        n = []
        for i in range(n_df):
            n.append(axe.bar(0, 0, color="gray", hatch=H * i))

        l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
        if labels is not None:
            l2 = plt.legend(n, labels, loc=[1.01, 0.1])
        axe.add_artist(l1)
        return axe

    # create fake dataframes
    df1 = pd.DataFrame(np.random.rand(4, 5),
                       index=["A", "B", "C", "D"],
                       columns=["I", "J", "K", "L", "M"])
    df2 = pd.DataFrame(np.random.rand(4, 5),
                       index=["A", "B", "C", "D"],
                       columns=["I", "J", "K", "L", "M"])
    df3 = pd.DataFrame(np.random.rand(4, 5),
                       index=["A", "B", "C", "D"],
                       columns=["I", "J", "K", "L", "M"])

    # Then, just call:
    plot_clustered_stacked([df1, df2, df3], ["df1", "df2", "df3"])

This gives one cluster of three stacked bars per index value, with hatches distinguishing the dataframes.

You can change the colors of the bars by passing a cmap argument:

    plot_clustered_stacked([df1, df2, df3],
                           ["df1", "df2", "df3"],
                           cmap=plt.cm.viridis)

Solution with seaborn

Given the same df1, df2, df3 as above, I convert them to long form:

    df1["Name"] = "df1"
    df2["Name"] = "df2"
    df3["Name"] = "df3"
    dfall = pd.concat([pd.melt(i.reset_index(),
                               id_vars=["Name", "index"])  # transform each df to tidy format
                       for i in [df1, df2, df3]],
                      ignore_index=True)

The problem with seaborn is that it doesn't stack bars natively, so the trick is to plot the cumulative sum of each bar on top of each other:

    dfall.set_index(["Name", "index", "variable"], inplace=True)
    dfall["vcs"] = dfall.groupby(level=["Name", "index"])["value"].cumsum()
    dfall.reset_index(inplace=True)

    >>> dfall.head(6)
      Name index variable     value       vcs
    0  df1     A        I  0.717286  0.717286
    1  df1     B        I  0.236867  0.236867
    2  df1     C        I  0.952557  0.952557
    3  df1     D        I  0.487995  0.487995
    4  df1     A        J  0.174489  0.891775
    5  df1     B        J  0.332001  0.568868

Then loop over each group of variable and plot the cumulative sum:

    import seaborn as sns

    c = ["blue", "purple", "red", "green", "pink"]
    for i, g in enumerate(dfall.groupby("variable")):
        ax = sns.barplot(data=g[1],
                         x="index",
                         y="vcs",
                         hue="Name",
                         color=c[i],
                         zorder=-i,  # so first bars stay on top
                         edgecolor="k")
    ax.legend_.remove()  # remove the redundant legends

It lacks the legend, which I think can be added easily. The drawback is that, instead of hatches (which can be added easily) to differentiate the dataframes, we get a gradient of lightness. It is a bit too light for the first dataframe, and I don't really know how to change that without modifying each rectangle one by one (as in the first solution).

Tell me if you don't understand something in the code.

Feel free to re-use this code, which is under CC0.
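
The position arithmetic inside plot_clustered_stacked (the set_x, set_width and set_xticks calls) can be checked without drawing anything. Below is a minimal sketch of that arithmetic in plain Python. It assumes pandas draws each bar with a default width of 0.5, centered on the integer position of its index value; the helper name cluster_geometry and its default_width parameter are hypothetical and not part of the answer's code.

```python
def cluster_geometry(n_df, n_ind, default_width=0.5):
    """Return (left_edges, width, ticks) for clustered stacked bars.

    left_edges[k][idx] is the left edge of the bar for dataframe k at
    index position idx. default_width is an assumption: the width pandas
    is taken to use for a bar plot when no width is passed.
    """
    # Same width as rect.set_width(1 / float(n_df + 1)) in the answer.
    width = 1 / (n_df + 1)

    # rect.get_x() is assumed to start at idx - default_width / 2;
    # dataframe k is then shifted right by k * width.
    left_edges = [
        [idx - default_width / 2 + k * width for idx in range(n_ind)]
        for k in range(n_df)
    ]

    # Same expression as the set_xticks call: (2 * idx + width) / 2.
    ticks = [(2 * idx + width) / 2 for idx in range(n_ind)]
    return left_edges, width, ticks


left_edges, width, ticks = cluster_geometry(n_df=3, n_ind=4)
print(width)          # 0.25
print(left_edges[0])  # [-0.25, 0.75, 1.75, 2.75]
print(left_edges[2])  # [0.25, 1.25, 2.25, 3.25]
print(ticks)          # [0.125, 1.125, 2.125, 3.125]
```

Under that assumption, with three dataframes each cluster spans from idx - 0.25 to idx + 0.5, so the tick at idx + 0.125 sits in the middle of the cluster and a gap of 0.25 separates neighbouring clusters. With a different number of dataframes the tick may end up slightly off-center, which is something to check if the labels look misaligned.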
