
Matplotlib: How to color errorbar plots by a third, categorical column (not X or Y)

I’m working on a personal experimentation project. I have the following dataframes:

treat_repr = pd.DataFrame({'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet']
                    ,'diff_pct': [0.655280, 0.127299, 0.229958, 0.613308, -0.718421]
                    ,'me_pct': [1.206313, 0.182875, 0.170821, 1.336590, 2.229763]
                    ,'p': [0.287025, 0.172464, 0.008328, 0.368466, 0.527718]
                    ,'significance': ['insignificant', 'insignificant', 'significant', 'insignificant', 'insignificant']})

pre_treat_repr = pd.DataFrame({'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet']
                    ,'diff_pct': [0.137174, 0.111005, 0.169490, -0.152929, -0.450667]
                    ,'me_pct': [1.419080, 0.207081, 0.202014, 1.494588, 1.901672]
                    ,'p': [0.849734, 0.293427, 0.100091, 0.841053, 0.642303]
                    ,'significance': ['insignificant', 'insignificant', 'insignificant', 'insignificant', 'insignificant']})

I used the code below to build the errorbar plot, and it runs fine:

def confint_plot(df):
  plt.style.use('fivethirtyeight')
  fig, ax = plt.subplots(figsize=(18, 10))

  plt.errorbar(df[df['significance'] == 'significant']["diff_pct"], df[df['significance'] == 'significant']["kpi"], xerr = df[df['significance'] == 'significant']["me_pct"], color = '#d62828', fmt = 'o', capsize = 10)
  plt.errorbar(df[df['significance'] == 'insignificant']["diff_pct"], df[df['significance'] == 'insignificant']["kpi"], xerr = df[df['significance'] == 'insignificant']["me_pct"], color =  '#2a9d8f', fmt = 'o', capsize = 10)
  plt.legend(['significant', 'insignificant'], loc = 'best')
  
  ax.axvline(0, c='red', alpha=0.5, linewidth=3.0,
             linestyle = '--', ymin=0.0, ymax=1)
  
  plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
  plt.xlabel("% Difference of Control over Treatment", size=12)
  
  plt.show()

The output of confint_plot(treat_repr) looks like this:

[image: errorbar plot for treat_repr]

Now, if I run the same function on the pre-treatment dataframe with confint_plot(pre_treat_repr), the plot looks like this:

[image: errorbar plot for pre_treat_repr]

Comparing the two plots, the order of the KPIs on the y-axis changes from the first plot to the second, depending on whether a KPI is significant (that is what I worked out after many attempts).

Questions:

  1. How can I change the code to assign colors dynamically without changing the order of the KPIs on the y-axis?

  2. Currently I have typed the legend entries manually. Is there a way to populate the legend dynamically?

Appreciate the help!


Answer

Because you plot the significant KPIs first, they always appear at the bottom of the chart: matplotlib places string categories on an axis in the order in which they are first plotted. How you solve this while keeping the desired colors depends on the kind of chart you are making. With scatter charts, you can pass an array of colors through the c parameter. Error bar charts do not offer that functionality.
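The ordering effect can be checked directly. The snippet below is a minimal sketch, assuming a headless Agg backend and using the treat_repr values from the question; it asks the y-axis for the numeric position it assigned to each KPI after two errorbar calls. The variable names kpis and positions are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# First call: only 'mpu' (the significant KPI in treat_repr) reaches the axis.
ax.errorbar([0.229958], ["mpu"], xerr=[0.170821], fmt="o")

# Second call: the remaining (insignificant) KPIs are registered afterwards.
ax.errorbar(
    [0.655280, 0.127299, 0.613308, -0.718421],
    ["cpsink", "hpu", "revenue", "wallet"],
    xerr=[1.206313, 0.182875, 1.336590, 2.229763],
    fmt="o",
)

# convert_units maps each category string to its numeric y position.
kpis = ["cpsink", "hpu", "mpu", "revenue", "wallet"]
positions = dict(zip(kpis, ax.yaxis.convert_units(kpis)))
print(positions)

plt.close(fig)
```

If the explanation above holds, 'mpu' gets the lowest position even though it is third alphabetically, which is why it ends up at the bottom of the first plot.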

One way to work around this limitation is to sort your KPIs, give each one a numeric position (0, 1, 2, 3, …), plot them in two passes (once for the significant ones, once for the insignificant ones), and then relabel the ticks with the KPI names:

import matplotlib.pyplot as plt

def confint_plot(df):
  plt.style.use('fivethirtyeight')
  fig, ax = plt.subplots(figsize=(18, 10))

  # Sort the KPIs alphabetically and give each a numeric y position.
  # You can change the order to anything that fits your purpose
  df_plot = df.sort_values('kpi').assign(y=range(len(df)))

  for significance in ['significant', 'insignificant']:
    cond = df_plot['significance'] == significance
    color = '#d62828' if significance == 'significant' else '#2a9d8f'

    # Plot each group at its numeric positions; the label feeds the legend
    ax.errorbar(
      df_plot.loc[cond, 'diff_pct'], df_plot.loc[cond, 'y'],
      xerr=df_plot.loc[cond, 'me_pct'], label=significance,
      fmt='o', capsize=10, color=color
    )

  ax.legend(loc='best')
  ax.axvline(0, c='red', alpha=0.5, linewidth=3.0,
             linestyle='--', ymin=0.0, ymax=1)

  # Re-tick to show the KPI names instead of the numeric positions
  ax.set_yticks(df_plot['y'])
  ax.set_yticklabels(df_plot['kpi'])

  ax.set_title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
  ax.set_xlabel("% Difference of Control over Treatment", size=12)

  plt.show()
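
If the grouping column can hold more than two values, the same idea can be generalized. The following is only a sketch under assumptions, not part of the original answer: the function name confint_plot_grouped and its group_col and palette parameters are hypothetical, and groups without an entry in palette fall back to matplotlib's default color cycle ("C0", "C1", …). It returns the figure and axes instead of calling plt.show(), so the result can be inspected or saved.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd


def confint_plot_grouped(df, group_col="significance", palette=None):
    """Hypothetical variant: one errorbar call per value found in group_col."""
    palette = palette or {}
    fig, ax = plt.subplots(figsize=(18, 10))

    # Fix the y order once (alphabetical here), independent of the groups.
    df_plot = df.sort_values("kpi").assign(y=range(len(df)))

    # sort=False keeps the groups in order of first appearance in df_plot.
    for i, (name, grp) in enumerate(df_plot.groupby(group_col, sort=False)):
        color = palette.get(name, f"C{i}")  # fall back to the default cycle
        ax.errorbar(grp["diff_pct"], grp["y"], xerr=grp["me_pct"],
                    label=name, fmt="o", capsize=10, color=color)

    ax.axvline(0, color="red", alpha=0.5, linewidth=3.0, linestyle="--")

    # Show KPI names at the numeric positions.
    ax.set_yticks(df_plot["y"].tolist())
    ax.set_yticklabels(df_plot["kpi"].tolist())

    # Legend entries come from the label passed to each errorbar call.
    ax.legend(loc="best")
    return fig, ax


# Usage with the treat_repr data from the question.
treat_repr = pd.DataFrame({
    "kpi": ["cpsink", "hpu", "mpu", "revenue", "wallet"],
    "diff_pct": [0.655280, 0.127299, 0.229958, 0.613308, -0.718421],
    "me_pct": [1.206313, 0.182875, 0.170821, 1.336590, 2.229763],
    "significance": ["insignificant", "insignificant", "significant",
                     "insignificant", "insignificant"],
})
fig, ax = confint_plot_grouped(
    treat_repr,
    palette={"significant": "#d62828", "insignificant": "#2a9d8f"},
)
legend_labels = ax.get_legend_handles_labels()[1]
ytick_positions = [int(t) for t in ax.get_yticks()]
```

Because the legend is built from whatever values the column contains, a dataframe with no significant rows simply produces a legend without that entry.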
User contributions licensed under: CC BY-SA