I’m working on a personal experimentation project. I have the following dataframes:
import pandas as pd

treat_repr = pd.DataFrame({
    'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet'],
    'diff_pct': [0.655280, 0.127299, 0.229958, 0.613308, -0.718421],
    'me_pct': [1.206313, 0.182875, 0.170821, 1.336590, 2.229763],
    'p': [0.287025, 0.172464, 0.008328, 0.368466, 0.527718],
    'significance': ['insignificant', 'insignificant', 'significant', 'insignificant', 'insignificant'],
})

pre_treat_repr = pd.DataFrame({
    'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet'],
    'diff_pct': [0.137174, 0.111005, 0.169490, -0.152929, -0.450667],
    'me_pct': [1.419080, 0.207081, 0.202014, 1.494588, 1.901672],
    'p': [0.849734, 0.293427, 0.100091, 0.841053, 0.642303],
    'significance': ['insignificant', 'insignificant', 'insignificant', 'insignificant', 'insignificant'],
})
I have used the code below to build an error bar plot, which works fine:
import matplotlib.pyplot as plt

def confint_plot(df):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(18, 10))

    # Significant KPIs are plotted first, in red
    significant = df[df['significance'] == 'significant']
    plt.errorbar(significant["diff_pct"], significant["kpi"],
                 xerr=significant["me_pct"],
                 color='#d62828', fmt='o', capsize=10)

    # Insignificant KPIs are plotted second, in green
    insignificant = df[df['significance'] == 'insignificant']
    plt.errorbar(insignificant["diff_pct"], insignificant["kpi"],
                 xerr=insignificant["me_pct"],
                 color='#2a9d8f', fmt='o', capsize=10)

    plt.legend(['significant', 'insignificant'], loc='best')
    ax.axvline(0, c='red', alpha=0.5, linewidth=3.0, linestyle='--', ymin=0.0, ymax=1)
    plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
    plt.xlabel("% Difference of Control over Treatment", size=12)
    plt.show()
for which the output of confint_plot(treat_repr) looks like this:
Now if I run the same plot function on the pre-treatment dataframe, confint_plot(pre_treat_repr), the plot looks like this:
Comparing the two plots, the order of the KPIs on the y-axis changes from the first plot to the second, depending on which KPIs are significant (that’s what I concluded after many attempts).
Questions:
How do I make a change to the code to dynamically allocate color maps without changing the order of the kpis on y axis?
Currently I have manually typed in the legends. Is there a way to dynamically populate legends?
Appreciate the help!
Answer
Because you plot the significant KPIs first, they always appear at the bottom of the chart: matplotlib places categorical values on an axis in the order it first encounters them. How you solve this and keep the desired colors depends on the kind of chart you are making with matplotlib. With scatter charts, you can pass an array of colors to the c parameter. Error bar charts do not offer that functionality.
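To illustrate the scatter case mentioned above, here is a minimal sketch using the first three rows of treat_repr. The `palette` dictionary and the `colors` list are names introduced here for illustration, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of treat_repr from the question
df = pd.DataFrame({
    "kpi": ["cpsink", "hpu", "mpu"],
    "diff_pct": [0.655280, 0.127299, 0.229958],
    "significance": ["insignificant", "insignificant", "significant"],
})

# Hypothetical lookup from significance label to color
palette = {"significant": "#d62828", "insignificant": "#2a9d8f"}

# One color per row, in the same order as the rows
colors = df["significance"].map(palette).tolist()

fig, ax = plt.subplots()
# A single scatter call keeps the KPIs in dataframe order on the y-axis
points = ax.scatter(df["diff_pct"], df["kpi"], c=colors)
```

Since everything is drawn in one call, the y-axis order follows the dataframe rather than the significance groups. Note that scatter draws only the point markers, not the error bars.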
One way to work around that is to sort your KPIs, give them numeric positions (0, 1, 2, 3, …), plot them in two passes (once for the significant ones, once for the insignificant ones) and re-tick the axis:
def confint_plot(df):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(18, 10))

    # Sort the KPIs alphabetically. You can change the order to anything
    # that fits your purpose
    df_plot = df.sort_values('kpi').assign(y=range(len(df)))

    for significance in ['significant', 'insignificant']:
        cond = df_plot['significance'] == significance
        color = '#d62828' if significance == 'significant' else '#2a9d8f'

        # Plot them in their numeric positions first
        plt.errorbar(
            df_plot.loc[cond, 'diff_pct'], df_plot.loc[cond, 'y'],
            xerr=df_plot.loc[cond, 'me_pct'], label=significance,
            fmt='o', capsize=10, c=color
        )

    plt.legend(loc='best')
    ax.axvline(0, c='red', alpha=0.5, linewidth=3.0, linestyle='--', ymin=0.0, ymax=1)

    # Re-tick to show the KPIs
    plt.yticks(df_plot['y'], df_plot['kpi'])
    plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
    plt.xlabel("% Difference of Control over Treatment", size=12)
    plt.show()
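For the second question (populating the legend dynamically), one possible variation is to loop over the significance groups that actually occur in the data, so a label with no rows never reaches the legend. This is only a sketch of that idea, not part of the answer above: `confint_plot_grouped` and `PALETTE` are hypothetical names, the gray fallback color is an assumption, and the function returns the axes instead of calling `plt.show()` so the result can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical lookup from significance label to color
PALETTE = {"significant": "#d62828", "insignificant": "#2a9d8f"}

def confint_plot_grouped(df, palette=PALETTE):
    # Fix the y positions once, independent of significance
    df_plot = df.sort_values("kpi").assign(y=range(len(df)))
    fig, ax = plt.subplots(figsize=(18, 10))

    # Only groups present in the data are plotted and labelled
    for significance, group in df_plot.groupby("significance", sort=False):
        ax.errorbar(
            group["diff_pct"], group["y"], xerr=group["me_pct"],
            label=significance, fmt="o", capsize=10,
            color=palette.get(significance, "gray"),  # fallback is an assumption
        )

    ax.axvline(0, c="red", alpha=0.5, linewidth=3.0, linestyle="--")

    # Re-tick so the numeric positions show the KPI names
    ax.set_yticks(df_plot["y"].tolist())
    ax.set_yticklabels(df_plot["kpi"].tolist())
    ax.legend(loc="best")
    return ax

# The pre-treatment dataframe from the question
pre_treat_repr = pd.DataFrame({
    "kpi": ["cpsink", "hpu", "mpu", "revenue", "wallet"],
    "diff_pct": [0.137174, 0.111005, 0.169490, -0.152929, -0.450667],
    "me_pct": [1.419080, 0.207081, 0.202014, 1.494588, 1.901672],
    "p": [0.849734, 0.293427, 0.100091, 0.841053, 0.642303],
    "significance": ["insignificant"] * 5,
})

ax = confint_plot_grouped(pre_treat_repr)
```

With pre_treat_repr, where every KPI is insignificant, the legend should then contain a single entry, while the KPI order on the y-axis stays the same as for treat_repr.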