How to create a Predicted vs. Actual plot using abline_plot and statsmodels

Question

I am trying to recreate this plot from this website in Python instead of R:

Background

I have a dataframe called boston (the popular educational Boston housing dataset). I created a multiple linear regression model with some variables with the statsmodels API below. Everything works.

I create a dataframe of actual values from the boston dataset and predicted values from above.
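The code from the question is not reproduced here. As a rough sketch of the setup it describes, the following fits a multiple linear regression and builds the actual/predicted dataframe. Everything specific is an assumption for illustration: the data is synthetic (only the column names `medv`, `rm`, and `lstat` mimic the real dataset), and the choice of predictors in the formula is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the boston dataframe (values are made up;
# only the column names mimic the real dataset).
rng = np.random.default_rng(0)
n = 200
boston = pd.DataFrame({
    "rm": rng.normal(6.3, 0.7, n),
    "lstat": rng.uniform(2, 35, n),
})
boston["medv"] = 5 * boston["rm"] - 0.6 * boston["lstat"] + rng.normal(0, 4, n)

# Multiple linear regression; the predictors chosen here are an assumption.
model = smf.ols("medv ~ rm + lstat", data=boston).fit()

# Dataframe of actual vs. predicted values, as described in the question.
new_df = pd.DataFrame({
    "actual": boston["medv"],
    "predicted": model.fittedvalues,
})
```

The resulting `new_df` has one row per observation with the columns `actual` and `predicted`, which is the shape the answer's code expects.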

Accepted Answer

That R plot is actually for predicted ~ actual, but your Python code passes the medv ~ ... model into abline_plot.

To recreate the R plot in Python:

- either use statsmodels to manually fit a new predicted ~ actual model for abline_plot
- or use seaborn.regplot to do it automatically

Using statsmodels

If you want to plot this manually, fit a new predicted ~ actual model and pass that model into abline_plot. Then, generate the confidence band using the summary_frame of the prediction results.

```python
import statsmodels.formula.api as smf
from statsmodels.graphics.regressionplots import abline_plot

# fit prediction model
pred = smf.ols('predicted ~ actual', data=new_df).fit()

# generate confidence interval
summary = pred.get_prediction(new_df).summary_frame()
summary['actual'] = new_df['actual']
summary = summary.sort_values('actual')

# plot predicted vs actual
ax = new_df.plot.scatter(x='actual', y='predicted', color='gray', s=10, alpha=0.5)

# plot regression line
abline_plot(model_results=pred, ax=ax, color='orange')

# plot confidence interval
ax.fill_between(x=summary['actual'], y1=summary['mean_ci_lower'], y2=summary['mean_ci_upper'],
                alpha=0.2, color='orange')
```

As an alternative to abline_plot, you can use matplotlib's built-in axline by extracting the intercept and slope from the model's params:

```python
# plot y=mx+b regression line using matplotlib's axline
b, m = pred.params
ax.axline(xy1=(0, b), slope=m, color='orange')
```

Using seaborn

Note that it's much simpler to let seaborn.regplot handle this automatically:

```python
import seaborn as sns

sns.regplot(data=new_df, x='actual', y='predicted',
            scatter_kws=dict(color='gray', s=10, alpha=0.5),
            line_kws=dict(color='orange'))
```

With seaborn, it's also trivial to plot a polynomial fit via the order param:

```python
sns.regplot(data=new_df, x='actual', y='predicted', order=2)
```
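To make the statsmodels pieces easier to inspect without plotting, here is a minimal sketch on made-up data (the `new_df` below is a synthetic stand-in, not the boston results). It reads the intercept and slope from `params` by name instead of by position, and compares the two interval types that `summary_frame` returns: `mean_ci_*` is the confidence band for the fitted line, used in the plot above, while `obs_ci_*` is the wider prediction interval for individual observations.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up actual/predicted pairs standing in for new_df.
rng = np.random.default_rng(1)
actual = rng.uniform(5, 50, 150)
new_df = pd.DataFrame({
    "actual": actual,
    "predicted": 4 + 0.8 * actual + rng.normal(0, 3, 150),
})

pred = smf.ols("predicted ~ actual", data=new_df).fit()

# Access the coefficients by name rather than relying on their order.
b = pred.params["Intercept"]
m = pred.params["actual"]

# summary_frame returns the confidence interval of the mean (mean_ci_*)
# and the prediction interval for new observations (obs_ci_*).
summary = pred.get_prediction(new_df).summary_frame(alpha=0.05)
band_width = (summary["mean_ci_upper"] - summary["mean_ci_lower"]).mean()
pred_width = (summary["obs_ci_upper"] - summary["obs_ci_lower"]).mean()

print(f"intercept={b:.3f}, slope={m:.3f}")
print(f"mean CI width={band_width:.3f}, prediction interval width={pred_width:.3f}")
```

If you want the shaded region to cover where individual points are expected to fall rather than the uncertainty of the line itself, passing the `obs_ci_lower` and `obs_ci_upper` columns to fill_between should give that wider band.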
