How to do linear regression, taking errorbars into account?

Question

I am running a computer simulation of a physical system of finite size, and afterwards I extrapolate to infinite size (the thermodynamic limit). According to theory, the data should scale linearly with system size, so I am doing a linear regression. The data are noisy, but for each data point I can estimate error bars. So, for example

Accepted Answer

Not entirely sure if this is what you mean, but using pandas, statsmodels, and patsy, we can compare an ordinary least-squares fit and a weighted least-squares fit that uses the inverse of the noise you provided as the weights (statsmodels will complain about sample sizes < 20, by the way).

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import statsmodels.formula.api as sm

x_list = [0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587,
          0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666]
y_list = [0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715,
          0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555]
y_err = [0.003306749165349316, 0.003818446389148108, 0.0056036878203831785, 0.0036635292592592595,
         0.0037034897788415424, 0.007576672222222223, 0.002981084130692832, 0.0034913019065973983]

# put x and y into a pandas DataFrame, and the weights into a Series
ws = pd.DataFrame({
    'x': x_list,
    'y': y_list
})
weights = pd.Series(y_err)

# 'x ~ y' regresses x on y, so predict() returns fitted x values.
# The weights here are 1 / error; statsmodels documents WLS weights as
# proportional to the inverse variance, which would be 1 / weights**2.
wls_fit = sm.wls('x ~ y', data=ws, weights=1 / weights).fit()
ols_fit = sm.ols('x ~ y', data=ws).fit()

# show the fit summary by calling wls_fit.summary()
# wls fit r-squared is 0.754
# ols fit r-squared is 0.701

# let's plot our data
plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111, facecolor='w')
ws.plot(
    kind='scatter',
    x='x',
    y='y',
    style='o',
    alpha=1.,
    ax=ax,
    title='x vs y scatter',
    edgecolor='#ff8300',
    s=40
)

# weighted prediction (fitted x on the horizontal axis, observed y on the vertical)
wp, = ax.plot(
    wls_fit.predict(),
    ws['y'],
    color='#e55ea2',
    lw=1.,
    alpha=1.0,
)

# unweighted prediction
op, = ax.plot(
    ols_fit.predict(),
    ws['y'],
    color='k',
    ls='solid',
    lw=1,
    alpha=1.0,
)

leg = plt.legend(
    (op, wp),
    ('Ordinary Least Squares', 'Weighted Least Squares'),
    loc='upper left',
    fontsize=8)

plt.tight_layout()
fig.set_size_inches(6.40, 5.12)
plt.show()
```

WLS residuals:

```
[0.025624005084707302, 0.013611438189866154, -0.033569595462217161, 0.044110895217014695, -0.025071632845910546, -0.036308252199571928, -0.010335514810672464, -0.0081511479431851663]
```

The mean squared error of the residuals for the weighted fit (wls_fit.mse_resid or wls_fit.scale) is 0.22964802498892287, and the r-squared value of the fit is 0.754.

You can obtain a wealth of data about the fits by calling their summary() method, and/or doing dir(wls_fit), if you need a list of every available property and method.
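
If the goal is the extrapolated value rather than a comparison of fits, a straight-line fit of y against x with inverse-variance weights can also be written out by hand. The following is a minimal sketch and not part of the original answer. It assumes the error bars are independent one-sigma uncertainties on y and that x is an inverse-size variable, so that the infinite-size limit is the intercept at x = 0. The function name weighted_line_fit is made up for illustration. It uses the standard closed-form solution that minimizes sum(((y - a - b*x) / sigma)**2).

```python
def weighted_line_fit(x, y, sigma):
    """Fit y = a + b*x with inverse-variance weights (hypothetical helper).

    Returns (a, b, a_err, b_err): intercept, slope, and their one-sigma
    uncertainties, assuming sigma holds independent errors on y.
    """
    w = [1.0 / s ** 2 for s in sigma]  # inverse-variance weights
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2

    a = (Sxx * Sy - Sx * Sxy) / delta  # intercept: value at x = 0
    b = (S * Sxy - Sx * Sy) / delta    # slope
    a_err = (Sxx / delta) ** 0.5       # propagated from sigma only
    b_err = (S / delta) ** 0.5
    return a, b, a_err, b_err


# data copied from the answer above
x_list = [0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587,
          0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666]
y_list = [0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715,
          0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555]
y_err = [0.003306749165349316, 0.003818446389148108, 0.0056036878203831785, 0.0036635292592592595,
         0.0037034897788415424, 0.007576672222222223, 0.002981084130692832, 0.0034913019065973983]

a, b, a_err, b_err = weighted_line_fit(x_list, y_list, y_err)
print("intercept (x -> 0):", a, "+/-", a_err)
print("slope:", b, "+/-", b_err)
```

Weights of 1 / sigma**2 match the convention statsmodels documents for WLS (weights proportional to the inverse variance), so the formula 'y ~ x' with weights=1 / weights**2 should give the same intercept and slope. The standard errors may differ, because statsmodels estimates an overall scale from the residuals, while the sketch propagates the supplied error bars directly.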

Answer