I am trying to create a multiple linear regression model from scratch in Python. Dataset used: the Boston Housing Dataset from sklearn. Since my focus was on the model building, I did not perform any pre-processing steps on the data. However, I used an OLS model to calculate p-values and dropped 3 features from the data. After that, I used a Linear Regression model to find out the weights for each feature.
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

X = load_boston()
data = pd.DataFrame(X.data, columns=X.feature_names)
y = X.target
data.head()

# dropping three features
data = data.drop(['INDUS', 'NOX', 'AGE'], axis=1)
# new shape of the data: (506, 10), not including the target variable

# passed the whole dataset to the Linear Regression model
model_lr = LinearRegression()
model_lr.fit(data, y)

model_lr.score(data, y)
0.7278959820021539

model_lr.intercept_
22.60536462807957    # ----- intercept value

model_lr.coef_
array([-0.09649731,  0.05281081,  2.3802989 ,  3.94059598, -1.05476566,
        0.28259531, -0.01572265, -0.75651996,  0.01023922, -0.57069861])    # --- coefficients
Now I wanted to calculate the coefficients manually in Excel before creating the model in Python. To calculate the weight of each feature I used this formula:
Calculating the weights of the features:

b1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)
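(In Python terms, that slope formula would read something like the snippet below; simple_slope is just an illustrative name, and x and y are assumed to be 1-D arrays.)

import numpy as np

# b1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)
def simple_slope(x, y):
    x, y = np.asarray(x), np.asarray(y)
    return ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()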
To calculate the intercept I used the formula:

b0 = mean(y) - b1*mean(x1) - b2*mean(x2) - ... - bn*mean(xn)
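(Again as a sketch, with intercept as an illustrative name, and coefs assumed to hold b1..bn in the same order as the feature columns:)

import numpy as np

# b0 = mean(y) - b1*mean(x1) - b2*mean(x2) - ... - bn*mean(xn)
def intercept(coefs, data, y):
    return np.mean(y) - np.dot(coefs, np.asarray(data).mean(axis=0))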
The intercept value from my calculations was 22.63551387 (almost the same as the model's).
The problem is that the weights of the features from my calculation are far off from those of the sklearn linear model.
-0.002528644    # -- CRIM
-0.001028914    # -- ZN
-0.038663314    # -- CHAS
-0.035026972    # -- RM
-0.014275311    # -- DIS
-0.004058291    # -- RAD
-0.000241103    # -- TAX
-0.015035534    # -- PTRATIO
-0.000318376    # -- B
-0.006411897    # -- LSTAT
Using the first row as test data to check my calculations, I get 22.73167044199992, while the Linear Regression model predicts 30.42657776. The original value is 24.
But as soon as I check other rows, the sklearn model shows more variation, while the predictions made with the weights from my calculation all stay close to 22.
I think I am making a mistake in calculating the weights, but I am not sure where the problem is. Is there a mistake in my calculation? Why are all my coefficients from the calculations so close to 0?
Here is my code for calculating the coefficients (beginner here):
import numpy as np

x_1 = []
x_2 = []
for i, j in zip(data['CRIM'], y):
    mean_x = data['CRIM'].mean()
    mean_y = np.mean(y)
    c = i - mean_x * (j - mean_y)
    d = (i - mean_x) ** 2
    x_1.append(c)
    x_2.append(d)
print(sum(x_1) / sum(x_2))
Thank you for reading this long post, I appreciate it.
Answer
It seems like the trouble lies in the coefficient calculation. The formula you have given for calculating the coefficients is in scalar form, used for the simplest case of linear regression, namely with only one feature x.
EDIT
Now after seeing your code for the coefficient calculation, the problem is clearer. You cannot use this equation to calculate the coefficient of each feature independently of the others, as each coefficient depends on all the features; see the illustration below. I suggest you take a look at the derivation of the solution to this least squares optimization problem in the simple case here and in the general case here. And as a general tip, stick with the matrix implementation whenever you can, as it is radically more efficient.
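To make that dependence concrete, here is a small illustration with made-up, deliberately correlated features (all names and numbers are purely for demonstration): fitting each feature on its own with the scalar slope formula gives quite different answers than the joint least squares solution.

import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)    # x2 is correlated with x1
y_demo = 3.0 * x1 - 2.0 * x2 + rng.normal(scale=0.1, size=n)

# per-feature scalar slopes -- these do NOT recover 3 and -2,
# because each one ignores the other feature
for x in (x1, x2):
    print(((x - x.mean()) * (y_demo - y_demo.mean())).sum() / ((x - x.mean()) ** 2).sum())

# joint least squares solution on the centered data -- recovers roughly [3, -2]
X_demo = np.column_stack([x1, x2])
print(np.linalg.lstsq(X_demo - X_demo.mean(axis=0), y_demo - y_demo.mean(), rcond=None)[0])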
However, in this case we have a 10-dimensional feature vector, and so in matrix notation it becomes

B = (X^T X)^(-1) X^T y

where X is the design matrix (with a leading column of ones for the intercept) and y is the target vector.
See derivation here
I suspect you made some computational error here, as implementing this in Python using the scalar formula is more tedious and untidy than the matrix equivalent. But since you haven't shared this piece of your code, it's hard to know.
Here’s an example of how you would implement it:
import numpy as np

def calc_coefficients(X, Y):
    X = np.mat(X)
    Y = np.mat(Y)
    return np.dot((np.dot(np.transpose(X), X)) ** (-1), np.transpose(np.dot(Y, X)))

def score_r2(y_pred, y_true):
    ss_tot = np.power(y_true - y_true.mean(), 2).sum()
    ss_res = np.power(y_true - y_pred, 2).sum()
    return 1 - ss_res / ss_tot

# design matrix: leading column of ones for the intercept, then the 10 features
X = np.ones(shape=(506, 11))
X[:, 1:] = data.values

B = calc_coefficients(X, y)

##### Coefficients
B[:]
matrix([[ 2.26053646e+01],
        [-9.64973063e-02],
        [ 5.28108077e-02],
        [ 2.38029890e+00],
        [ 3.94059598e+00],
        [-1.05476566e+00],
        [ 2.82595310e-01],
        [-1.57226536e-02],
        [-7.56519964e-01],
        [ 1.02392192e-02],
        [-5.70698610e-01]])

#### Intercept
B[0]
matrix([[22.60536463]])

y_pred = np.dot(np.transpose(B), np.transpose(X))

##### First 5 rows predicted
np.array(y_pred)[0][:5]
array([30.42657776, 24.80818347, 30.69339701, 29.35761397, 28.6004966 ])

##### First 5 rows ground truth
y[:5]
array([24. , 21.6, 34.7, 33.4, 36.2])

### R^2 score
score_r2(y_pred, y)
0.7278959820021539
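As a side note, in practice you would avoid forming the matrix inverse explicitly; np.linalg.lstsq solves the same least squares problem more stably. A quick sketch, reusing data and y from above:

import numpy as np

# same design matrix: column of ones for the intercept, then the features
X = np.ones(shape=(506, 11))
X[:, 1:] = data.values

# least-squares solve instead of (X^T X)^(-1) X^T y with an explicit inverse
B, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(B[0])     # intercept -- should match the 22.60536463 above
print(B[1:])    # feature coefficients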