In the LinearRegression method in sklearn, what exactly is the fit_intercept parameter doing? [closed]

Question

Closed 2 years ago. This question does not appear to be about programming within the scope defined in the help center.

In sklearn.linear_model.LinearRegression, there is a parameter fit_intercept that can be set to True or False. I am wondering if

Accepted Answer

fit_intercept=False sets the y-intercept to 0. If fit_intercept=True, the y-intercept is estimated from the data as part of the line of best fit.

    from sklearn.linear_model import LinearRegression
    import numpy as np
    import matplotlib.pyplot as plt

    bias = 100

    # Noisy samples from the line y = 0.3 * x + bias
    X = np.arange(1000).reshape(-1, 1)
    y_true = np.ravel(X.dot(0.3) + bias)
    noise = np.random.normal(0, 60, 1000)
    y = y_true + noise

    # Fit the same data with and without an intercept
    lr_fi_true = LinearRegression(fit_intercept=True)
    lr_fi_false = LinearRegression(fit_intercept=False)

    lr_fi_true.fit(X, y)
    lr_fi_false.fit(X, y)

    print('Intercept when fit_intercept=True : {:.5f}'.format(lr_fi_true.intercept_))
    print('Intercept when fit_intercept=False : {:.5f}'.format(lr_fi_false.intercept_))

    # Predictions of both models, computed by hand from coef_ and intercept_
    lr_fi_true_yhat = np.dot(X, lr_fi_true.coef_) + lr_fi_true.intercept_
    lr_fi_false_yhat = np.dot(X, lr_fi_false.coef_) + lr_fi_false.intercept_

    plt.scatter(X, y, label='Actual points')
    plt.plot(X, lr_fi_true_yhat, 'r--', label='fit_intercept=True')
    plt.plot(X, lr_fi_false_yhat, 'r-', label='fit_intercept=False')
    plt.legend()

    # Reference lines: the y-axis, the true bias, and y = 0
    plt.vlines(0, 0, y.max())
    plt.hlines(bias, X.min(), X.max())
    plt.hlines(0, X.min(), X.max())

    plt.show()

This example prints (the exact first value varies between runs, since no random seed is set):

    Intercept when fit_intercept=True : 100.32210
    Intercept when fit_intercept=False : 0.00000

Visually it becomes clear what fit_intercept does. When fit_intercept=True, the line of best fit is free to cross the y-axis wherever the data suggest (close to 100 in this example). When fit_intercept=False, the line is forced through the origin (0, 0).
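To make the difference concrete, here is a minimal NumPy-only sketch of the two least-squares problems that correspond to the two settings for a single feature. It is an illustration of the textbook formulas, not scikit-learn's actual implementation, and the helper names `ols_with_intercept` and `ols_through_origin` are made up for this example.

```python
import numpy as np

# Synthetic data in the same spirit as the example above (seeded for repeatability).
np.random.seed(1)
x = np.arange(1000, dtype=float)
y = 0.3 * x + 100 + np.random.normal(0, 60, 1000)


def ols_with_intercept(x, y):
    """Least-squares fit of y ~ slope * x + intercept (like fit_intercept=True)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def ols_through_origin(x, y):
    """Least-squares fit of y ~ slope * x, intercept fixed at 0 (like fit_intercept=False)."""
    return np.sum(x * y) / np.sum(x ** 2)


slope_free, intercept_free = ols_with_intercept(x, y)
slope_origin = ols_through_origin(x, y)

print('with intercept :', slope_free, intercept_free)
print('through origin :', slope_origin, 0.0)
```

Because the data really do have an offset of about 100, the through-origin fit has to compensate with a steeper slope, which is why the two lines in the plot diverge.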
What happens if I include a column of ones or zeros and set fit_intercept to True or False?

The example below shows how to inspect this for a column of ones.

    from sklearn.linear_model import LinearRegression
    import numpy as np

    np.random.seed(1)
    bias = 100

    X = np.arange(1000).reshape(-1, 1)
    y_true = np.ravel(X.dot(0.3) + bias)
    noise = np.random.normal(0, 60, 1000)
    y = y_true + noise

    # with column of ones
    X_with_ones = np.hstack((np.ones((X.shape[0], 1)), X))

    # Try every combination of fit_intercept and presence of the ones column
    for b, data in ((True, X), (False, X), (True, X_with_ones), (False, X_with_ones)):
      lr = LinearRegression(fit_intercept=b)
      lr.fit(data, y)

      print(lr.intercept_, lr.coef_)

Take-away:

    # fit_intercept=True, no extra column
    104.156765787 [ 0.29634031]
    # fit_intercept=False, no extra column
    0.0 [ 0.45265361]
    # fit_intercept=True, with column of ones
    104.156765787 [ 0.          0.29634031]
    # fit_intercept=False, with column of ones
    0.0 [ 104.15676579    0.29634031]

With fit_intercept=True, the column of ones is redundant and its coefficient is 0. With fit_intercept=False, the column of ones takes over the role of the intercept: its coefficient (104.15676579) equals the intercept found with fit_intercept=True, and the slope is the same.
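The equivalence in the last case can also be checked programmatically instead of by eye. The sketch below assumes the same synthetic data as above and compares the fitted parameters and predictions of the two models; the variable names are chosen for this illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(1)
X = np.arange(1000).reshape(-1, 1)
y = np.ravel(X * 0.3 + 100) + np.random.normal(0, 60, 1000)

# Design matrix with an explicit column of ones in front of the feature
X_with_ones = np.hstack((np.ones((X.shape[0], 1)), X))

# Model A: let scikit-learn estimate the intercept
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
# Model B: no intercept, but the ones column can absorb it
ones_column = LinearRegression(fit_intercept=False).fit(X_with_ones, y)

# The ones-column coefficient in B should match the intercept in A
same_intercept = np.isclose(with_intercept.intercept_, ones_column.coef_[0])
same_slope = np.isclose(with_intercept.coef_[0], ones_column.coef_[1])
same_predictions = np.allclose(
    with_intercept.predict(X), ones_column.predict(X_with_ones)
)

print(same_intercept, same_slope, same_predictions)
```

In practice this means you should pick one approach: either let fit_intercept=True handle the offset, or add the column of ones yourself and pass fit_intercept=False.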

Answer