Is there any way I can fit two independent variables and one dependent variable in numpy.polyfit()?
I have a pandas DataFrame that I loaded from a CSV file. I want to use two of its columns as independent variables in a multiple linear regression with NumPy.
Currently my simple linear regression looks like this:
model_combined = np.polyfit(data.Exercise, y, 1)
I wish to include data.Age in x as well.
Answer
Assuming your equation is a * exercise + b * age + intercept = y, you can fit a multiple linear regression with NumPy or scikit-learn as follows:
from sklearn import linear_model
import numpy as np

np.random.seed(42)
X = np.random.randint(low=1, high=10, size=20).reshape(10, 2)
X = np.c_[X, np.ones(X.shape[0])]  # add intercept
y = np.random.randint(low=1, high=10, size=10)

# Option 1
a, b, intercept = np.linalg.pinv((X.T).dot(X)).dot(X.T.dot(y))
print(a, b, intercept)

# Option 2
a, b, intercept = np.linalg.lstsq(X, y, rcond=None)[0]
print(a, b, intercept)

# Option 3
clf = linear_model.LinearRegression(fit_intercept=False)
clf.fit(X, y)
print(clf.coef_)
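np.polyfit fits a polynomial in a single variable, so the second predictor has to go into a design matrix instead. As a rough sketch of how the least-squares option might be applied to a DataFrame like the one in the question: the column names Exercise and Age come from the question, but the values below are made up, and y is built from known coefficients (2, 0.5 and 3) purely so that the result can be checked.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV from the question.
data = pd.DataFrame({
    "Exercise": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Age": [30.0, 25.0, 41.0, 35.0, 52.0, 47.0],
})

# Synthetic target with known coefficients (an assumption for illustration):
# y = 2 * Exercise + 0.5 * Age + 3
y = (2.0 * data["Exercise"] + 0.5 * data["Age"] + 3.0).to_numpy()

# Design matrix: one column per predictor plus a column of ones for the intercept.
X = np.column_stack([data["Exercise"], data["Age"], np.ones(len(data))])

# lstsq returns (solution, residuals, rank, singular values).
coeffs, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
a, b, intercept = coeffs
print(a, b, intercept)
```

With real data, replace the synthetic y with the dependent column from the DataFrame. The recovered a, b and intercept should be close to the coefficients used to build y here.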