I need to make a linear regression and sum all the predictions. Maybe this isn’t a question for Scikit-Learn but for NumPy because I get an array at the end and I am unable to turn it into a float.
df

   rank  Sales
0     1  18000
1     2  17780
2     3  17870
3     4  17672
4     5  17556

x = df['rank'].to_numpy()
y = df['Sales'].to_numpy()
X = x.reshape(-1,1)
regression = LinearRegression().fit(X, y)
I am getting it right up to this point. The next part (which is a while loop to sum all the values) is not working:
number_predictions = 100
x_current_prediction = 1
total_sales = 0
while x_current_prediction <= number_predictions:
    variable_sum = x_current_prediction*regression.coef_
    variable_sum_float = variable_sum.astype(np.float_)
    total_sales = total_sales + variable_sum_float
    x_current_prediction =+1
return total_sales
I think the problem is getting regression.coef_ to be a float, but using astype on it does not work.
Answer
You don’t need to loop like this, and you don’t need to use the coefficient to compute the prediction (don’t forget there may be an intercept as well).
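To illustrate the point about the intercept, here is a minimal sketch using the five rows from the question. It assumes a single feature and a 1-D target, in which case regression.coef_ is an array of shape (1,), so astype still returns an array rather than a Python float. The names slope, intercept, manual and from_sklearn are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data copied from the question.
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([18000, 17780, 17870, 17672, 17556])
regression = LinearRegression().fit(X, y)

# coef_ holds one entry per feature, so it is an array even with one feature.
print(regression.coef_.shape)

# Index into the array to get a scalar, then convert to a Python float.
slope = float(regression.coef_[0])
intercept = float(regression.intercept_)

# A prediction is slope * x + intercept, not slope * x alone.
manual = slope * 10 + intercept
from_sklearn = regression.predict(np.array([[10]]))[0]
print(manual, from_sklearn)
```

The two printed values should agree up to floating-point rounding, which is why calling predict is usually simpler than rebuilding the formula by hand.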
Instead, make an array of all the values of x you want to predict for, and ask sklearn for the predictions:
X_new = np.arange(1, 101).reshape(-1, 1)  # X must be 2D.
y_pred = regression.predict(X_new)
If you want to add all these numbers together, use y_pred.sum() or np.sum(y_pred), or if you want a cumulative sum, np.cumsum(y_pred) will do it.
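Putting the pieces together, a self-contained sketch might look like the following. It rebuilds the question's data as plain NumPy arrays instead of a DataFrame, and running_total is a name introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data copied from the question (rank 1..5 and the matching Sales).
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # X must be 2D
y = np.array([18000, 17780, 17870, 17672, 17556])
regression = LinearRegression().fit(X, y)

# Predict for ranks 1 through 100 in one vectorized call.
X_new = np.arange(1, 101).reshape(-1, 1)
y_pred = regression.predict(X_new)

# sum() returns a NumPy scalar; float() turns it into a plain Python float.
total_sales = float(y_pred.sum())

# Cumulative sum: element i is the total of the first i + 1 predictions.
running_total = np.cumsum(y_pred)

print(total_sales)
print(running_total[:3])
```

Because predict handles the whole array at once, there is no loop counter to maintain, which also avoids the =+1 slip in the original loop (that statement assigns +1 rather than incrementing).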