Sum the predictions of a Linear Regression from Scikit-Learn

I need to fit a linear regression and sum all of its predictions. Maybe this is a NumPy question rather than a Scikit-Learn one, because I end up with an array and can't turn it into a float.

df
  rank  Sales
0   1   18000
1   2   17780
2   3   17870
3   4   17672
4   5   17556

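import numpy as np
from sklearn.linear_model import LinearRegression

x = df['rank'].to_numpy()
y = df['Sales'].to_numpy()
X = x.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix

regression = LinearRegression().fit(X, y)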
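
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds `df` from the five rows shown above (the DataFrame construction is an assumption, since the original data source isn't shown). It also prints the shapes of the fitted attributes: `coef_` is a 1D array with one entry per feature, which is why arithmetic with it produces an array rather than a float.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed: rebuild the example data from the table in the question.
df = pd.DataFrame({
    "rank": [1, 2, 3, 4, 5],
    "Sales": [18000, 17780, 17870, 17672, 17556],
})

x = df["rank"].to_numpy()
y = df["Sales"].to_numpy()
X = x.reshape(-1, 1)  # one feature -> shape (5, 1)

regression = LinearRegression().fit(X, y)

# coef_ holds one coefficient per feature, so here it has shape (1,).
print(regression.coef_.shape)
# With a single target, intercept_ is a scalar (0-dimensional).
print(np.ndim(regression.intercept_))
```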

Everything works up to this point. The next part, a while loop that sums all the predicted values, does not work:

number_predictions = 100
x_current_prediction = 1
total_sales = 0
while x_current_prediction <= number_predictions:
   variable_sum = x_current_prediction*regression.coef_
   variable_sum_float = variable_sum.astype(np.float_)
   total_sales = total_sales + variable_sum_float
   x_current_prediction =+1
return total_sales

I think the problem is converting regression.coef_ to a float, but using astype does not seem to work. Why?

Answer

You don't need to loop like this, and you don't need to compute the predictions from the coefficient yourself (don't forget that LinearRegression also fits an intercept by default).
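
To see why the intercept matters, here is a sketch of what the original loop would need in order to match the model, assuming the same five-row data as in the question. Each prediction is `intercept_ + coef_[0] * x`; indexing with `[0]` pulls a scalar out of the length-1 `coef_` array (`astype` only changes the dtype, not the shape). The counter must be advanced with `+=`, because `=+1` assigns `+1` on every pass, so the loop never ends, and `return` is only valid inside a function. Note also that the `np.float_` alias used in the question was removed in NumPy 2.0. The helper name `total_sales_loop` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed: same data as the question (rank -> Sales).
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([18000, 17780, 17870, 17672, 17556])
regression = LinearRegression().fit(X, y)

def total_sales_loop(model, number_predictions=100):
    """Hypothetical helper: sum predictions for x = 1..number_predictions."""
    slope = float(model.coef_[0])        # scalar out of the shape-(1,) array
    intercept = float(model.intercept_)  # don't forget the intercept
    total_sales = 0.0
    x_current_prediction = 1
    while x_current_prediction <= number_predictions:
        total_sales += intercept + slope * x_current_prediction
        x_current_prediction += 1        # `+=`, not `=+`
    return total_sales

loop_total = total_sales_loop(regression)
# The same number straight from the model, as this answer suggests:
predict_total = float(regression.predict(np.arange(1, 101).reshape(-1, 1)).sum())
print(loop_total, predict_total)
```

The two totals agree up to floating-point rounding, which is the point: `predict` already applies both the coefficient and the intercept for you.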

Instead, build an array of all the x values you want predictions for, and ask sklearn to predict them:

X_new = np.arange(1, 101).reshape(-1, 1)  # X must be 2D.
y_pred = regression.predict(X_new)
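
The `reshape(-1, 1)` matters because scikit-learn estimators expect a 2D array of shape `(n_samples, n_features)`. A quick check, assuming the same five-row data as in the question: a 1D array is rejected with a `ValueError`, while the reshaped array yields one prediction per row.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed: same data as the question (rank -> Sales).
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([18000, 17780, 17870, 17672, 17556])
regression = LinearRegression().fit(X, y)

x_values = np.arange(1, 101)       # 1D, shape (100,)
try:
    regression.predict(x_values)   # rejected: not 2D
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Shape (100, 1) is accepted and gives one prediction per row.
y_pred = regression.predict(x_values.reshape(-1, 1))
print(rejected_1d, y_pred.shape)
```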

To add all these numbers together, use y_pred.sum() or np.sum(y_pred); for a cumulative (running) sum, use np.cumsum(y_pred).
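
Putting the pieces together, here is a sketch assuming the five-row data from the question. `y_pred.sum()` returns a NumPy scalar, which `float()` turns into a plain Python float if that is what you need, and the last element of `np.cumsum` is the same total. Because the model is linear, the total can also be checked by hand: summing `intercept_ + coef_[0] * x` over x = 1..100 gives `100 * intercept_ + 5050 * coef_[0]`, since 1 + 2 + ... + 100 = 5050.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed: same data as the question (rank -> Sales).
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([18000, 17780, 17870, 17672, 17556])
regression = LinearRegression().fit(X, y)

X_new = np.arange(1, 101).reshape(-1, 1)  # X must be 2D
y_pred = regression.predict(X_new)

total_sales = float(y_pred.sum())   # plain Python float
running_total = np.cumsum(y_pred)   # running_total[-1] is the overall total

# Closed-form check for a linear model over x = 1..100 (X_new.sum() == 5050).
expected = 100 * regression.intercept_ + X_new.sum() * regression.coef_[0]
print(total_sales, running_total[-1], float(expected))
```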

Answer