I have points with x and y coordinates that I want to fit a straight line to with linear regression, but I get a jagged-looking line.
I am attempting to use LinearRegression from sklearn.
To create the points, I run a for loop that randomly creates one hundred points in an array of shape 100 x 2. I slice the left column for the xs and the right column for the ys.
I expect to get a straight line when I plot the output of m.predict.
import numpy as np
import matplotlib.pyplot as plt
import random
from sklearn.linear_model import LinearRegression

X = []
adder = 0
for z in range(100):
    r = random.random() * 20
    r2 = random.random() * 15
    X.append([r+adder-0.4, r2+adder])
    adder += 0.6
X = np.array(X)
plt.scatter(X[:,0], X[:,1], s=10)
plt.show()
m = LinearRegression()
m.fit(X[:,0].reshape(1, -1), X[:,1].reshape(1, -1))
plt.plot(m.predict(X[:,0].reshape(1, -1))[0])
Answer
I am not very familiar with numpy, but I think the cause is the use of reshape(1, -1) to convert X[:,0] and X[:,1] from 1D to 2D. The resulting 2D array has shape (1, 100), a single row that sklearn treats as one sample with 100 features, instead of shape (len(X[:,0]), 1), with one row per point. The model is therefore fitted on a single sample, which results in an undesired regressor.
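If the reshape is indeed the cause, a numpy-only fix would be to use reshape(-1, 1) for the xs, so that each point becomes its own row. The following is a minimal sketch under that assumption, not a tested drop-in replacement for the question's script: the seeded generator and the names xs, ys, wrong, right and pred are illustrative choices that do not appear in the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data shaped like the question's, built with a seeded generator (an assumption)
# so that the run is repeatable.
rng = np.random.default_rng(0)
adder = 0.6 * np.arange(100)
xs = rng.random(100) * 20 + adder - 0.4
ys = rng.random(100) * 15 + adder

wrong = xs.reshape(1, -1)  # shape (1, 100): one sample with 100 features
right = xs.reshape(-1, 1)  # shape (100, 1): 100 samples with one feature each

m = LinearRegression()
m.fit(right, ys)           # a 1D target of length 100 is accepted as-is
pred = m.predict(right)    # shape (100,), one prediction per point

print(wrong.shape, right.shape, pred.shape)
```

To draw the fitted line, the x values have to be passed as well, for example plt.plot(xs, pred). If only the predictions are passed, matplotlib draws them against their index, and because the xs are not sorted the line would still look jagged even with a correct fit.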
I was able to recreate this model using pandas and plot the desired result. The code is as follows:
import numpy as np
import matplotlib.pyplot as plt
import random
from sklearn.linear_model import LinearRegression
import pandas as pd

# Generate 100 noisy points that trend upward
X = []
adder = 0
for z in range(100):
    r = random.random() * 20
    r2 = random.random() * 15
    X.append([r+adder-0.4, r2+adder])
    adder += 0.6
X = np.array(X)

# Single-column DataFrames have shape (100, 1): one row per sample
y_train = pd.DataFrame(X[:,1],columns=['y'])
X_train = pd.DataFrame(X[:,0],columns=['X'])
# plt.scatter(X_train, y_train, s=10)
# plt.show()

m = LinearRegression()
m.fit(X_train, y_train)

# Plot the predictions against the x values, not against their index
plt.scatter(X_train,y_train)
plt.plot(X_train,m.predict(X_train),color='red')
plt.show()