
scikit-learn LinearRegression IndexError

I am using a LinearRegression model to fill the null values in the Rupeepersqft feature. When I run the code, I get this error:

IndexError                                Traceback (most recent call last)
<ipython-input-20-33d4e6d2998e> in <module>()
      1 test_data = data_with_null.iloc[:,:3]
----> 2 Rupeepersqft_predicted['Rupeepersqft'] = pd.DataFrame(linreg.predict(test_data))

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
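For reference, this exact message can be reproduced by assigning to a plain NumPy array with a column name. The stand-in array below is hypothetical, since the code shown never defines Rupeepersqft_predicted; it assumes that name held a NumPy array at that point.

```python
import numpy as np

# Hypothetical stand-in: assume Rupeepersqft_predicted is a plain NumPy array.
Rupeepersqft_predicted = np.zeros(3)

caught = None
try:
    # NumPy arrays have no column labels, so a string index is rejected.
    Rupeepersqft_predicted['Rupeepersqft'] = np.array([1.0, 2.0, 3.0])
except IndexError as err:
    caught = err
    print(err)
```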

This is the code that gives me the error:

from sklearn.linear_model import LinearRegression
linreg = LinearRegression()

data_with_null = data2[['Price (Lakhs)','Area','Area Type','Rupeepersqft','Condition','Purchase Type','Real Estate Regulation Act']].dropna()
data_without_null =  data_with_null.dropna()

train_data_x = data_without_null.iloc[:,:3]
train_data_y = data_without_null.iloc[:,3]

linreg.fit(train_data_x, train_data_y)

test_data = data_with_null.iloc[:,:3]
Rupeepersqft_predicted['Rupeepersqft'] = pd.DataFrame(linreg.predict(test_data))

data_with_null.Rupeepersqft.fillna(Rupeepersqft_predicted, inplace=True)

This is what the data looks like:

[Screenshot: Data2]

Can anyone help me out with this?


Answer

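The error message comes from NumPy rather than pandas: it is raised when a NumPy array is indexed with a string. Rupeepersqft_predicted is not defined in the code you posted, but the message suggests it is a NumPy array rather than a DataFrame, and arrays have no column labels. Once Rupeepersqft_predicted is a DataFrame, you can assign values to one of its columns with the locators loc (label-based) and iloc (position-based), so to fix your issue try changing the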

Rupeepersqft_predicted['Rupeepersqft'] = pd.DataFrame(linreg.predict(test_data))

to:

Rupeepersqft_predicted.loc[:, 'Rupeepersqft'] = pd.DataFrame(linreg.predict(test_data))

which selects all rows (the :) and the column Rupeepersqft, and assigns the values on the right-hand side to them.
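A minimal sketch of the loc assignment, assuming Rupeepersqft_predicted is already a DataFrame with a Rupeepersqft column. The frame and the predictions array are made-up stand-ins for your data and for the 1-D array that linreg.predict returns. It assigns the array directly instead of wrapping it in pd.DataFrame, so the values are written positionally.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for Rupeepersqft_predicted once it is a DataFrame.
Rupeepersqft_predicted = pd.DataFrame({'Rupeepersqft': [np.nan, np.nan, np.nan]})

# Made-up values standing in for the 1-D array returned by linreg.predict(...).
predictions = np.array([5200.0, 6100.0, 4800.0])

# ':' selects every row, 'Rupeepersqft' selects the column by label.
Rupeepersqft_predicted.loc[:, 'Rupeepersqft'] = predictions
print(Rupeepersqft_predicted)
```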

Or, using iloc:

Rupeepersqft_predicted.iloc[:, 1] = pd.DataFrame(linreg.predict(test_data))

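which assigns to all rows (again the :) of the column at position 1. iloc positions are zero-based, so position 1 is the second column of the DataFrame; use whichever position actually holds Rupeepersqft.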
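A small sketch showing that iloc counts columns from zero. The two-column frame is hypothetical: it assumes Rupeepersqft sits at position 1, after an Area column, and the predictions array is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame: position 0 is 'Area', position 1 is 'Rupeepersqft'.
Rupeepersqft_predicted = pd.DataFrame({
    'Area': [1000.0, 1200.0, 900.0],
    'Rupeepersqft': [np.nan, np.nan, np.nan],
})
predictions = np.array([5200.0, 6100.0, 4800.0])

# iloc is zero-based, so column position 1 is the second column, 'Rupeepersqft'.
Rupeepersqft_predicted.iloc[:, 1] = predictions
print(Rupeepersqft_predicted)
```

The Area column at position 0 is left untouched.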

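Just make sure the values on the right have the same length as the rows you assign them to. Also note that when the right-hand side is a pandas object, such as the pd.DataFrame used above, pandas aligns it on the index labels before assigning, so rows whose labels do not match can receive NaN instead of your predictions; assigning the plain array returned by linreg.predict avoids this.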
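Putting it together, here is a sketch of the whole imputation flow on small made-up data (the column names follow the question; the values are hypothetical). In the question's code, data_with_null is built with .dropna(), so it has no missing values left to fill; this sketch instead splits the rows on isna() of the target. It uses only the two numeric columns as features, on the assumption that 'Area Type' and the other columns are categorical and would need encoding first.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data; column names follow the question, values are hypothetical.
data2 = pd.DataFrame({
    'Price (Lakhs)': [50.0, 72.0, 40.0, 90.0, 60.0],
    'Area': [1000.0, 1200.0, 800.0, 1500.0, 1100.0],
    'Rupeepersqft': [5000.0, 6000.0, np.nan, 6000.0, np.nan],
})

features = ['Price (Lakhs)', 'Area']
target = 'Rupeepersqft'

# Train on rows where the target is known, predict where it is missing.
missing = data2[target].isna()
train = data2[~missing]
test = data2[missing]

linreg = LinearRegression()
linreg.fit(train[features], train[target])

# predict() returns one value per row of test, in the same order, so the array
# lines up positionally with the rows selected by the same boolean mask.
data2.loc[missing, target] = linreg.predict(test[features])
print(data2)
```

Because the same mask selects both the rows to predict and the rows to write, the lengths always match and no index alignment is involved.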

You can find more on pandas in this book.

Cheers

User contributions licensed under: CC BY-SA