I am working on a LinearRegression model to fill the null values for the feature Rupeepersqft. When I run the code, I receive this error:
IndexError                                Traceback (most recent call last)
<ipython-input-20-33d4e6d2998e> in <module>()
      1 test_data = data_with_null.iloc[:,:3]
----> 2 Rupeepersqft_predicted['Rupeepersqft'] = pd.DataFrame(linreg.predict(test_data))

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
This is the code which gives me the error:
from sklearn.linear_model import LinearRegression

linreg = LinearRegression()
data_with_null = data2[['Price (Lakhs)','Area','Area Type','Rupeepersqft','Condition','Purchase Type','Real Estate Regulation Act']].dropna()
data_without_null = data_with_null.dropna()
train_data_x = data_without_null.iloc[:,:3]
train_data_y = data_without_null.iloc[:,3]
linreg.fit(train_data_x, train_data_y)
test_data = data_with_null.iloc[:,:3]
Rupeepersqft_predicted['Rupeepersqft'] = pd.DataFrame(linreg.predict(test_data))
data_with_null.Rupeepersqft.fillna(Rupeepersqft_predicted, inplace=True)
This is what the data looks like:
Can anyone help me out with this?
Answer
To assign values to a column in a pandas DataFrame, you can use the locators loc (label-based) and iloc (position-based, for array-like manipulations). To fix your issue, try changing
Rupeepersqft_predicted['Rupeepersqft'] = pd.DataFrame(linreg.predict(test_data))
to:
Rupeepersqft_predicted.loc[:, 'Rupeepersqft'] = pd.DataFrame(linreg.predict(test_data))
This selects all the rows (the :) and the column Rupeepersqft, and assigns whatever values you have on the right.
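One thing worth checking first: the message in the traceback is the one NumPy raises when an ndarray is indexed with a string, so it may be that Rupeepersqft_predicted is a NumPy array rather than a DataFrame, in which case loc would not be available on it either. The sketch below is only an illustration with invented values (the names predicted and frame are hypothetical); it reproduces the message on an array and then does the loc assignment on a DataFrame.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the output of linreg.predict(...); the values are invented.
predicted = np.array([4500.0, 5200.0, 6100.0])

# Indexing a plain NumPy array with a column name raises the IndexError
# shown in the question.
try:
    predicted["Rupeepersqft"] = predicted
    error_message = ""
except IndexError as exc:
    error_message = str(exc)

# A DataFrame, by contrast, accepts label-based assignment through loc.
frame = pd.DataFrame({
    "Area": [900.0, 1100.0, 1300.0],
    "Rupeepersqft": [np.nan, np.nan, np.nan],
})
frame.loc[:, "Rupeepersqft"] = predicted
```

Passing the array itself, rather than wrapping it in pd.DataFrame, avoids any index alignment on the right-hand side.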
Or use iloc:
Rupeepersqft_predicted.iloc[:, 1] = pd.DataFrame(linreg.predict(test_data))
This assigns to all rows (again via :) of the column at integer position 1. Positions are zero-based, so that is the second column of the DataFrame.
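Because iloc works on integer positions, it can be safer to look the position up by name than to hard-code it. A minimal sketch, assuming a toy DataFrame with invented values (frame and position are hypothetical names):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "Area": [900.0, 1100.0, 1300.0],
    "Rupeepersqft": [np.nan, np.nan, np.nan],
})

# Look up the integer position of the column by its label.
position = frame.columns.get_loc("Rupeepersqft")

# Assign toy values (standing in for model predictions) to every row
# of the column at that position.
frame.iloc[:, position] = np.array([4500.0, 5200.0, 6100.0])
```

Here position is 1, because Area sits at position 0.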
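Just make sure the values on the right have the same length as the column you are assigning to. If the right-hand side is itself a pandas object, its index should match as well, because pandas generally aligns on the index when assigning through loc.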
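Putting this together for the imputation itself: note that in the question data_with_null is built with .dropna(), so it would contain no missing values left to fill. The following is a rough sketch of one way to do it, not necessarily how the original data should be handled. It assumes invented numbers and uses only two numeric columns as features (the features list and the missing mask are hypothetical names). A boolean mask keeps the number of predictions equal to the number of rows being assigned.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data; column names follow the question, the values are invented.
data = pd.DataFrame({
    "Price (Lakhs)": [45.0, 60.0, 75.0, 90.0, 50.0],
    "Area": [900.0, 1100.0, 1300.0, 1500.0, 1000.0],
    "Rupeepersqft": [5000.0, np.nan, 5769.0, np.nan, 5000.0],
})
features = ["Price (Lakhs)", "Area"]  # assumed numeric feature columns

# Boolean mask marking the rows whose target value is missing.
missing = data["Rupeepersqft"].isna()

# Fit only on the rows where the target is known.
linreg = LinearRegression()
linreg.fit(data.loc[~missing, features], data.loc[~missing, "Rupeepersqft"])

# Predict for the rows where the target is missing and write the
# predictions back into exactly those rows.
predicted = linreg.predict(data.loc[missing, features])
data.loc[missing, "Rupeepersqft"] = predicted
```

The known values are left untouched, and only the rows selected by the mask receive predictions.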
You can find more on pandas in this book.
Cheers