
Confused about why my KNN code is throwing a ValueError

I am using sklearn's KNN regressor:

#importing libraries and data
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor as KNR
theta = pd.read_csv("train.csv")#pandas dataframe
#getting data wanted from theta and putting it in a new dataframe
a = theta.get("YearBuilt")
b = theta.get("YrSold")
A = a.to_frame()
B = b.to_frame()
glasses = [A,B]
x = pd.concat(glasses)
#getting target data
y = theta.get("SalePrice")
#using KNN
horses = KNR(n_neighbors = 3)
horses.fit(x,y)

I get this error message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Could someone please explain this? My data is in the hundreds of thousands for the target and in the thousands for the input, and there are no blanks in the data.


Answer

Before answering the question, let me refactor the code. You are using a dataframe, so you can index a single field or multiple fields directly, without the extra steps you've used:

#importing libraries and data
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor as KNR

theta = pd.read_csv("train.csv") # pandas dataframe
#getting data wanted from theta and putting it in a new dataframe
x = theta[["YearBuilt", "YrSold"]] # index multiple fields
#getting target data
y = theta["SalePrice"] # index single field
#using KNN
horses = KNR(n_neighbors = 3)
horses.fit(x,y) # fit KNN
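It may help to see why the refactor matters for the error itself. The sketch below uses a tiny made-up frame in place of train.csv (the values are invented; only the column names come from the question). It shows that pd.concat stacks frames row-wise by default, so two one-column frames with different column names end up padded with NaN, while axis=1 places them side by side:

```python
import pandas as pd

# Tiny stand-in for train.csv: made-up values, column names from the question
theta = pd.DataFrame({
    "YearBuilt": [2003, 1976, 2001],
    "YrSold": [2008, 2007, 2008],
    "SalePrice": [208500, 181500, 223500],
})

A = theta["YearBuilt"].to_frame()
B = theta["YrSold"].to_frame()

# Default axis=0 stacks the frames on top of each other.
# Rows from A have no "YrSold" and rows from B have no "YearBuilt",
# so pandas fills those cells with NaN.
stacked = pd.concat([A, B])
print(stacked.shape)                    # (6, 2)
print(int(stacked.isna().sum().sum()))  # 6

# axis=1 places the frames side by side, aligned on the index
side_by_side = pd.concat([A, B], axis=1)
print(side_by_side.shape)                    # (3, 2)
print(int(side_by_side.isna().sum().sum()))  # 0
```

Indexing with theta[["YearBuilt", "YrSold"]] gives the same side-by-side result without the intermediate frames, and it keeps x at the same number of rows as y.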

Regarding your error: it indicates that the input passed to fit contains NaN, infinite, or extremely large values. You can make sure these don't occur by filtering out the NaN and inf values like this:

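import numpy as np

# run this before building x and y
theta = theta.replace([np.inf, -np.inf], np.nan)
# restrict to the columns actually used, so NaN in unrelated columns doesn't drop rows
theta = theta.dropna(subset=["YearBuilt", "YrSold", "SalePrice"])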
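Before filtering, it can be useful to confirm which columns actually contain problem values. The following is a minimal sketch on a made-up frame (the values and the name df are illustrative, not taken from train.csv) that counts NaN and infinite entries per column and then applies the same replace-and-drop cleanup:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value and one infinite value
df = pd.DataFrame({
    "YearBuilt": [2003.0, np.nan, 2001.0, 1915.0],
    "YrSold": [2008.0, 2007.0, np.inf, 2006.0],
    "SalePrice": [208500.0, 181500.0, 223500.0, 140000.0],
})

# Count missing values per column
nan_counts = df.isna().sum()

# Count infinite values per column (np.isinf needs numeric data,
# so restrict to numeric columns first)
inf_counts = np.isinf(df.select_dtypes(include="number")).sum()

print(nan_counts)
print(inf_counts)

# Turn infinities into NaN, then drop every row that contains a NaN
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
print(len(cleaned))
```

If both counts come back as zero for the columns you pass to fit, the problem is more likely in how x was assembled than in the raw data.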