I am using sklearn to fit a KNN regressor:
# importing libraries and data
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor as KNR
theta = pd.read_csv("train.csv")  # pandas dataframe
# getting data wanted from theta and putting it in a new dataframe
a = theta.get("YearBuilt")
b = theta.get("YrSold")
A = a.to_frame()
B = b.to_frame()
glasses = [A, B]
x = pd.concat(glasses)
# getting target data
y = theta.get("SalePrice")
# using KNN
horses = KNR(n_neighbors=3)
horses.fit(x, y)
I get this error message:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Could someone please explain this? My target values are in the hundreds of thousands and my input values are in the thousands, and there are no blanks in the data.
Answer
Before answering the question, let me refactor the code. You are using a dataframe, so you can index single or multiple fields of the dataframe directly, without the extra steps you've used:
# importing libraries and data
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor as KNR
theta = pd.read_csv("train.csv")  # pandas dataframe
# getting data wanted from theta and putting it in a new dataframe
x = theta[["YearBuilt", "YrSold"]]  # index multiple fields
# getting target data
y = theta["SalePrice"]  # index single field
# using KNN
horses = KNR(n_neighbors=3)
horses.fit(x, y)  # fit KNN
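The refactor may matter for more than tidiness. As a minimal sketch, assuming a tiny made-up frame in place of train.csv (the values are hypothetical), this compares the two ways of building x. With its default axis=0, pd.concat stacks the two one-column frames on top of each other, so each row is missing one of the two columns:

```python
import pandas as pd

# Hypothetical stand-in for train.csv with only the two feature columns.
theta = pd.DataFrame({
    "YearBuilt": [1990, 2005, 1978],
    "YrSold": [2008, 2009, 2010],
})

# Row-wise stacking (default axis=0), as in the question's code.
stacked = pd.concat([theta["YearBuilt"].to_frame(), theta["YrSold"].to_frame()])

# Column selection, as in the refactored code.
selected = theta[["YearBuilt", "YrSold"]]

print(stacked.shape, int(stacked.isna().sum().sum()))    # twice the rows, with NaN gaps
print(selected.shape, int(selected.isna().sum().sum()))  # same rows as theta, no NaN
```

If you do want to keep pd.concat, passing axis=1 places the frames side by side instead of stacking them.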
Regarding your error, it indicates that you have some NaN, Inf, or very large values in your data. You can make sure these don't occur by filtering out the NaN and inf values using this:
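import numpy as np
# treat infinities as missing, then drop every row that contains a missing value
theta = theta.replace([np.inf, -np.inf], np.nan)
theta.dropna(inplace=True)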
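Before dropping anything, it can help to see where the bad values are. The sketch below uses a small made-up frame (all values are hypothetical, and the "Alley" column stands in for any unrelated column with gaps). It counts non-finite entries per column, then cleans only the columns the model uses. Restricting dropna with subset= is my suggestion rather than part of the snippet above: a plain dropna() also removes rows that are missing values in columns you never feed to the model.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv.
theta = pd.DataFrame({
    "YearBuilt": [1990.0, np.inf, 1978.0, 2001.0],
    "YrSold": [2008.0, 2009.0, np.nan, 2010.0],
    "SalePrice": [200000.0, 150000.0, 120000.0, 310000.0],
    "Alley": [np.nan, "Pave", "Grvl", np.nan],
})

cols = ["YearBuilt", "YrSold", "SalePrice"]

# Count NaN / inf / -inf entries per column before cleaning.
bad_counts = (~np.isfinite(theta[cols])).sum()
print(bad_counts)

# Turn infinities into NaN first.
no_inf = theta.replace([np.inf, -np.inf], np.nan)

# Dropping on every column also discards rows that only lack "Alley".
dropped_all = no_inf.dropna()

# Dropping on the used columns keeps every row that is usable for the model.
clean = no_inf.dropna(subset=cols)

x = clean[["YearBuilt", "YrSold"]]
y = clean["SalePrice"]
print(len(dropped_all), len(clean))
```

The resulting x and y have matching lengths and contain only finite values, so they can be passed to horses.fit(x, y).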