from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score,confusion_matrix
import pandas as pd

df=pd.read_csv('weather.csv',delimiter=',')
print(df)
x=df.values[:,0:df.shape[1]-1]
y=df.values[:,df.shape[1]-1]
x_train,y_train,x_test,y_test = train_test_split(x,y,test_size=0.5,random_state=0)
gnb=GaussianNB()
y_pred=gnb.fit(x_train,y_train).predict(x_test)
print(y_test,y_pred)
print("Number of misplaced points out of a total %d points : %d" % (x_test.shape[0],(y_test!=y_pred).sum()))
print(accuracy_score(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
The above is my code, which I ran in Google Colab. It raises the following error:
"y should be a 1d array, got an array of shape {} instead.".format(shape)"
The error is raised on this line:
y_pred=gnb.fit(x_train,y_train).predict(x_test)
Please help me solve this error. I am a beginner, so please explain the answer in detail.
Answer
Your problem is that the outputs of train_test_split are ordered differently than you think. train_test_split returns both parts of the first argument first (train, then test), followed by both parts of the second argument. So you should unpack it like this:
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.5,random_state=0)
You can find more information and a few examples in the documentation.
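For reference, here is a minimal sketch of how the whole script might look with the corrected unpacking order. Since weather.csv is not available here, it assumes a synthetic stand-in: four random numeric features and a made-up binary label. Those data, and the name n_wrong, are illustrative assumptions and not part of the original code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for weather.csv (assumption): 4 numeric features, binary label.
rng = np.random.default_rng(0)
x = rng.random((100, 4))
y = (x[:, 0] > 0.5).astype(int)

# Both parts of x come first (train, test), then both parts of y (train, test).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

gnb = GaussianNB()
y_pred = gnb.fit(x_train, y_train).predict(x_test)

# Count the test points whose predicted label differs from the true label.
n_wrong = int((y_test != y_pred).sum())
print("Number of mislabeled points out of a total %d points : %d"
      % (x_test.shape[0], n_wrong))
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

With your own data, only the lines that build x and y should need to change.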
You can resolve issues like that by inspecting the shapes of the values of your variables. Either use a debugger or print their shapes:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = np.random.rand(100, 5)  # some test data
df = pd.DataFrame(data)
x = df.values[:, :-1]  # features: every column except the last
y = df.values[:, -1]  # same as df.values[:, df.shape[1]-1]
print(f"x shape: {x.shape}")  # (100, 4)
print(f"y shape: {y.shape}")  # (100,) ==> 1d, fine

# Same (wrong) unpacking order as in the question, kept to reproduce the error
x_train, y_train, x_test, y_test = train_test_split(x,y,test_size=0.5,random_state=0)
print(f"x_train shape: {x_train.shape}")  # (50, 4)
print(f"y_train shape: {y_train.shape}")  # (50, 4) ==> 2d, so something is wrong
print(f"x_test shape: {x_test.shape}")  # (50,) ==> 1d, also wrong
print(f"y_test shape: {y_test.shape}")  # (50,) ==> 1d, as expected
gnb = GaussianNB()
y_pred = gnb.fit(x_train, y_train).predict(x_test)  # error: y should be a 1d array ...
Now you can see why the error is raised and where things go wrong. You can then look up the documentation of the last command that produced unexpected output.
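If you prefer a check that fails loudly over reading printed shapes, the same inspection can be wrapped in a small guard. The sketch below uses a hypothetical helper, check_split, which is not part of scikit-learn; it only assumes that features should be 2d and labels 1d, as in the question.

```python
import numpy as np

def check_split(x_train, x_test, y_train, y_test):
    """Hypothetical helper: raise early if a split was unpacked in the wrong order."""
    x_train, x_test = np.asarray(x_train), np.asarray(x_test)
    y_train, y_test = np.asarray(y_train), np.asarray(y_test)
    # Features are expected to be 2d: (n_samples, n_features).
    if x_train.ndim != 2 or x_test.ndim != 2:
        raise ValueError(
            f"features must be 2d, got {x_train.shape} and {x_test.shape}"
        )
    # Labels are expected to be 1d: (n_samples,).
    if y_train.ndim != 1 or y_test.ndim != 1:
        raise ValueError(
            f"labels must be 1d, got {y_train.shape} and {y_test.shape}"
        )
    # Each feature array needs one label per row.
    if x_train.shape[0] != y_train.shape[0] or x_test.shape[0] != y_test.shape[0]:
        raise ValueError("features and labels have different numbers of rows")

# Dummy arrays with the shapes a 50/50 split of (100, 4) data would produce.
x_tr, x_te = np.zeros((50, 4)), np.zeros((50, 4))
y_tr, y_te = np.zeros(50), np.zeros(50)

check_split(x_tr, x_te, y_tr, y_te)  # correct order: passes silently

try:
    # Order from the question: the second and third values are swapped.
    check_split(x_tr, y_tr, x_te, y_te)
except ValueError as err:
    print(f"caught: {err}")
```

Calling such a guard right after train_test_split turns a confusing error inside fit into a message that points at the split itself.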