
Python: y should be a 1d array, got an array of shape {} instead

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score,confusion_matrix
import pandas as pd
df=pd.read_csv('weather.csv',delimiter=',')
print(df)
x=df.values[:,0:df.shape[1]-1]
y=df.values[:,df.shape[1]-1]

x_train,y_train,x_test,y_test = train_test_split(x,y,test_size=0.5,random_state=0)
gnb=GaussianNB()
y_pred=gnb.fit(x_train,y_train).predict(x_test)
print(y_test,y_pred)
print("Number of misplaced points out of a total %d points : %d" % (x_test.shape[0], (y_test != y_pred).sum()))
print(accuracy_score(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))

The above is my code, which I ran in Google Colab. It raises this error:

"y should be a 1d array, got an array of shape {} instead.".format(shape)

The error is raised on the line

y_pred=gnb.fit(x_train,y_train).predict(x_test)

Please help me solve this error. I am a beginner, so please explain the answer in detail.


Answer

Your problem is that the outputs of train_test_split are ordered differently than you think.

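train_test_split returns both halves of the first argument first (train, then test), followed by both halves of the second argument. Your code unpacks the test half of x into y_train, which is why fit complains that y is not 1d. Unpack the result like this instead:

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.5,random_state=0)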

You can find more information and a few examples in the documentation.
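
To make the return order concrete, here is a minimal sketch using synthetic arrays in place of weather.csv (the data and its sizes are assumptions chosen only for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 10 samples, 3 features, one label each.
x = np.arange(30).reshape(10, 3)
y = np.arange(10) % 2

# Both halves of x come first, then both halves of y.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

print(x_train.shape, x_test.shape)  # (5, 3) (5, 3): 2d feature arrays
print(y_train.shape, y_test.shape)  # (5,) (5,): 1d label arrays
```

With this ordering, the labels passed to fit are 1d and have one entry per row of x_train.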

You can resolve issues like that by inspecting the shapes of the values of your variables. Either use a debugger or print their shapes:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = np.random.rand(100, 5)  # some test data
df = pd.DataFrame(data)
x = df.values[:, :-1]  # all columns except the last one
y = df.values[:, -1]  # last column; -1 does the same as df.shape[1]-1

print(f"x shape: {x.shape}")  # (100, 4)
print(f"y shape: {y.shape}")  # (100,)  ==> 1d, fine

# wrong unpacking order on purpose, as in the question
x_train, y_train, x_test, y_test = train_test_split(x,y,test_size=0.5,random_state=0)


print(f"x_train shape: {x_train.shape}")  # (50, 4)
print(f"y_train shape: {y_train.shape}")  # (50, 4)  ==> 2d, so something is wrong
print(f"x_test shape: {x_test.shape}")  # (50,)  ==> 1d, also bad: features should be 2d
print(f"y_test shape: {y_test.shape}")  # (50,)  ==> 1d, this one happens to be right

gnb=GaussianNB()
y_pred=gnb.fit(x_train,y_train).predict(x_test)  # error y should be 1d ...

Now you can see why the error is raised and where things go wrong: y_train is 2d because it actually holds the test half of x. From there, look up the documentation of the last command that produced unexpected output, here train_test_split.
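
If you would rather not read printed shapes by eye, the same check can be automated. The sketch below is only one possible approach: check_split_shapes is a hypothetical helper written for this answer, not part of scikit-learn, and the zero-filled arrays are stand-ins shaped like a 50/50 split of 100 samples with 4 features.

```python
import numpy as np

def check_split_shapes(x_train, x_test, y_train, y_test):
    """Hypothetical helper: fail early if a split looks mis-ordered."""
    for name, labels in (("y_train", y_train), ("y_test", y_test)):
        if np.ndim(labels) != 1:
            raise ValueError(f"{name} should be 1-D, got shape {np.shape(labels)}")
    if len(x_train) != len(y_train) or len(x_test) != len(y_test):
        raise ValueError("features and labels have different numbers of samples")

# Stand-in arrays (assumption): two feature blocks and two label vectors.
x_a, x_b = np.zeros((50, 4)), np.zeros((50, 4))
y_a, y_b = np.zeros(50), np.zeros(50)

check_split_shapes(x_a, x_b, y_a, y_b)  # correct order: passes silently

message = ""
try:
    # Mis-ordered as in the question: a feature block lands in the y_train slot.
    check_split_shapes(x_a, y_a, x_b, y_b)
except ValueError as err:
    message = str(err)
    print(message)
```

Calling such a check right after train_test_split surfaces the problem at the line that caused it, instead of later inside fit.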

User contributions licensed under: CC BY-SA