I am trying to write an SVM following this tutorial but using my own data. https://pythonprogramming.net/preprocessing-machine-learning/?completed=/linear-svc-machine-learning-testing-data/
I keep getting this error:
ValueError: The number of classes has to be greater than one; got 1
My code is:
header1 = ["Number of Sides", "Standard Deviation of Number of Sides/Perimeter", "Standard Deviation of the Angles", "Largest Angle"] header2 = ["Label"] features = header1 features1 = header2 def Build_Data_Set(): data_df = pd.DataFrame.from_csv("featureVectors.csv") #data_df = data_df[:3] X = np.array(data_df[features].values) data_df2 = pd.DataFrame.from_csv("labels.csv") y = np.array(data_df2[features1].replace("Circle",0).replace("Triangle",1) .replace("Square",2).replace("Parallelogram",3) .replace("Rectangle",4).values.tolist()) return X,y def Analysis(): test_size = 4 X,y = Build_Data_Set() print(len(X)) clf = svm.SVC(kernel = 'linear', C = 1.0) clf.fit(X[:-test_size],y[:-test_size]) correct_count = 0 for x in range(1, test_size+1): if clf.predict(X[-x])[0] == y[-x]: correct_count += 1 print("Accuracy:", (correct_count/test_size) * 100.00)
My array for features which is used for X looks like this:
[[4, 0.001743713493735165, 0.6497055601752815, 90.795723552739275], [4, 0.0460937435599832, 0.19764217920409227, 90.204147248752378], [1, 0.001185534503063044, 0.3034913722821194, 60.348908179729023], [1, 0.015455289770298222, 0.8380914254332884, 109.02120657826231], [3, 0.0169961646358455, 0.2458746325894564, 136.83829993466398]]
My array for labels used in Y looks like this:
['Square', 'Square', 'Circle', 'Circle', 'Triangle']
I have only used 5 sets of data so far because I knew the program wasn’t working.
I have attached pictures of the values in their csv files in case that helps.
Printing X.shape and y.shape and showing the full error
Advertisement
Answer
Looks to me like the problem is this line:
clf.fit(X[:-test_size],y[:-test_size])
Since X has 5 rows, and you’ve set test_size to 4, X[:-test_size] only gives one row (the first one). Read up on python’s slice notation, if this confuses you: Explain Python’s slice notation
So there is only one class in the training set (“Square” in this case). I wonder if you meant to do X[:test_size]
which would give the first 4 rows. Anyway, try training on a bigger data set.
I can reproduce your error with the following:
import numpy as np from sklearn import svm X = np.array([[4, 0.001743713493735165, 0.6497055601752815, 90.795723552739275], [4, 0.0460937435599832, 0.19764217920409227, 90.204147248752378], [1, 0.001185534503063044, 0.3034913722821194, 60.348908179729023], [1, 0.015455289770298222, 0.8380914254332884, 109.02120657826231], [3, 0.0169961646358455, 0.2458746325894564, 136.83829993466398]]) y = np.array(['Square', 'Square', 'Circle', 'Circle', 'Triangle']) print X.shape # (5,4) print y.shape # (5,) clf = svm.SVC(kernel='linear',C=1.0) test_size = 4 clf.fit(X[:-test_size],y[:-test_size])