I have a matrix Z (3000 x 2000), where each row describes a sample. Each column describes a single feature, which is a nucleotide (A, G, T, C), and I have encoded the data so that each column contains only 0 and 1. The matrix looks like this:
[[1 1 1 ... 1 1 1]
 [1 1 1 ... 1 1 1]
 [0 1 0 ... 1 1 1]
 ...
 [1 0 1 ... 0 1 1]
 [1 1 0 ... 1 1 1]
 [1 1 1 ... 1 1 0]]
And y looks like this:
[ '6484321.23' '9646585.73' '2346813.11' ... '8369179.01' '6200894.94' '7927300.10']
I tried to fit a support vector machine like this:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

grid = {"C": np.logspace(-5,5,10), "gamma": np.logspace(-5,5,10)}
cv = KFold(n_splits=10)

for i,j in cv.split(Z,y):
    Z1 = Z[i]
    Z2 = Z[j]
    y1 = y[i]
    y2 = y[j]
    supportvectorrregression = SVR(kernel="rbf")
    gridsearch = GridSearchCV(supportvectorrregression, grid,cv=2, scoring="accuracy", iid=False)
    gridsearch.fit(Z1,y1)
    scores = gridsearch.decision_function(Z2)
And now I get this error:
ValueError                                Traceback (most recent call last)
     10     supportvectorrregression = SVR(kernel="rbf")
     11     gridsearch = GridSearchCV(supportvectorrregression, grid,cv=2, scoring="accuracy", iid=False)
---> 12     gridsearch.fit(Z1,y1)

ValueError: continuous is not supported
Why is continuous data not supported here? What can I do?
Answer
I think the problem is in this line:
GridSearchCV(supportvectorrregression, grid,cv=2, scoring="accuracy", iid=False)
You chose scoring="accuracy", but SVR is a regression model. Accuracy is a classification metric, so scikit-learn is telling you that it cannot compute accuracy on continuous targets.
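The same error can be reproduced outside the grid search, which may make the cause easier to see. The snippet below is a minimal sketch: the predicted values are made up for illustration, and the exact wording of the message may differ between scikit-learn versions.

```python
from sklearn.metrics import accuracy_score, mean_squared_error

# First three targets from the question, as floats.
y_true = [6484321.23, 9646585.73, 2346813.11]
# Hypothetical predictions, invented for this example.
y_pred = [6400000.00, 9700000.00, 2300000.00]

# accuracy_score counts exact label matches, so it rejects continuous targets.
try:
    accuracy_score(y_true, y_pred)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# A regression metric such as mean squared error accepts the same inputs.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```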
Try swapping it for a regression metric of your choice, for example scoring="neg_mean_squared_error" or scoring="r2". The scikit-learn documentation on metrics and scoring lists the available options.
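Here is a minimal sketch of how the loop could look with a regression metric. It rests on several assumptions that are not in the question: the data is a small random stand-in for Z and y, the grid is reduced so it runs quickly, and the variable names are my own. It also makes three changes beyond the scoring: y is converted from strings to floats (the printed y appears to hold strings), the iid argument is dropped because it was removed in scikit-learn 0.24, and predict replaces decision_function, which SVR does not provide.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# Synthetic stand-in for the data in the question (much smaller, for speed).
rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(60, 20))
y = rng.uniform(1e6, 1e7, size=60).round(2).astype(str)  # strings, as in the question

# SVR needs numeric targets, so convert the strings to floats.
y = y.astype(float)

# Reduced grid for illustration only; widen it for real use.
grid = {"C": np.logspace(-2, 2, 3), "gamma": np.logspace(-3, 1, 3)}
cv = KFold(n_splits=3)

fold_scores = []
for train_idx, test_idx in cv.split(Z, y):
    Z_train, Z_test = Z[train_idx], Z[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Regression scoring instead of "accuracy"; no iid argument.
    gridsearch = GridSearchCV(
        SVR(kernel="rbf"), grid, cv=2, scoring="neg_mean_squared_error"
    )
    gridsearch.fit(Z_train, y_train)

    # SVR has no decision_function; predict returns the continuous estimates.
    predictions = gridsearch.predict(Z_test)
    # With scoring set, score() returns the negated MSE on the held-out fold.
    fold_scores.append(gridsearch.score(Z_test, y_test))

print(fold_scores)
```

The scores are negative because scikit-learn negates error metrics so that higher is always better; values closer to zero mean a smaller error.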