Can't optimize my hyperparameters using grid search. Why does this not work with continuous input? Alternatives?

I have a matrix Z (3000 × 2000), where each row describes a sample. Each column describes a single feature, which is a nucleotide (A, G, T, C), and I have standardized the data so that each column contains only 0 and 1. The matrix looks like this:

[[1 1 1 ... 1 1 1]
 [1 1 1 ... 1 1 1]
 [0 1 0 ... 1 1 1]
 ...
 [1 0 1 ... 0 1 1]
 [1 1 0 ... 1 1 1]
 [1 1 1 ... 1 1 0]]

And y looks like this:

[
'6484321.23'
'9646585.73'
'2346813.11' 
... 
'8369179.01'
'6200894.94'
'7927300.10']

I tried to fit a support vector machine like this:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

grid = {"C": np.logspace(-5,5,10), "gamma": np.logspace(-5,5,10)}

cv = KFold(n_splits=10)


for i,j in cv.split(Z,y):
    Z1 = Z[i]
    Z2 = Z[j]
    y1 = y[i]
    y2 = y[j]
    
    supportvectorrregression = SVR(kernel="rbf")
    gridsearch = GridSearchCV(supportvectorrregression, grid,cv=2, scoring="accuracy", iid=False)
    gridsearch.fit(Z1,y1)
    scores = gridsearch.decision_function(Z2)

And now I get this error:

ValueError                                Traceback (most recent call last)

     10     supportvectorrregression = SVR(kernel="rbf")
     11     gridsearch = GridSearchCV(supportvectorrregression, grid,cv=2, scoring="accuracy", iid=False)
---> 12     gridsearch.fit(Z1,y1)


ValueError: continuous is not supported

Why is continuous data not supported here? What can I do?

Answer

I think the problem is in this line:

GridSearchCV(supportvectorrregression, grid,cv=2, scoring="accuracy", iid=False)

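You chose scoring="accuracy", but SVR is a regression model and your y is continuous. Accuracy is a classification metric: it counts exact matches between predicted and true labels, so scikit-learn refuses to compute it on continuous targets and raises "continuous is not supported".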
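To see the same failure outside of the grid search, here is a minimal sketch that calls the metric directly. The predicted values are made up for illustration (an assumption, not from your data); the true values are the first three entries of your y.

```python
from sklearn.metrics import accuracy_score, mean_squared_error

# First three targets from the question; the predictions are invented for illustration.
y_true = [6484321.23, 9646585.73, 2346813.11]
y_pred = [6400000.0, 9700000.0, 2300000.0]

# accuracy_score is a classification metric and rejects continuous targets.
try:
    accuracy_score(y_true, y_pred)
    accuracy_error = None
except ValueError as err:
    accuracy_error = str(err)

print("accuracy_score raised:", accuracy_error)

# A regression metric handles the same inputs without complaint.
mse = mean_squared_error(y_true, y_pred)
print("mean squared error:", mse)
```

The message printed for accuracy_score should match the one in your traceback, although the exact wording may differ between scikit-learn versions.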

Try replacing it with a regression metric, for example "neg_mean_squared_error" or "r2". The scikit-learn documentation on metrics and scoring lists the available options.
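A rough sketch of how the loop could look with a regression metric, assuming a recent scikit-learn. The data here is a small random stand-in, and the grid is shrunk so it runs quickly; both are assumptions for illustration. Three other things differ from your code: y is converted from strings to floats, decision_function is replaced by predict (SVR does not provide decision_function), and the iid argument is dropped because it was removed in scikit-learn 0.24.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# Stand-in data (assumption): binary features and continuous targets stored as strings.
rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(60, 20))
y = np.array([f"{v:.2f}" for v in rng.uniform(1e6, 1e7, size=60)])

# The targets in the question are strings; SVR needs numbers.
y = y.astype(float)

# Smaller grid than in the question, only to keep the example fast.
grid = {"C": np.logspace(-2, 2, 3), "gamma": np.logspace(-2, 2, 3)}
outer_cv = KFold(n_splits=3)

outer_scores = []
for train_idx, test_idx in outer_cv.split(Z, y):
    # A regression scorer replaces "accuracy"; higher (closer to 0) is better.
    search = GridSearchCV(SVR(kernel="rbf"), grid, cv=2,
                          scoring="neg_mean_squared_error")
    search.fit(Z[train_idx], y[train_idx])

    # predict returns the continuous estimates for the held-out fold.
    predictions = search.predict(Z[test_idx])

    # score uses the same scorer as the search, so this is the negative MSE.
    outer_scores.append(search.score(Z[test_idx], y[test_idx]))

print("best parameters of the last fold:", search.best_params_)
print("negative MSE per outer fold:", outer_scores)
```

Since your targets are in the millions, the mean squared error will be very large; "r2" or "neg_mean_absolute_error" may be easier to read, and scaling y before fitting is worth considering.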
