Here is the relevant piece of code:

from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegressionCV

skf = StratifiedKFold(n_splits=5)
skf_1 = skf.split(titanic_dataset, surv_titanic)

ls_1 = np.logspace(-1.0, 2.0, num=500)

clf = LogisticRegressionCV(Cs=ls_1, cv = skf_1, scoring = "roc_auc", n_jobs=-1, random_state=17)

clf_model = clf.fit(x_train, y_train)
This raises:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-130-b99a5912ff5a> in <module>
----> 1 clf_model = clf.fit(x_train, y_train)

H:\Anaconda_3\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
   2098         # (n_classes, n_folds, n_Cs . n_l1_ratios) or
   2099         # (1, n_folds, n_Cs . n_l1_ratios)
-> 2100         coefs_paths, Cs, scores, n_iter_ = zip(*fold_coefs_)
   2101         self.Cs_ = Cs[0]
   2102         if multi_class == 'multinomial':

ValueError: not enough values to unpack (expected 4, got 0)

The train and test datasets had been prepared before, and they behave nicely with other classifiers.
Such a generic error message tells me nothing. What is the problem here?
Answer
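In short, the issue is that you passed the result of skf.split(titanic_dataset, surv_titanic) to the cv argument of LogisticRegressionCV, when you needed to pass the StratifiedKFold(n_splits=5) object itself. skf.split(...) returns a single-use generator of index pairs, so once it has been consumed it yields no folds, which is consistent with the "got 0" in the error message. Note also that the splits were computed on titanic_dataset while the model was fitted on x_train, so the indices would not necessarily match the training data anyway.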
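To see where the "expected 4, got 0" wording comes from, here is a minimal plain-Python sketch. It assumes, based only on the zip(*fold_coefs_) line in the traceback, that an empty collection of per-fold results reached that statement. The names fold_results and one_fold are hypothetical stand-ins, not scikit-learn's internals.

```python
# Hypothetical stand-in for the per-fold results: empty, because no folds were produced.
fold_results = []

try:
    # Same shape of statement as in the traceback: unzip per-fold 4-tuples.
    coefs_paths, Cs, scores, n_iter = zip(*fold_results)
except ValueError as exc:
    message = str(exc)

print(message)

# With at least one fold, the same statement unpacks without complaint.
one_fold = [("coefs", "Cs", "scores", "n_iter")]
a, b, c, d = zip(*one_fold)
print(a, b, c, d)
```

So the message is less about the data itself and more about the cross-validation step having produced nothing to unpack.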
Below I show the code that reproduced your error, and below that I show two alternative methods that accomplish what I believe you were trying to do.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold

# Some example data
data = load_breast_cancer()
X = data['data']
y = data['target']

# Set up the StratifiedKFold
skf = StratifiedKFold(n_splits=5)

# Don't do this... only here to reproduce the error.
# split() returns a single-use generator of (train, test) index arrays.
skf_indices = skf.split(X, y)

# Some regularization strengths to search over
ls_1 = np.logspace(-1.0, 2.0, num=5)

# This is the setup that leads to your error
clf_error = LogisticRegressionCV(Cs=ls_1,
                                 cv=skf_indices,
                                 scoring="roc_auc",
                                 n_jobs=-1,
                                 random_state=17)

# Error triggered by passing the result of skf.split to cv
clf_model = clf_error.fit(X, y)

# This is probably what you meant to do: pass the splitter object itself
clf_using_skf = LogisticRegressionCV(Cs=ls_1,
                                     cv=skf,
                                     scoring="roc_auc",
                                     n_jobs=-1,
                                     random_state=17,
                                     max_iter=1_000)

# This will now fit without the error
clf_model_skf = clf_using_skf.fit(X, y)

# This is the easiest method. According to the docs, an integer cv
# also uses stratified K-folds with that many folds:
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html
clf_easiest = LogisticRegressionCV(Cs=ls_1,
                                   cv=5,
                                   scoring="roc_auc",
                                   n_jobs=-1,
                                   random_state=17,
                                   max_iter=1_000)

# This will now fit without the error
clf_model_easiest = clf_easiest.fit(X, y)
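
As a small illustration of the single-use behaviour, the sketch below walks the output of skf.split twice and counts the folds each time. The synthetic X and y are made up for the example and have nothing to do with the Titanic data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Tiny synthetic data set, purely illustrative: 20 samples, 10 per class.
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0, 1] * 10)

skf = StratifiedKFold(n_splits=5)

# split() returns a generator, so it can only be walked once.
splits = skf.split(X, y)
first_pass = sum(1 for _ in splits)   # consumes every fold
second_pass = sum(1 for _ in splits)  # nothing left to yield
print(first_pass, second_pass)

# Materializing the folds gives a reusable list of (train_idx, test_idx) pairs.
folds = list(skf.split(X, y))
print(len(folds))
```

If you really do need precomputed indices, a list such as folds can be iterated repeatedly. scikit-learn's cross-validation utilities generally accept an iterable of (train, test) index pairs for cv, but treat that as something to check against your installed version. Passing skf or an integer remains the simpler choice.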