I have a binary classification problem, and I've been using cross-validation to optimize the ElasticNet parameters. However, ElasticNet only seems to work when I supply roc_auc as the scoring method during CV. I also want to try a wide range of other scoring methods, accuracy in particular. When I use accuracy, ElasticNet raises this error:
ValueError: Classification metrics can't handle a mix of binary and continuous targets
However, my y targets are indeed binary. Below is a replication of my problem using the dataset from here:
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import LabelBinarizer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
    from sklearn.metrics import make_scorer, recall_score, accuracy_score, precision_score, confusion_matrix
    from sklearn.linear_model import LogisticRegression
    from sklearn.linear_model import ElasticNet

    data = pd.read_csv('data 2.csv')

    # by default majority class (benign) will be negative
    lb = LabelBinarizer()
    data['diagnosis'] = lb.fit_transform(data['diagnosis'].values)
    targets = data['diagnosis']

    data.drop(['id', 'diagnosis', 'Unnamed: 32'], axis=1, inplace=True)

    X_train, X_test, y_train, y_test = train_test_split(data, targets, stratify=targets)

    #elastic net logistic regression
    lr = ElasticNet(max_iter=2000)
    scorer = 'accuracy'
    param_grid = {
        'alpha': [1e-4, 1e-3, 1e-2, 0.01, 0.1, 1, 5, 10],
        'l1_ratio': np.arange(0.2, 0.9, 0.1)
    }
    skf = StratifiedKFold(n_splits=10)
    clf = GridSearchCV(lr, param_grid, scoring=scorer, cv=skf,
                       return_train_score=True, n_jobs=-1)
    clf.fit(X_train.values, y_train.values)
I figured that ElasticNet might be trying to solve a linear regression problem, so I tried

    lr = LogisticRegression(penalty='elasticnet', l1_ratios=[0.1, 0.5, 0.9], solver='saga')

as the classifier, but the same problem persists.
If I use scorer = 'roc_auc' as the scoring metric, the model is built as expected.
Also, as a sanity check to see whether something is wrong with the data, I tried the same thing with a random forest classifier, and here the problem disappears:
    # random forest
    clf = RandomForestClassifier(n_jobs=-1)
    param_grid = {
        'min_samples_split': [3, 5, 10],
        'n_estimators' : [100, 300],
        'max_depth': [3, 5, 15, 25],
        'max_features': [3, 5, 10, 20]
    }
    skf = StratifiedKFold(n_splits=10)
    scorer = 'accuracy'
    grid_search = GridSearchCV(clf, param_grid, scoring=scorer, cv=skf,
                               return_train_score=True, n_jobs=-1)
    grid_search.fit(X_train.values, y_train.values)
Has anyone got any ideas on what’s happening here?
Answer
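ElasticNet is a regression model, so its predict method returns continuous values rather than class labels. Accuracy compares predicted labels against the true labels, which is why it raises the error above; roc_auc only ranks the scores, which is presumably why it appeared to work.

If you want an ElasticNet penalty in classification, use LogisticRegression:

    lr = LogisticRegression(solver="saga", penalty="elasticnet")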
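To see where the error comes from, here is a small sketch that fits ElasticNet directly and scores its predictions by hand. It uses synthetic data from make_classification as a stand-in for the dataset in the question, and the alpha value is an arbitrary choice for illustration; neither comes from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import ElasticNet
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic binary data standing in for the question's dataset (assumption).
X, y = make_classification(n_samples=200, random_state=0)

# ElasticNet is a regressor: predict() returns real-valued outputs.
reg = ElasticNet(alpha=0.01, max_iter=2000).fit(X, y)
y_pred = reg.predict(X)

# roc_auc only needs a ranking of the samples, so continuous scores are accepted.
auc = roc_auc_score(y, y_pred)

# accuracy compares labels with labels, so continuous predictions are rejected.
try:
    accuracy_score(y, y_pred)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print("first predictions:", y_pred[:5])
print("roc_auc:", auc)
print("accuracy error:", error_message)
```

The printed predictions are floats rather than 0/1 labels, which is what trips up accuracy inside GridSearchCV as well.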
Minimal Reproducible Example:
    import numpy as np
    from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

    lr = LogisticRegression(solver="saga", penalty="elasticnet", max_iter=2000)
    param_grid = {
        'l1_ratio': np.arange(0.2, 0.9, 0.1)
    }
    clf = GridSearchCV(lr, param_grid, scoring='accuracy',
                       cv=StratifiedKFold(n_splits=10),
                       return_train_score=True, n_jobs=-1)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))
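Since the question also mentions trying several scoring methods and tuning the regularization strength, the example above could be extended along these lines. This is only a sketch: the grid values, the choice of metrics, and the 5-fold split are assumptions for illustration. Note that LogisticRegression controls regularization strength through C (the inverse of the strength) rather than ElasticNet's alpha.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic data, as in the example above.
X, y = make_classification(n_samples=300, random_state=0)

lr = LogisticRegression(solver="saga", penalty="elasticnet",
                        l1_ratio=0.5, max_iter=2000)

# Illustrative grid: C is the inverse regularization strength,
# l1_ratio mixes the L1 and L2 penalties.
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "l1_ratio": [0.2, 0.5, 0.8],
}

# Evaluate several metrics in one search; refit picks the one used
# to select the final model.
search = GridSearchCV(lr, param_grid,
                      scoring=["accuracy", "roc_auc", "recall"],
                      refit="accuracy",
                      cv=StratifiedKFold(n_splits=5))
search.fit(X, y)

print(search.best_params_)
print(search.cv_results_["mean_test_accuracy"])
print(search.cv_results_["mean_test_roc_auc"])
```

With a list passed to scoring, cv_results_ holds one set of mean_test_<metric> columns per metric, so the metrics can be compared without rerunning the search.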