I’m working with the Boston housing dataset from sklearn.datasets
and have run ridge and lasso regressions on my data (post train/test split). I am now trying to perform k-fold cross validation to find the optimal penalty parameters, and have written the code below. What can I do to resolve the error shown underneath and find the optimal penalty parameters for the ridge and lasso regressions using k-fold cross validation? Thank you.
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import LassoCV
from numpy import arange

cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
model = LassoCV(alphas=arange(0, 1, .01), cv=cva, n_jobs=-1)
model.fit(x_train, y_train)
print(model.alpha_)
which produces the following error message:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer
I set up the Boston dataset as below:
from sklearn.model_selection import RepeatedKFold, train_test_split
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.datasets import load_boston

X, y = load_boston(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
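Note that load_boston is deprecated as of scikit-learn 1.0 and removed in 1.2, so the line above requires an older version. If yours no longer ships it, a sketch of the workaround from scikit-learn's own deprecation notice (assuming pandas is installed) reconstructs the same X and y from the original source:

import numpy as np
import pandas as pd

# Rebuild the Boston data from the original CMU source; the raw file
# stores each record across two physical lines, hence the stacking below.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]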
If I run the cross-validated lasso this way, I do not get the error you see. Also, avoid including alpha = 0 in the grid: with a zero penalty the model is ordinary least squares, not lasso:
cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
model = LassoCV(alphas=np.arange(0.001, 1, .01), cv=cva, n_jobs=-1)
model.fit(x_train, y_train)
Then:
print(model.alpha_)
0.001
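For ridge, RidgeCV works the same way; here is a minimal sketch reusing the cva splitter and the same alpha grid (ridge technically allows alpha = 0, but that again reduces to plain least squares):

from sklearn.linear_model import RidgeCV

# Reuse the repeated k-fold splitter and alpha grid from the lasso run;
# with an explicit cv, RidgeCV grid-searches over the candidate alphas.
ridge = RidgeCV(alphas=np.arange(0.001, 1, .01), cv=cva)
ridge.fit(x_train, y_train)
print(ridge.alpha_)

Both CV estimators refit on the full training data at the selected alpha, so you can check held-out performance afterwards with, e.g., model.score(x_test, y_test), which returns the R² on the test split.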