
K-fold cross-validation for Lasso and Ridge models

I’m working with the Boston housing dataset from sklearn.datasets and have run ridge and lasso regressions on my data (after a train/test split). I’m now trying to perform k-fold cross-validation to find the optimal penalty parameters and have written the code below. What can I do to resolve this error and find the optimal penalty parameters for the Ridge and Lasso regressions using k-fold cross-validation? Thank you.

from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import LassoCV  # this import was missing
from numpy import arange

cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
model = LassoCV(alphas=arange(0, 1, .01), cv=cva, n_jobs=-1)  # was `kmodel`, but the fit below uses `model`
model.fit(x_train, y_train)
print(model.alpha_)

which then produces this error message:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


Answer

I set up the Boston dataset as follows:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import LassoCV
from sklearn.model_selection import RepeatedKFold, train_test_split

X, y = load_boston(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

If I run the lasso cross-validation this way, I do not get the error you see. Also, do not include alpha = 0 in the grid: a zero penalty is just ordinary least squares, not lasso:

cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
model = LassoCV(alphas=np.arange(0.001, 1, .01), cv=cva, n_jobs=-1)
model.fit(x_train, y_train)

Then:

print(model.alpha_)
0.001
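
Note that 0.001 is the smallest value in the searched grid, so the selected alpha sits on the boundary; you may want to extend the grid to smaller values. Since you also asked about ridge, here is a minimal sketch of the same pattern using sklearn’s RidgeCV (the name ridge_model is my own choice; note that RidgeCV takes no n_jobs argument):

from sklearn.linear_model import RidgeCV

# Reuse the same repeated k-fold splitter to search the ridge penalty
cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
ridge_model = RidgeCV(alphas=np.arange(0.001, 1, .01), cv=cva)
ridge_model.fit(x_train, y_train)

print(ridge_model.alpha_)                 # penalty selected by cross-validation
print(ridge_model.score(x_test, y_test))  # R^2 on the held-out test split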