I’m working with the Boston housing dataset from sklearn.datasets
and have run ridge and lasso regressions on my data (post train/test split). I’m now trying to perform k-fold cross validation to find the optimal penalty parameters, and have written the code below. What can I do to resolve this issue and find the optimal penalty parameters using K-fold validation for Ridge and Lasso regressions? Thank you.
Python

from sklearn.model_selection import RepeatedKFold
from numpy import arange

cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
kmodel = LassoCV(alphas=arange(0, 1, .01), cv=cva, n_jobs=-1)
model.fit(x_train, y_train)
print(model.alpha_)
which then produces the error message of:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer
I set up the Boston dataset as below:
Python

from sklearn.model_selection import RepeatedKFold
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston

X, y = load_boston(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
If I run the cross-validated lasso, I do not get the error you see. Also, do not include alpha = 0 in the grid: with a zero penalty the model is just ordinary least squares, not lasso:
Python

cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
model = LassoCV(alphas=np.arange(0.001, 1, .01), cv=cva, n_jobs=-1)
model.fit(x_train, y_train)
Then:
Python

print(model.alpha_)
0.001
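Since you also asked about ridge: the same repeated k-fold pattern works with RidgeCV. A sketch under stated assumptions — it uses make_regression as stand-in data (load_boston may be unavailable in newer scikit-learn versions), so substitute your own x_train/y_train from the split above; note RidgeCV has no n_jobs argument:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, train_test_split
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

# Stand-in regression data; replace with your Boston train/test split.
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Same CV scheme as the LassoCV example. Ridge tolerates alpha=0 numerically,
# but keep alphas > 0 so the model actually carries a penalty.
cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
ridge = RidgeCV(alphas=np.arange(0.001, 1, .01), cv=cva)
ridge.fit(x_train, y_train)
print(ridge.alpha_)
```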