`sklearn` asking for eval dataset when there is one

I am working with StackingRegressor from sklearn, and I used lightgbm to train one of the base models. My lightgbm model uses early stopping, and I have supplied an eval dataset and metric for it.

When I feed it into the StackingRegressor, I get this error:

ValueError: For early stopping, at least one dataset and eval metric is required for evaluation

This is frustrating because I do have them in my code. What is happening? Here’s my code:

import numpy as np 
import pandas as pd 

import lightgbm as lgb
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
import xgboost as xgb
from sklearn.ensemble import StackingRegressor

opt_parameters_LGBM = {'bagging_fraction': 0.37031434827212084, 'bagging_seed': 47, 'boosting_type': 'gbdt', 
                       'feature_fraction': 0.3894822966866982, 'learning_rate': 0.01, 'max_bin': 177, 'max_depth': -1, 
                       'metric': 'rmse', 'min_child_weight': 1000.0, 'num_leaves': 161, 'objective': 'regression', 
                       'random_state': 47, 'reg_alpha': 10, 'reg_lambda': 50, 'verbosity': -1}  
m1 = lgb.LGBMRegressor(valid_sets = [lgb_train, lgb_eval], verbose_eval = 30, num_boost_round = 10000, early_stopping_rounds = 10, n_jobs=4, n_estimators=3000, **opt_parameters_LGBM)
m1.fit(X_train_df, y_train_df, eval_set = (X_val_df, y_val_df), eval_metric = 'rmse')

opt_parameters_ADA = {'learning_rate': 0.03, 'n_estimators': 5} 
m2 = AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth=3, min_samples_leaf=1, min_impurity_decrease=10, random_state=47), random_state=47, **opt_parameters_ADA)
m2.fit(X_train_df, y_train_df)

'''
Where problem starts
'''

gbm = xgb.XGBRegressor(
 learning_rate = 0.02,
 n_estimators= 5,
 max_depth= 4,
 min_child_weight= 2,
 gamma=0.9,                        
 subsample=0.8,
 colsample_bytree=0.8,
 objective= 'reg:squaredlogerror',
 nthread= -1,
 verbosity=3,
 random_state=20)

estimators = [('lgbm', m1), ('ada', m2)]

gbm = StackingRegressor(estimators=estimators, final_estimator=gbm, cv=5, verbose=1)
gbm.fit(X_train_df, y_train_df)

Answer

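I guess the issue is caused by the fact that early stopping was set on the LGBMRegressor. StackingRegressor clones each base estimator and refits it with only X and y, so the eval_set you passed to m1.fit() is not forwarded, yet the clone still has early_stopping_rounds set and therefore expects eval data.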

Try doing the following:

Right after the line where you fit your LGBMRegressor() model, m1.fit(X_train_df, y_train_df, eval_set = (X_val_df, y_val_df), eval_metric = 'rmse'), add these lines:

params = m1.get_params()

# disable early_stopping_rounds, since the model has already been fitted
params["early_stopping_rounds"] = None
m1.set_params(**params)

See if the error goes away.
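
To illustrate why this might help, here is a minimal sketch that reproduces the mechanism without LightGBM. `EarlyStoppingStub` is a hypothetical stand-in, not a real library class: it only mimics the behavior assumed above (refusing to fit when early stopping is set but no eval_set is given). The toy data and the LinearRegression models are also assumptions made for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression


class EarlyStoppingStub(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in for an estimator that needs eval data for early stopping."""

    def __init__(self, early_stopping_rounds=None):
        self.early_stopping_rounds = early_stopping_rounds

    def fit(self, X, y, eval_set=None):
        # Mimic the assumed behavior: early stopping without eval data is an error.
        if self.early_stopping_rounds is not None and eval_set is None:
            raise ValueError("For early stopping, at least one dataset and "
                             "eval metric is required for evaluation")
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Trivial model: always predict the training mean.
        return np.full(len(X), self.mean_)


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=60)

# Fitting directly works, because eval_set is passed explicitly.
m1 = EarlyStoppingStub(early_stopping_rounds=10)
m1.fit(X[:40], y[:40], eval_set=(X[40:], y[40:]))

estimators = [("stub", m1), ("lin", LinearRegression())]

# StackingRegressor clones m1 and calls fit(X, y) with no eval_set.
stack = StackingRegressor(estimators=estimators, final_estimator=LinearRegression(), cv=5)
try:
    stack.fit(X, y)
    refit_failed = False
except ValueError:
    refit_failed = True

# Disable early stopping on the base estimator, then stack again.
m1.set_params(early_stopping_rounds=None)
stack = StackingRegressor(estimators=estimators, final_estimator=LinearRegression(), cv=5)
stack.fit(X, y)
predictions = stack.predict(X)
```

In this sketch the first stack.fit() fails and the second one succeeds. Note that the refitted clones inside the stack then train without early stopping, so with the real LGBMRegressor they would run for the full n_estimators.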
