I am working with StackingRegressor from sklearn, and I used lightgbm to train one of my models. My lightgbm model uses early stopping, and I pass an eval dataset and metric for it. When the model is fed into the StackingRegressor, I get this error:
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
Which is frustrating because I do have them in my code. I wonder what is happening? Here’s my code.
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
import xgboost as xgb
from sklearn.ensemble import StackingRegressor

opt_parameters_LGBM = {'bagging_fraction': 0.37031434827212084,
                       'bagging_seed': 47,
                       'boosting_type': 'gbdt',
                       'feature_fraction': 0.3894822966866982,
                       'learning_rate': 0.01,
                       'max_bin': 177,
                       'max_depth': -1,
                       'metric': 'rmse',
                       'min_child_weight': 1000.0,
                       'num_leaves': 161,
                       'objective': 'regression',
                       'random_state': 47,
                       'reg_alpha': 10,
                       'reg_lambda': 50,
                       'verbosity': -1}

m1 = lgb.LGBMRegressor(valid_sets = [lgb_train, lgb_eval],
                       verbose_eval = 30,
                       num_boost_round = 10000,
                       early_stopping_rounds = 10,
                       n_jobs=4,
                       n_estimators=3000,
                       **opt_parameters_LGBM)
m1.fit(X_train_df, y_train_df, eval_set = (X_val_df, y_val_df), eval_metric = 'rmse')

opt_parameters_ADA = {'learning_rate': 0.03, 'n_estimators': 5}
m2 = AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth=3,
                                                            min_samples_leaf=1,
                                                            min_impurity_decrease=10,
                                                            random_state=47),
                       random_state=47,
                       **opt_parameters_ADA)
m2.fit(X_train_df, y_train_df)

''' Where problem starts '''
gbm = xgb.XGBRegressor(
    learning_rate = 0.02,
    n_estimators= 5,
    max_depth= 4,
    min_child_weight= 2,
    gamma=0.9,
    subsample=0.8,
    colsample_bytree=0.8,
    objective= 'reg:squaredlogerror',
    nthread= -1,
    verbosity=3,
    random_state=20)

estimators = [('lgbm', m1), ('ada', m2)]
gbm = StackingRegressor(estimators=estimators, final_estimator=gbm, cv=5, verbose=1)
gbm.fit(X_train_df, y_train_df)
Answer
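I guess the issue is caused by the fact that early_stopping_rounds is set on the LGBMRegressor. StackingRegressor() clones each base estimator and refits it with only X and y, so the eval data you passed to m1.fit() never reaches that refit, and early stopping has nothing to evaluate on.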
Try the following: right after the line where you fit your LGBMRegressor() model, i.e. m1.fit(X_train_df, y_train_df, eval_set = (X_val_df, y_val_df), eval_metric = 'rmse'), add these lines:
params = m1.get_params()
# remove early_stopping_rounds, as your model has already been fitted to the data
params["early_stopping_rounds"] = None
m1.set_params(**params)
Then see if the error goes away.
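To see why this should help, here is a minimal sketch that reproduces the situation without LightGBM. EarlyStoppingStandIn is a hypothetical toy estimator written for this illustration (it is not part of any library); it only imitates the "early stopping needs eval data" check. The data and the stack_fits helper are likewise made up for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression


class EarlyStoppingStandIn(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in that mimics the early-stopping check; not LightGBM."""

    def __init__(self, early_stopping_rounds=None):
        self.early_stopping_rounds = early_stopping_rounds

    def fit(self, X, y, eval_set=None):
        # Refuse to fit when early stopping is requested but no eval data is given.
        if self.early_stopping_rounds is not None and eval_set is None:
            raise ValueError("early stopping needs an eval_set")
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Trivial model: always predict the training mean.
        return np.full(len(X), self.mean_)


# Made-up data for the illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=40)


def stack_fits(base):
    """Return True if a StackingRegressor containing `base` can be fitted."""
    stack = StackingRegressor(
        estimators=[("base", base), ("lin", LinearRegression())],
        final_estimator=LinearRegression(),
        cv=3,
    )
    try:
        # The stack clones `base` and calls fit(X, y) on the clone, without eval_set.
        stack.fit(X, y)
        return True
    except ValueError:
        return False


base = EarlyStoppingStandIn(early_stopping_rounds=10)
base.fit(X, y, eval_set=[(X, y)])          # fitting directly with eval data works
failed_before = not stack_fits(base)       # the refit inside the stack has no eval data

base.set_params(early_stopping_rounds=None)  # same idea as the fix above
ok_after = stack_fits(base)
```

Note that the stack refits every base estimator from scratch, so the result of the earlier m1.fit(...) call is not reused; with early stopping disabled, the refit model should train for the full n_estimators.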