Trying to use GridSearchCV returning error: Check the list of available parameters with `estimator.get_params().keys()`

Question

I&#8217;m trying to build a Voting Ensemble model, with a data transformation pipeline. I still need to put the transformation of the response variable into the pipeline. I&#8217;m trying to use GridSearchCV to evaluate the best parameters for each algorithm, but when I try to run the last code block, I get a…

Accepted Answer

Always, please post the stack trace of the error for people to understandThere are multiple mistakes in your code,You are creating Pipeline Object using varible_transformer, where are you fitting it?What is X and y?Solution:Separate X-> input features needed for training and y-> the output variable values which the model has to learn.Create pipeline object, it is a wrapper that does the preprocessing for you, so fit it first before giving the input features to model.After fitting the pipeline object, you give the resultant numpy array to the classifier as the X and corresponding y to fit the model/classifier.I am showing an example of regressors with the data that I had handy.# median_house_value is what I am trying to estimate.# input_features = ['longitude','latitude','housing_median_age','total_rooms','total_bedrooms','population','households','median_income']df = pd.read_csv("/filepath/california_housing_train.csv")X = df.drop("median_house_value", axis=1)y = df["median_house_value"]categorical = []numerical = ['longitude', 'latitude', 'housing_median_age', 'total_rooms',         'total_bedrooms', 'population', 'households',         'median_income']numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='mean')),('scaler', StandardScaler())])categorical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='most_frequent')),('onehot', OneHotEncoder(handle_unknown='ignore'))])variable_transformer = ColumnTransformer(transformers=[    ('numeric', numeric_transformer, numerical),    ('categorical', categorical_transformer, categorical)],     remainder='passthrough')regressors = [XGBRegressor(),RandomForestRegressor()]xgbregressor_parameters = { 'regressor__grow_policy' : ['lossguide', 'deptwise'],'regressor__objective' : ['reg:squarederror'],   'regressor__colsample_bytree' : [0.7, 0.8, 0.9, 1.0],'regressor__max_leaves' : [0, 7]} randomforest_parameters = {'n_estimators': [200, 500],'max_features': ['auto', 'sqrt', 'log2'],'max_depth': [4, 5, 6, 7, 8]}parameters = [xgbregressor_parameters, randomforest_parameters]estimators = []pipe = Pipeline(steps=[('transformer', variable_transformer)])# fit the pipeline with input features for preprocessingprepared_data = pipe.fit_transform(X)# iterate through each regressor and use GridSearchCVfor i, regressor in enumerate(regressors):       clf = GridSearchCV(regressor,                   param_grid=parameters[i],                   scoring=['neg_mean_squared_error',                             'r2',                             'explained_variance',                             ],                    refit='neg_mean_squared_error',                   cv=2)clf.fit(prepared_data, y)print("Tuned Hyperparameters :", clf.best_params_)# add the clf to the estimators list    estimators.append((regressor.__class__.__name__, clf))# Output: Tuned Hyperparameters : {'regressor__colsample_bytree': 0.7, 'regressor__grow_policy': 'lossguide', 'regressor__max_leaves': 0, 'regressor__objective': 'reg:squarederror'}Tuned Hyperparameters : {'max_depth': 4, 'max_features': 'sqrt', 'n_estimators': 200}Note: Delete the classifier tag prepended to the parameter names for RandomForestclassifier in your case.

Advertisement

Answer