Is it possible to optimize hyperparameters for optional sklearn pipeline steps?

I tried to construct a pipeline that has some optional steps. I would also like to optimize the hyperparameters of those steps, because I want to find the best option between skipping a step and using it with different configurations (in my case the step is SelectFromModel, named sfm).

The error that I get is ‘string’ object has no attribute ‘set_params’ which is understandable. Is there a way to specify which combinations should be tried together, in my case only ‘passthrough’ by itself and sfm with different hyperparameters?

Thanks!

Answer

As @Robin pointed out, you can define p_grid_lr as a list of dictionaries. Here is what the GridSearchCV docs say about param_grid:

param_grid: dict or list of dictionaries

Dictionary with parameters names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
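As a minimal sketch of the list-of-dictionaries approach: the pipeline below is an assumption, since the original one is not shown. The step names sfm and lr, the LinearRegression final step, the synthetic data, and the candidate values are all illustrative. The first dictionary tries 'passthrough' on its own; the second tunes the forest inside SelectFromModel, so the two grids are never mixed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, only so the sketch runs end to end (assumption).
X, y = make_regression(n_samples=80, n_features=8, n_informative=4, random_state=0)

# Hypothetical pipeline: step names "sfm" and "lr" are assumed.
pipeline = Pipeline([
    ("sfm", SelectFromModel(RandomForestRegressor(random_state=0))),
    ("lr", LinearRegression()),
])

# Each dictionary spans its own grid; the grids are explored separately.
p_grid_lr = [
    # Grid 1: skip feature selection entirely.
    {"sfm": ["passthrough"]},
    # Grid 2: keep the default "sfm" step and tune its inner forest.
    {
        "sfm__estimator__max_depth": [3, 7],
        "sfm__estimator__n_estimators": [5, 10],
    },
]

search = GridSearchCV(pipeline, p_grid_lr, cv=3)
search.fit(X, y)

# 1 candidate from the first grid plus 2 * 2 from the second.
print(len(search.cv_results_["params"]))
print(search.best_params_)
```

Because 'passthrough' never appears in the same dictionary as the sfm__estimator__ keys, GridSearchCV never tries to call set_params on a string.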

A less scalable alternative (for your case) is to use a single dictionary and spell out every possible combination of your parameters explicitly.
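One way this alternative might look, again as a sketch under assumptions: the step names, the LinearRegression final step, the synthetic data, and the candidate values are illustrative and not taken from the original post. Every configuration of the sfm step is listed as a ready-made object, which is why the list grows quickly as more values are added.

```python
from itertools import product

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, only so the sketch runs end to end (assumption).
X, y = make_regression(n_samples=80, n_features=8, n_informative=4, random_state=0)

# Hypothetical pipeline: step names "sfm" and "lr" are assumed.
pipeline = Pipeline([
    ("sfm", "passthrough"),
    ("lr", LinearRegression()),
])

# Enumerate every (max_depth, n_estimators) pair as a full selector object.
selectors = [
    SelectFromModel(
        RandomForestRegressor(max_depth=depth, n_estimators=n, random_state=0)
    )
    for depth, n in product([3, 7], [5, 10])
]

# A single dictionary: the whole step is swapped, no nested keys needed.
p_grid_lr = {"sfm": ["passthrough"] + selectors}

search = GridSearchCV(pipeline, p_grid_lr, cv=3)
search.fit(X, y)

print(len(search.cv_results_["params"]))
```

The search space is the same as with a list of dictionaries, but each new hyperparameter value multiplies the number of objects that have to be written out or generated.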

Moreover, be aware that max_depth, n_estimators and max_features belong to the RandomForestRegressor wrapped inside SelectFromModel, not to SelectFromModel itself. To reach them you need to go through the estimator parameter: assuming the step is named sfm, write sfm__estimator__max_depth, sfm__estimator__n_estimators and sfm__estimator__max_features rather than sfm__max_depth, sfm__n_estimators and sfm__max_features. (max_features is also a parameter of SelectFromModel, so sfm__max_features is a valid key, but there it refers to the maximum number of features to select and, per the docs, takes integer values.)

You can list every parameter that is available for tuning with pipeline.get_params().keys() (or estimator.get_params().keys() for any estimator).
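A short sketch of that check, assuming a hypothetical pipeline with steps named sfm and lr (the original pipeline is not shown). It prints the keys that start with the sfm prefix, which makes the difference between the selector's own parameters and those of its inner estimator visible.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Hypothetical pipeline: step names "sfm" and "lr" are assumed.
pipeline = Pipeline([
    ("sfm", SelectFromModel(RandomForestRegressor())),
    ("lr", LinearRegression()),
])

# get_params() is deep by default, so nested estimator parameters are included.
param_names = sorted(pipeline.get_params().keys())

# Show only the keys that belong to the "sfm" step.
for name in param_names:
    if name.startswith("sfm__"):
        print(name)

# The forest's max_depth is only reachable through the estimator prefix.
print("sfm__estimator__max_depth" in param_names)
print("sfm__max_depth" in param_names)
```

Any key printed here can be used as-is in param_grid; a key that is missing from the list will be rejected by set_params.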

Finally, the Pipelines section of the scikit-learn user guide is a good read on this topic.

User contributions licensed under: CC BY-SA