Is preprocessing repeated in a Pipeline each time a new ML model is loaded?

I have created a scikit-learn pipeline so that several models can be run through it. Because a vectorization step comes before the model, I wonder whether the vectorization is repeated every time a model is fitted. If so, maybe I should move this preprocessing out of the pipeline.

Answer

By default, yes. A Pipeline refits all of its steps each time fit is called, so when you run a GridSearchCV the vectorizer is refitted for every combination of hyperparameters and every cross-validation fold, even if only the model's parameters change.
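To see the repeated work directly, here is a minimal sketch that counts how often the vectorizer is fitted during a small grid search. The toy corpus, the CountingVectorizer subclass, and the step names "vect" and "clf" are made up for illustration; they are not from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class CountingVectorizer(TfidfVectorizer):
    """Hypothetical helper: a TfidfVectorizer that counts its fits."""

    n_fits = 0  # class-level counter, shared by all clones

    def fit_transform(self, raw_documents, y=None):
        CountingVectorizer.n_fits += 1
        return super().fit_transform(raw_documents, y)


# Toy data, for illustration only.
texts = [
    "good movie", "great film", "loved it", "really good acting",
    "bad movie", "awful film", "hated it", "really bad acting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = Pipeline([
    ("vect", CountingVectorizer()),
    ("clf", LogisticRegression()),
])

# Only the classifier's hyperparameter varies; the vectorizer is unchanged.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# Default n_jobs runs in-process, so the class counter sees every fit.
search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(texts, labels)

print("vectorizer fits:", CountingVectorizer.n_fits)
```

With three candidate values and two folds, the vectorizer is fitted once per candidate per fold, plus once more for the final refit on the full data.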

Have a look at the "Pipelines and composite estimators" section of the scikit-learn user guide.

To quote:

Fitting transformers may be computationally expensive. With its memory parameter set, Pipeline will cache each transformer after calling fit. This feature is used to avoid computing the fit transformers within a pipeline if the parameters and input data are identical. A typical example is the case of a grid search in which the transformers can be fitted only once and reused for each configuration.

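So you can set the Pipeline's memory parameter (a path to a cache directory, or a joblib.Memory object) to cache the fitted transformers instead of moving the vectorization out of the pipeline.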
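A minimal sketch of what this could look like, assuming a TF-IDF vectorizer followed by a logistic regression. The toy corpus, the step names, and the choice of a temporary cache directory are illustrative assumptions, not part of the original question.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data, for illustration only.
texts = [
    "good movie", "great film", "loved it", "really good acting",
    "bad movie", "awful film", "hated it", "really bad acting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Temporary directory where fitted transformers are cached.
cache_dir = mkdtemp()

pipe = Pipeline(
    steps=[
        ("vect", TfidfVectorizer()),
        ("clf", LogisticRegression()),
    ],
    memory=cache_dir,  # enables caching of the fitted transformer steps
)

# Only the classifier's hyperparameter changes, so for a given training
# fold the cached vectorizer can be reused across the values of C.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(texts, labels)
print(search.best_params_)

# Remove the cache once it is no longer needed.
rmtree(cache_dir)
```

The cache is keyed on the transformer's parameters and its input data, so the vectorizer is still fitted once per distinct training fold. Only the final estimator is never cached. Note also that with caching enabled the pipeline fits a clone of each transformer, so inspect fitted steps through pipe.named_steps rather than through the original transformer object.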

User contributions licensed under: CC BY-SA