Is preprocessing repeated in a Pipeline each time a new ML model is loaded?

Question

I have created a pipeline using sklearn so that multiple models will go through it. Since there is a vectorization step before the model is fitted, I wonder whether this vectorization is always performed again before each model is fitted. If so, maybe I should take this preprocessing out of the pipeline.

Accepted Answer

When you are running a GridSearchCV, the pipeline steps are refitted for every combination of hyperparameters (and for every cross-validation fold). So yes, the vectorization is done again every time the pipeline is fitted.

Have a look at the "Pipelines and composite estimators" section of the scikit-learn user guide. To quote:

"Fitting transformers may be computationally expensive. With its memory parameter set, Pipeline will cache each transformer after calling fit. This feature is used to avoid computing the fit transformers within a pipeline if the parameters and input data are identical. A typical example is the case of a grid search in which the transformers can be fitted only once and reused for each configuration."

So you can use the memory parameter to cache the fitted transformers:

    from tempfile import mkdtemp
    from sklearn.pipeline import Pipeline
    cachedir = mkdtemp()
    pipe = Pipeline(estimators, memory=cachedir)  # estimators: list of (name, step) pairs
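A minimal, self-contained sketch of the caching approach in a grid search, assuming a TfidfVectorizer as the vectorization step and a tiny toy corpus (both hypothetical, not taken from the question). The grid only swaps the final classifier, so the vectorizer's parameters never change and, within each fold, it can be fitted once and then loaded from the cache. The file count at the end assumes joblib's on-disk layout (one output.pkl per cached call) and is only there to make the reuse visible.

```python
import os
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: just enough documents for 2-fold CV.
texts = [
    "good movie", "great film", "loved it", "fantastic acting",
    "bad movie", "terrible film", "hated it", "awful acting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

cachedir = mkdtemp()  # temporary directory that will hold the cache
pipe = Pipeline(
    [("vect", TfidfVectorizer()), ("clf", LogisticRegression())],
    memory=cachedir,  # cache fitted transformers (here: the vectorizer)
)

# Only the final step varies, so the vectorizer's parameters never change:
# within a CV fold it is fitted once and then loaded from the cache.
param_grid = [
    {"clf": [LogisticRegression()], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [LinearSVC()], "clf__C": [0.1, 1.0]},
]
search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(texts, labels)
print(search.best_params_)

# Count cached vectorizer fits (assumes joblib writes one output.pkl per
# cached call). Without caching there would be 5 candidates x 2 folds
# + 1 refit = 11 vectorizer fits.
n_cached = sum("output.pkl" in files for _, _, files in os.walk(cachedir))
print("cached vectorizer fits:", n_cached)

rmtree(cachedir)  # the cache directory is not cleaned up automatically
```

Caching keeps the vectorizer inside the pipeline, so it is still fitted only on each fold's training data; vectorizing the whole dataset up front instead would avoid the refits but let the validation folds influence the vocabulary and IDF weights.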


Answer