Including Scaling and PCA as parameter of GridSearchCV

Question

I want to run a logistic regression using GridSearchCV, but I want to contrast the performance when scaling and PCA are used, so I don't want to use them in all cases. I basically would like to include PCA and scaling as "parameters" of the GridSearchCV. I am aware I can make a pipeline like thi…

Accepted Answer

You can set the with_mean and with_std parameters of StandardScaler() to False to represent no standardization. In GridSearchCV, the param_grid parameter can then be set up as follows (the keys assume the pipeline steps are named scale and mnl):

    param_grid = [{'scale__with_mean': [False],
                   'scale__with_std': [False],
                   'mnl__solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
                   'mnl__max_iter': [500, 1000, 2000, 3000]
                  },
                  {'mnl__solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
                   'mnl__max_iter': [500, 1000, 2000, 3000]}]

Then the first dict in the list is "No Scaler+mnl" and the second is "Scaler+mnl".

Ref:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
https://scikit-learn.org/stable/tutorial/statistical_inference/putting_together.html

Edit: I think it's complicated if you are also considering turning PCA on/off... Maybe you need to define a customised PCA that derives from the original PCA, and then add a boolean argument that determines whether the PCA should be executed or not...

    from sklearn.decomposition import PCA

    class MYPCA(PCA):
        # scikit-learn estimators need explicit keyword arguments in __init__
        # (no *args), otherwise get_params() and clone() break in GridSearchCV.
        # Only n_components is exposed here; add other PCA arguments as needed.
        def __init__(self, PCA_turn_on=True, n_components=None):
            super().__init__(n_components=n_components)
            self.PCA_turn_on = PCA_turn_on

        def fit(self, X, y=None):
            if self.PCA_turn_on:
                return super().fit(X, y)
            return self  # fit must always return the estimator

        # same for other methods defined in PCA (transform, fit_transform)
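A possibly simpler route than subclassing, which is not part of the original answer: scikit-learn's Pipeline accepts the string 'passthrough' in place of a step, and a whole step can be listed as a grid parameter. The sketch below assumes a pipeline with steps named scale, pca and mnl (pca is a name made up here), uses the iris data purely as a stand-in, and searches over C only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, used only so the sketch runs end to end.
X, y = load_iris(return_X_y=True)

# Step names are assumptions; "scale" and "mnl" mirror the keys used above.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("mnl", LogisticRegression(max_iter=1000)),
])

# Each step is itself a grid parameter; "passthrough" skips that step,
# so the grid covers every on/off combination of scaling and PCA.
param_grid = {
    "scale": [StandardScaler(), "passthrough"],
    "pca": [PCA(n_components=2), "passthrough"],
    "mnl__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# Shows which combination of scaling, PCA and C scored best.
print(search.best_params_)
```

Here search.cv_results_ holds one row per combination, so the scores with and without scaling or PCA can be compared directly.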

Answer