
Scaler fitted in a pipeline turns out to be not fitted yet

Please consider this code:

import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline

# data
train_X = pd.DataFrame(data=np.random.rand(20, 3), columns=["a", "b", "c"])
train_y = pd.Series(data=np.random.randint(0,2, 20), name="y")
test_X = pd.DataFrame(data=np.random.rand(10, 3), columns=["a", "b", "c"])
test_y = pd.Series(data=np.random.randint(0,2, 10), name="y")

# scaler
scaler = StandardScaler()

# feature selection
p = Pipeline(steps=[("scaler0", scaler),
                    ("model", SVC(kernel="linear", C=1))])

rfe = RFE(p, n_features_to_select=2, step=1,
          importance_getter="named_steps.model.coef_")
rfe.fit(train_X, train_y)

# apply the scaler to the test set
scaled_test = scaler.transform(test_X)

I get this message:

NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Why is the scaler not fitted?

Answer

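RFE never fits the object you pass to it. On each elimination round it clones the estimator (here the whole pipeline, including its StandardScaler) and fits the clone on the features that remain. The scaler variable you created is therefore left untouched, which is why scaler.transform raises NotFittedError.

The fitted pipeline is stored on the RFE object: fit_pipeline = rfe.estimator_. Its fitted scaler is available as rfe.estimator_.named_steps["scaler0"].

But note, this fitted pipeline was trained only on the top n_features_to_select features (2 here), so it expects input already reduced to those columns, for example the output of rfe.transform(test_X).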
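A minimal sketch of this, reusing the setup from the question. The fixed seed, the alternating labels (chosen so both classes are always present), and the helper names original_is_fitted, fitted_scaler and selected_test are illustrative assumptions, not part of the original code:

```python
import numpy as np
import pandas as pd
from sklearn.exceptions import NotFittedError
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.utils.validation import check_is_fitted

# Same shapes as in the question; seed and labels are fixed for reproducibility.
rng = np.random.default_rng(0)
train_X = pd.DataFrame(rng.random((20, 3)), columns=["a", "b", "c"])
train_y = pd.Series(np.tile([0, 1], 10), name="y")
test_X = pd.DataFrame(rng.random((10, 3)), columns=["a", "b", "c"])

scaler = StandardScaler()
p = Pipeline(steps=[("scaler0", scaler),
                    ("model", SVC(kernel="linear", C=1))])
rfe = RFE(p, n_features_to_select=2, step=1,
          importance_getter="named_steps.model.coef_")
rfe.fit(train_X, train_y)

# The scaler object passed in was only cloned, so it is still unfitted.
try:
    check_is_fitted(scaler)
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

# The fitted clone of the pipeline lives on the RFE object.
fitted_pipeline = rfe.estimator_
fitted_scaler = fitted_pipeline.named_steps["scaler0"]

# Reduce the test set to the selected columns first, then scale it.
selected_test = rfe.transform(test_X)
scaled_test = fitted_scaler.transform(selected_test)
```

If only predictions are needed, rfe.predict(test_X) performs the same column reduction and then calls the fitted pipeline, so the scaler does not have to be applied by hand.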

User contributions licensed under: CC BY-SA