Please consider this code:
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline

# data
train_X = pd.DataFrame(data=np.random.rand(20, 3), columns=["a", "b", "c"])
train_y = pd.Series(data=np.random.randint(0, 2, 20), name="y")
test_X = pd.DataFrame(data=np.random.rand(10, 3), columns=["a", "b", "c"])
test_y = pd.Series(data=np.random.randint(0, 2, 10), name="y")

# scaler
scaler = StandardScaler()

# feature selection
p = Pipeline(steps=[("scaler0", scaler), ("model", SVC(kernel="linear", C=1))])
rfe = RFE(p, n_features_to_select=2, step=1, importance_getter="named_steps.model.coef_")
rfe.fit(train_X, train_y)

# apply the scaler to the test set
scaled_test = scaler.transform(test_X)
I get this message:
NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
Why is the scaler not fitted?
Answer
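When you pass a pipeline or an estimator to RFE, RFE clones it and fits the clones on progressively smaller feature subsets until only n_features_to_select features remain. The scaler object you created is therefore never fitted itself; only its clones are, which is why calling scaler.transform raises NotFittedError.
To access the fitted estimator, use: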
fit_pipeline = rfe.estimator_
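Note, however, that this fitted pipeline was trained on only the top n_features_to_select features, so any data you pass to it must first be reduced to those features (for example with rfe.transform).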
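Putting this together, here is a minimal sketch of how the test set could be scaled with the fitted clone rather than the original scaler. It reuses the question's setup, but the seeded generator and the alternating labels are assumptions made here only to keep the example reproducible, and the names fit_scaler, selected_test and original_is_fitted are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.exceptions import NotFittedError
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.utils.validation import check_is_fitted

# Assumed data: seeded random features, alternating labels so both classes exist.
rng = np.random.default_rng(0)
train_X = pd.DataFrame(rng.random((20, 3)), columns=["a", "b", "c"])
train_y = pd.Series([0, 1] * 10, name="y")
test_X = pd.DataFrame(rng.random((10, 3)), columns=["a", "b", "c"])

scaler = StandardScaler()
p = Pipeline(steps=[("scaler0", scaler), ("model", SVC(kernel="linear", C=1))])
rfe = RFE(p, n_features_to_select=2, step=1,
          importance_getter="named_steps.model.coef_")
rfe.fit(train_X, train_y)

# The original scaler was only cloned by RFE, so it is still unfitted.
try:
    check_is_fitted(scaler)
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

# The fitted clone of the pipeline, and the fitted scaler inside it.
fit_pipeline = rfe.estimator_
fit_scaler = fit_pipeline.named_steps["scaler0"]

# Reduce the test set to the selected features, then scale it.
selected_test = rfe.transform(test_X)
scaled_test = fit_scaler.transform(selected_test)

# rfe.predict performs the feature reduction and then calls the fitted pipeline.
predictions = rfe.predict(test_X)
```

If only predictions are needed, rfe.predict(test_X) alone should suffice, since the fitted pipeline applies its own scaler internally.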