Please consider this code:
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline

# data
train_X = pd.DataFrame(data=np.random.rand(20, 3), columns=["a", "b", "c"])
train_y = pd.Series(data=np.random.randint(0, 2, 20), name="y")
test_X = pd.DataFrame(data=np.random.rand(10, 3), columns=["a", "b", "c"])
test_y = pd.Series(data=np.random.randint(0, 2, 10), name="y")

# scaler
scaler = StandardScaler()

# feature selection
p = Pipeline(steps=[("scaler0", scaler),
                    ("model", SVC(kernel="linear", C=1))])

rfe = RFE(p, n_features_to_select=2, step=1,
          importance_getter="named_steps.model.coef_")
rfe.fit(train_X, train_y)

# apply the scaler to the test set
scaled_test = scaler.transform(test_X)
I get this message:
NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
Why is the scaler not fitted?
Answer
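When you pass a pipeline or an estimator to RFE, RFE clones it and repeatedly fits the clone, dropping the least important features each round until only n_features_to_select remain. The original objects, including the scaler instance you created, are never fitted, which is why scaler.transform raises NotFittedError.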
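The cloning behaviour can be seen in isolation with a minimal sketch. It calls sklearn.base.clone directly on a bare StandardScaler rather than going through RFE; the random data, the fixed seed, and the names scaler_copy and original_is_fitted are illustrative assumptions, not part of the question's code.

```python
import numpy as np
from sklearn.base import clone
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted

# Illustrative data (assumption: any numeric matrix would do here).
X = np.random.default_rng(0).random((20, 3))

scaler = StandardScaler()
scaler_copy = clone(scaler)   # new, unfitted estimator with the same parameters
scaler_copy.fit(X)            # only the copy learns mean_ and scale_

check_is_fitted(scaler_copy)  # passes silently because the copy is fitted

# The original object was never touched by fit, so the check fails for it.
try:
    check_is_fitted(scaler)
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

print(original_is_fitted)
```

Fitting the clone leaves the original untouched, which is the same situation the scaler in the question ends up in after rfe.fit.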
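To access the fitted pipeline, use
fit_pipeline = rfe.estimator_
Note, however, that this pipeline was fitted on only the top n_features_to_select features, so its scaler expects input reduced to those columns (for example, via rfe.transform).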
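Putting this together, one possible way to scale the test set is to reduce it to the selected columns first and then use the scaler stored inside the fitted pipeline. This is a sketch under a few assumptions: the fixed seed and the alternating labels are added only to make the run reproducible, and fit_scaler and selected_test are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: fixed seed and alternating labels, for reproducibility only.
rng = np.random.default_rng(0)
train_X = pd.DataFrame(rng.random((20, 3)), columns=["a", "b", "c"])
train_y = pd.Series(np.arange(20) % 2, name="y")
test_X = pd.DataFrame(rng.random((10, 3)), columns=["a", "b", "c"])

p = Pipeline(steps=[("scaler0", StandardScaler()),
                    ("model", SVC(kernel="linear", C=1))])
rfe = RFE(p, n_features_to_select=2, step=1,
          importance_getter="named_steps.model.coef_")
rfe.fit(train_X, train_y)

# The fitted clone lives on the RFE object, not in `p`.
fit_pipeline = rfe.estimator_
fit_scaler = fit_pipeline.named_steps["scaler0"]

# Keep only the columns RFE selected, then scale with the fitted scaler.
selected_test = rfe.transform(test_X)
scaled_test = fit_scaler.transform(selected_test)

print(list(train_X.columns[rfe.support_]))  # names of the kept features
print(scaled_test.shape)
```

If only predictions are needed, rfe.predict(test_X) applies the feature selection and the fitted pipeline in one call.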