Make .best_params_ available after fitting a pipeline

How can I access clf.best_params_ after running a grid search over a pipeline? With the code below, I get:

AttributeError: 'GridSearchCV' object has no attribute 'best_params_'

Here is my code:

from sklearn.datasets import make_classification
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.subplots below
from sklearn import metrics
from sklearn.metrics import accuracy_score

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

f, (ax1,ax2) = plt.subplots(nrows=1, ncols=2,figsize=(20,8))
# Generate noisy Data
num_trainsamples = 500
num_testsamples = 50
X_train,y_train = make_classification(n_samples=num_trainsamples, 
                          n_features=240, 
                          n_informative=9, 
                          n_redundant=0, 
                          n_repeated=0, 
                          n_classes=10, 
                          n_clusters_per_class=1,
                          class_sep=9,
                          flip_y=0.2,
                          #weights=[0.5,0.5], 
                          random_state=17)

X_test,y_test = make_classification(n_samples=50, 
                          n_features=num_testsamples, 
                          n_informative=9, 
                          n_redundant=0, 
                          n_repeated=0, 
                          n_classes=10, 
                          n_clusters_per_class=1,
                          class_sep=10,
                          flip_y=0.2,
                          #weights=[0.5,0.5], 
                          random_state=17)

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

pipe = Pipeline([('scaler', StandardScaler()),
                 ('pca', PCA(n_components=0.95)),
                 ('clf', RandomForestClassifier())])

# Declare a hyperparameter grid
parameter_space = {
    'clf__n_estimators': [10,50,100],
    'clf__criterion': ['gini', 'entropy'],
    'clf__max_depth': np.linspace(10,50,11),
}

clf = GridSearchCV(pipe, parameter_space, cv = 5, scoring = "accuracy", verbose = True) # model


pipe.fit(X_train,y_train)
print(f'Best Parameters: {clf.best_params_}')

Answer

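Your clf is never fitted: the code calls pipe.fit(X_train,y_train), which fits only the pipeline, and best_params_ exists only on a GridSearchCV object that has been fitted. You probably meant clf.fit(X_train,y_train).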
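As a minimal sketch of that fix: the search object is fitted instead of the bare pipeline, after which best_params_ and best_estimator_ become available. The dataset and grid here are deliberately smaller than in the question (an assumption, only to keep the run short), and random_state on the classifier is added for repeatability.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset, chosen only to keep the example fast (assumption).
X_train, y_train = make_classification(n_samples=200, n_features=20,
                                       n_informative=5, random_state=17)

pipe = Pipeline([('scaler', StandardScaler()),
                 ('pca', PCA(n_components=0.95)),
                 ('clf', RandomForestClassifier(random_state=17))])

# Reduced grid (assumption); keys use the '<step name>__<parameter>' convention.
parameter_space = {
    'clf__n_estimators': [10, 50],
    'clf__criterion': ['gini', 'entropy'],
}

clf = GridSearchCV(pipe, parameter_space, cv=5, scoring='accuracy')

# Fit the search object, not the bare pipeline.
clf.fit(X_train, y_train)

print(f'Best Parameters: {clf.best_params_}')

# With the default refit=True, the best pipeline is refitted on all of X_train.
best_pipe = clf.best_estimator_
```

Calling clf.predict(...) afterwards delegates to best_estimator_, so the fitted search object can be used directly as the final model.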

Also, np.linspace(10,50,11) yields floats, while max_depth expects an int (or None), so the search may fail. Add a type cast there (like np.linspace(10,50,11).astype('int')) or use something like np.arange(10,51,4) instead, which produces the same values as ints.
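The difference between the float and integer grids can be checked directly. This is a small sketch; the variable names are illustrative and not part of the question's code.

```python
import numpy as np

# What the question uses: eleven evenly spaced values, stored as floats.
float_depths = np.linspace(10, 50, 11)
print(float_depths.dtype)  # float64

# Option 1: cast the linspace result to integers.
int_depths = np.linspace(10, 50, 11).astype(int)

# Option 2: build the same values as integers from the start.
range_depths = np.arange(10, 51, 4)

print(int_depths.tolist())
print(range_depths.tolist())

# Plain Python ints are a safe choice for the grid entry.
parameter_space = {'clf__max_depth': int_depths.tolist()}
```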

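You should also fix your test set, which currently has no relation to the training set: it comes from a separate make_classification call and has only 50 features (n_features=num_testsamples) against 240 for the training data, so a model fitted on X_train cannot predict on it. Generating one dataset and splitting it (for example with train_test_split) avoids this. Last but not least, PCA is not guaranteed to be useful for classification (see e.g. https://www.csd.uwo.ca/~oveksler/Courses/CS434a_541a/Lecture8.pdf).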
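One way to get a test set that matches the training data is to generate a single dataset and split it. The sketch below reuses the question's make_classification settings; the total of 550 samples, the stratified split, and the random_state on the split are assumptions made to reproduce the 500/50 sizes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

num_trainsamples = 500
num_testsamples = 50

# One dataset, so train and test share the same features and class structure.
X, y = make_classification(n_samples=num_trainsamples + num_testsamples,
                           n_features=240,
                           n_informative=9,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=10,
                           n_clusters_per_class=1,
                           class_sep=9,
                           flip_y=0.2,
                           random_state=17)

# Hold out num_testsamples rows; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=num_testsamples, stratify=y, random_state=17)

print(X_train.shape, X_test.shape)
```

After clf.fit(X_train, y_train), clf.score(X_test, y_test) then gives a meaningful held-out accuracy, because both sets have 240 features drawn from the same generator.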

User contributions licensed under: CC BY-SA