How do I make clf.best_params_ available after running a pipeline through a grid search? For the code I have below, I get:
AttributeError: 'GridSearchCV' object has no attribute 'best_params_'
Here is my code:
from sklearn.datasets import make_classification
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import accuracy_score

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

f, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(20, 8))

# Generate noisy data
num_trainsamples = 500
num_testsamples = 50
X_train, y_train = make_classification(n_samples=num_trainsamples,
                                       n_features=240,
                                       n_informative=9,
                                       n_redundant=0,
                                       n_repeated=0,
                                       n_classes=10,
                                       n_clusters_per_class=1,
                                       class_sep=9,
                                       flip_y=0.2,
                                       #weights=[0.5,0.5],
                                       random_state=17)

X_test, y_test = make_classification(n_samples=50,
                                     n_features=num_testsamples,
                                     n_informative=9,
                                     n_redundant=0,
                                     n_repeated=0,
                                     n_classes=10,
                                     n_clusters_per_class=1,
                                     class_sep=10,
                                     flip_y=0.2,
                                     #weights=[0.5,0.5],
                                     random_state=17)

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

pipe = Pipeline([('scaler', StandardScaler()),
                 ('pca', PCA(n_components=0.95)),
                 ('clf', RandomForestClassifier())])

# Declare a hyperparameter grid
parameter_space = {
    'clf__n_estimators': [10, 50, 100],
    'clf__criterion': ['gini', 'entropy'],
    'clf__max_depth': np.linspace(10, 50, 11),
}

clf = GridSearchCV(pipe, parameter_space, cv=5, scoring="accuracy", verbose=True)  # model

pipe.fit(X_train, y_train)
print(f'Best Parameters: {clf.best_params_}')
Answer
Your clf is never fitted. You probably meant clf.fit(X_train, y_train).
Also, np.linspace(10, 50, 11) yields floats, while max_depth expects ints, so this may fail; you should probably add a type cast there (like np.linspace(10, 50, 11).astype('int')) or use something like np.arange() instead.
You should likely also fix your test set, which currently has no relation to the training one (it is generated by a separate make_classification call and even has a different number of features: 50 vs. 240). Last but not least, PCA is not guaranteed to be useful for classification (see e.g. https://www.csd.uwo.ca/~oveksler/Courses/CS434a_541a/Lecture8.pdf).
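Putting those fixes together, here is a minimal sketch of what the corrected script might look like. It generates a single dataset and splits it with train_test_split (so train and test share the same distribution and feature count), casts max_depth to int, and fits the GridSearchCV object rather than the bare pipeline; the sample counts and grid values are carried over from your code, not requirements.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# One dataset, then a split: test data now comes from the same
# distribution (and feature space) as the training data.
X, y = make_classification(n_samples=550,
                           n_features=240,
                           n_informative=9,
                           n_classes=10,
                           n_clusters_per_class=1,
                           class_sep=9,
                           flip_y=0.2,
                           random_state=17)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=50,
                                                    random_state=17)

pipe = Pipeline([('scaler', StandardScaler()),
                 ('pca', PCA(n_components=0.95)),
                 ('clf', RandomForestClassifier())])

parameter_space = {
    'clf__n_estimators': [10, 50, 100],
    'clf__criterion': ['gini', 'entropy'],
    # cast to int: max_depth rejects floats
    'clf__max_depth': np.linspace(10, 50, 11).astype('int'),
}

clf = GridSearchCV(pipe, parameter_space, cv=5, scoring='accuracy')
clf.fit(X_train, y_train)  # fit the search object, not the bare pipeline
print(f'Best Parameters: {clf.best_params_}')
```

Because GridSearchCV refits the best pipeline on the full training set by default (refit=True), clf.best_params_, clf.best_estimator_, and clf.score(X_test, y_test) are all available after the fit.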