I’m trying to find the best estimator using GridSearchCV, and I’m using refit=True, which is the default. The documentation states:
The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this GridSearchCV instance
Should I still call .fit on the training data afterwards, like this:
classifier = GridSearchCV(
    estimator=model,
    param_grid=parameter_grid['param_grid'],
    scoring='balanced_accuracy',
    cv=5,
    verbose=3,
    n_jobs=4,
    return_train_score=True,
    refit=True,
)
classifier.fit(x_training, y_train_encoded_local)
predictions = classifier.predict(x_testing)
balanced_error = balanced_accuracy_score(y_true=y_test_encoded_local, y_pred=predictions)
Or should I do it like this instead:
classifier = GridSearchCV(
    estimator=model,
    param_grid=parameter_grid['param_grid'],
    scoring='balanced_accuracy',
    cv=5,
    verbose=3,
    n_jobs=4,
    return_train_score=True,
    refit=True,
)
predictions = classifier.predict(x_testing)
balanced_error = balanced_accuracy_score(y_true=y_test_encoded_local, y_pred=predictions)
Answer
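You should do it like your first version. You always need to call classifier.fit; constructing the GridSearchCV object only stores the settings, so without fit nothing is trained and predict has no fitted model to use. refit=True means that, once cross-validation is done, the estimator is trained again on the entire training set using the best parameters found.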
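To make the difference concrete, here is a minimal sketch rather than your exact setup. It assumes a synthetic dataset from make_classification and a LogisticRegression with a small grid over C, both stand-ins for your model and parameter_grid. It shows that predict fails before fit, and that after fit the refitted model is available without any extra training step.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data and model are placeholders for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},  # hypothetical grid
    scoring="balanced_accuracy",
    cv=5,
    refit=True,
)

# Calling predict on an unfitted search object raises NotFittedError.
try:
    search.predict(X_test)
    raised_before_fit = False
except NotFittedError:
    raised_before_fit = True

# fit runs the cross-validated search and, because refit=True,
# retrains the best configuration on all of X_train.
search.fit(X_train, y_train)

# predict delegates to search.best_estimator_; no second fit is needed.
predictions = search.predict(X_test)
score = balanced_accuracy_score(y_test, predictions)
print(search.best_params_, score)
```

After fit, search.best_estimator_ holds the refitted model, so there is no need to call fit on it again yourself.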