
GridSearchCV.best_score_ not the same as cross_val_score(GridSearchCV.best_estimator_)

Consider the following grid search:

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(clf, parameters, n_jobs=-1, iid=True, cv=5)
grid_fit = grid.fit(X_train1, y_train1)

According to Sklearn's documentation, grid_fit.best_score_ returns the "mean cross-validated score of the best_estimator".

To me, that would mean that the average of:

cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5)

should be exactly the same as:

grid_fit.best_score_.

However, I am getting a 10% difference between the two numbers. What am I missing?

I am running the grid search on proprietary data, so I am hoping somebody has run into something similar in the past and can guide me without a fully reproducible example. I will try to reproduce this with the Iris dataset if it's not clear enough; a sketch of what that comparison would look like follows.
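For reference, here is a minimal sketch of the comparison on the Iris dataset; SVC and the C grid are illustrative placeholders, not my actual clf and parameters:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Placeholder data and estimator; the real clf/parameters are proprietary.
X, y = load_iris(return_X_y=True)
parameters = {"C": [0.1, 1, 10]}

grid = GridSearchCV(SVC(), parameters, n_jobs=-1, cv=5)
grid_fit = grid.fit(X, y)

scores = cross_val_score(grid_fit.best_estimator_, X, y, cv=5)
print(grid_fit.best_score_)  # mean CV score of the best parameter setting
print(scores.mean())         # mean of a separate 5-fold run on the refit estimator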


Answer

When an integer is passed to the GridSearchCV(..., cv=int_number) parameter, StratifiedKFold is used for the cross-validation splitting (for classifiers). The data set will therefore be split by StratifiedKFold, and the exact composition of the folds can affect the per-fold accuracy and therefore the best score.
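One way to rule out split-related differences is to construct a single explicit splitter and pass the same object to both calls, so they evaluate on identical folds. A minimal sketch, again using Iris and SVC as stand-ins for the asker's data and classifier:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One explicit splitter shared by both calls, so the folds are identical.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=skf)
grid_fit = grid.fit(X, y)

scores = cross_val_score(grid_fit.best_estimator_, X, y, cv=skf)
print(grid_fit.best_score_, scores.mean())  # computed on the same folds

With identical folds, cross_val_score refits a clone of best_estimator_ on each training fold with the best parameters, which is the same procedure the grid search applied to that candidate, so the two means should coincide.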
