
Tag: cross-validation

Gaussian Process Regression: tune hyperparameters based on validation set

In the standard scikit-learn implementation of Gaussian Process Regression (GPR), the hyperparameters (of the kernel) are chosen based on the training set. Is there an easy-to-use implementation of GPR (in Python) where the hyperparameters (of the kernel) are chosen based on a separate validation set? Cross-validation would also be a nice alternative for finding suitable hyperparameters (that are

In Leave One Out Cross Validation, How can I Use `shap.Explainer()` Function to Explain a Machine Learning Model?

Background of the Problem: I want to explain the outcome of machine learning (ML) models using SHapley Additive exPlanations (SHAP), which is implemented in the shap library of Python. As a parameter of the function shap.Explainer(), I need to pass an ML model (e.g. XGBRegressor()). However, in each iteration of the Leave One Out Cross Validation (LOOCV), the ML model

Cross Validation with coco data format json files

I am a newbie ML learner trying semantic image segmentation on Google Colab, with a COCO-format JSON file and lots of images on Google Drive. Update: I borrowed this code as a starting point, so my code on Colab is pretty much like this. I am splitting an exported JSON file into 2 JSONs (train/validate with 80/20
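For readers unfamiliar with this kind of split, here is a minimal sketch of an 80/20 train/validation split of a COCO-style annotation dictionary. It is not the asker's code: the function name `split_coco`, the seed, and the tiny synthetic dataset are hypothetical, and it assumes the usual top-level `images` and `annotations` keys, with each annotation carrying an `image_id`.

```python
import random


def split_coco(coco, val_fraction=0.2, seed=0):
    """Split a COCO-style dict into (train, val) dicts by image (hypothetical helper)."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible
    n_val = int(round(len(images) * val_fraction))
    val_images, train_images = images[:n_val], images[n_val:]

    def subset(imgs):
        ids = {img["id"] for img in imgs}
        # Copy every other top-level key (categories, info, ...) unchanged.
        out = {k: v for k, v in coco.items() if k not in ("images", "annotations")}
        out["images"] = imgs
        # Keep only the annotations that belong to the selected images.
        out["annotations"] = [
            a for a in coco.get("annotations", []) if a["image_id"] in ids
        ]
        return out

    return subset(train_images), subset(val_images)


# Tiny synthetic example: 10 images, 25 annotations, 1 category.
coco = {
    "images": [{"id": i, "file_name": f"img_{i}.jpg"} for i in range(10)],
    "annotations": [
        {"id": 100 + i, "image_id": i % 10, "category_id": 1} for i in range(25)
    ],
    "categories": [{"id": 1, "name": "object"}],
}

train, val = split_coco(coco)
print(len(train["images"]), len(val["images"]))
```

Splitting by image rather than by annotation keeps all annotations of one image in the same subset. Each returned dict can be written out with `json.dump` to get the two JSON files.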

GridSearchCV.best_score not same as cross_val_score(GridSearchCV.best_estimator_)

Consider the following grid search: `grid = GridSearchCV(clf, parameters, n_jobs=-1, iid=True, cv=5)` followed by `grid_fit = grid.fit(X_train1, y_train1)`. According to scikit-learn's documentation, `grid_fit.best_score_` returns "The mean cross-validated score of the best_estimator". To me that would mean that the average of `cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5)` should be exactly the same as `grid_fit.best_score_`. However, I am getting a 10% difference
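The comparison described in the question can be sketched as follows. This is an illustration under assumptions, not the asker's setup: the iris dataset, `LogisticRegression`, and the `C` grid are stand-ins, and the `iid` argument is omitted because recent scikit-learn releases no longer accept it. A single splitter object with a fixed `random_state` is passed to both calls, so both evaluations use identical folds.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Stand-in data and model (assumptions; the question's clf and data are not shown).
X, y = load_iris(return_X_y=True)

# One splitter with a fixed seed, reused so both evaluations see the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},
    cv=cv,
)
grid_fit = grid.fit(X, y)

# Re-score the selected configuration on the same folds.
rescored = cross_val_score(grid_fit.best_estimator_, X, y, cv=cv)

print("best_score_:", grid_fit.best_score_)
print("mean cross_val_score:", rescored.mean())
```

With identical folds, a deterministic estimator, and the default scorer, the two numbers should agree up to floating-point error. A gap can appear when the folds differ between the two calls (for example, a shuffled splitter without a fixed seed), when the scoring differs, or when the estimator itself is randomized.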

Cross validation with grid search returns worse results than default

I’m using scikit-learn in Python to run some basic machine learning models. Using the built-in GridSearchCV() function, I determined the “best” parameters for different techniques, yet many of these perform worse than the defaults. I include the default parameters as an option, so I’m surprised this would happen. For example: This is the same as the defaults, except max_depth
