How to build a full trainset when loading data from predefined folds in Surprise?

I am using Surprise to evaluate various recommender system algorithms. I would like to calculate predictions and prediction coverage over all possible user and item combinations. My data is loaded from predefined splits.

My strategy to calculate prediction coverage is to

  1. build a full trainset and fit
  2. get lists of all users and items
  3. iterate through the list and make predictions
  4. count the cases where a prediction is impossible, and use that count to calculate prediction coverage.

Calling data.build_full_trainset() yields the following error:

Is there a way to build a full trainset when loading data from predefined folds?

Alternatively, I could combine the data outside of Surprise into a dataframe and redo the process. Are there better approaches?
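One way to combine the fold files outside of Surprise might look like the sketch below. It uses only the standard library rather than a dataframe, and it assumes each fold file holds one rating per line with the user in the first column and the item in the second. `merge_rating_files` and the tab separator are illustrative assumptions, not part of Surprise.

```python
import csv


def merge_rating_files(paths, out_path, sep="\t"):
    """Merge rating files into one, keeping the first row seen per (user, item)."""
    seen = set()
    written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=sep, lineterminator="\n")
        for path in paths:
            with open(path, newline="") as f:
                for row in csv.reader(f, delimiter=sep):
                    if len(row) < 3:
                        continue  # skip blank or malformed lines
                    key = (row[0], row[1])
                    # Predefined folds overlap, so drop repeated pairs.
                    if key not in seen:
                        seen.add(key)
                        writer.writerow(row)
                        written += 1
    return written
```

The merged file could then be loaded with `Dataset.load_from_file(out_path, reader=reader)`, and the resulting dataset should support `build_full_trainset()`, because it was not loaded from predefined folds.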

Thank you.

Answer

TL;DR: The model_selection documentation in Surprise describes a “refit” option that refits the algorithm on the full trainset; however, it explicitly doesn’t work with predefined folds.

Another major issue: oyyablokov’s comment on this issue suggests you cannot fit a model on data that contains NaNs. So even if you have a full trainset, how does one create a full prediction matrix to calculate things like prediction coverage, which requires all user and item combinations, with or without ratings?

My workaround was to create 3 Surprise datasets.

  1. The dataset from predefined folds to compute best_params
  2. The full dataset of ratings (combining all folds outside of Surprise)
  3. The full prediction matrix dataset including all possible combinations of users and items (with or without ratings).
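The third dataset in the list above could be built along these lines. This is a sketch under assumptions, not the original answer's code: `build_all_pairs_testset` and its `fill` parameter are hypothetical. The output is a list of `(user, item, rating)` tuples, which is the testset shape that `algo.test()` accepts in Surprise. Pairs without a known rating get the `fill` placeholder, so accuracy metrics computed on those rows would not be meaningful.

```python
from itertools import product


def build_all_pairs_testset(ratings, fill=0.0):
    """Return (user, item, rating) for every user-item pair.

    `ratings` is an iterable of (user, item, rating) tuples. Pairs that
    have no known rating receive the `fill` placeholder instead of NaN.
    """
    known = {(u, i): r for u, i, r in ratings}
    users = sorted({u for u, _ in known})
    items = sorted({i for _, i in known})
    return [(u, i, known.get((u, i), fill)) for u, i in product(users, items)]


# Two known ratings expand to the full 2 x 2 grid of pairs.
all_pairs = build_all_pairs_testset([("u1", "i1", 4.0), ("u2", "i2", 3.0)])
```

Using a numeric placeholder rather than NaN keeps the data loadable, which is the concern raised in the comment mentioned above.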

After you find your best parameters with grid search cross-validation, you can find your predictions and coverage with something like this:

User contributions licensed under: CC BY-SA