
What if the validation steps do not fit the number of samples?

It’s a bit annoying that the tf.keras generator still faces this issue, unlike PyTorch. There are many discussions about it, but I’m still stuck. I have already gone through the related discussions without finding a fix.

Problem

I have a dataset consisting of around 21,397 samples. I wrote a custom data loader that returns the total number of samples as follows:

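The original snippet is not reproduced on this page; a minimal sketch of such a loader, assuming a keras.utils.Sequence subclass with placeholder file loading, might look like this:

import math
import numpy as np
from tensorflow import keras

class DataLoader(keras.utils.Sequence):
    # Hypothetical reconstruction of the custom loader described above.
    def __init__(self, file_paths, labels, batch_size=32):
        self.file_paths = file_paths
        self.labels = labels
        self.batch_size = batch_size
        self.n = len(file_paths)          # total number of samples (~21397)

    def __len__(self):
        # number of batches per epoch; ceil keeps the final partial batch
        return math.ceil(self.n / self.batch_size)

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        batch_x = np.stack([np.load(p) for p in self.file_paths[sl]])  # placeholder loading
        batch_y = np.asarray(self.labels[sl])
        return batch_x, batch_y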

From the data, I’ve made a 5-fold split. Each fold contains the following:

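The fold listing is also missing; a hypothetical reconstruction of the split (assuming scikit-learn's KFold) shows the rough fold sizes:

import numpy as np
from sklearn.model_selection import KFold

# Hypothetical reconstruction: 21397 samples split into 5 folds gives
# validation sets of roughly 4279-4280 samples each.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
folds = list(kf.split(np.arange(21397)))
for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"Fold {i}: train={len(train_idx)}, val={len(val_idx)}")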

For each fold, I’ve set step_per_epoch and validation_per_epoch as follows:

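The settings themselves are not shown here; a sketch of this kind of step calculation (the batch size and fold sizes below are assumptions, not values taken from the question):

import math

# Assumed values: batch_size = 32 and a fold with 4280 validation samples
# out of 21397 total.
batch_size = 32
n_train, n_val = 17117, 4280

steps_per_epoch  = n_train // batch_size          # 534
validation_steps = math.ceil(n_val / batch_size)  # ceil(4280 / 32) = 134

# 134 steps * 32 samples = 4288 predictions for only 4280 validation samples,
# which is the 8-sample overshoot described below.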

Now, to compute an OOF (out-of-fold) score, we predict on the validation set and want to store the results as follows:

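A sketch of the OOF bookkeeping as described (model, val_gen, val_idx, and validation_steps are assumed to come from the fold loop; only the failing assignment is shown):

import numpy as np

oof = np.zeros(21397)

# model, val_gen, val_idx and validation_steps are assumed to exist for the
# current fold.
preds = model.predict(val_gen, steps=validation_steps)   # e.g. 134 * 32 = 4288 rows
oof[val_idx] = preds.ravel()                              # fails: 4288 vs 4280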

After training, at OOF indexing time, it throws a shape mismatch between 4280 and 4288. So it looks like, with this step size and batch size, the model is predicting 8 samples of the next batch. Next, we set batch_size to 40, which divides the total size of the subset (4280) evenly. Good enough, but (of course) we again faced a size mismatch in fold 2, between 4279 and 4280. One simple workaround is to add 3 samples across folds 2, 3, and 4 -_-

Any general tips to get rid of it? Thanks.


Answer

I did not have time to go through all of your code; however, I thought the code below might be useful to you. The variable length should be set to the number of samples. The code then determines a batch size and a number of steps per epoch such that length = batch_size * steps per epoch. The variable b_max should be set to the maximum batch size you will allow based on memory capacity. Note that if length is a prime number, the batch size will end up as 1 and the steps will end up as length.

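The answer's snippet is not reproduced on this page; a sketch matching the description above (the function name is assumed, while length and b_max are the variables named in the text) could be:

def get_batch_size_and_steps(length, b_max):
    # Pick the largest batch size <= b_max that divides length exactly,
    # so that length == batch_size * steps.  If length is prime, the only
    # divisor <= b_max is 1, so batch_size falls back to 1 and steps == length.
    batch_size = max(n for n in range(1, b_max + 1) if length % n == 0)
    steps = length // batch_size
    return batch_size, steps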

I use this to set the validation steps so that during validation the samples in the validation set are each processed exactly once. An example is shown below.

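The example code is also missing; a hypothetical usage of the sketch above, applied to the 4280-sample fold from the question (b_max = 80 is an assumption):

# Hypothetical usage with the sketch above.
val_length = 4280                                            # validation samples in this fold
val_batch_size, val_steps = get_batch_size_and_steps(val_length, b_max=80)
print(val_batch_size, val_steps)                             # 40 107  (40 * 107 == 4280)

# Build the validation generator with val_batch_size, then:
#   model.fit(train_gen, epochs=..., validation_data=val_gen, validation_steps=val_steps)
#   preds = model.predict(val_gen, steps=val_steps)          # exactly 4280 predictions
#
# For the awkward 4279-sample fold, the same routine gives batch_size 11 and
# 389 steps (11 * 389 == 4279), so no samples are dropped or repeated.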
User contributions licensed under: CC BY-SA