PyTorch DataLoader shuffle

I did an experiment and I did not get the result I was expecting.

For the first part, I create trainloader with shuffle=False.

I save trainloader.dataset.targets to the variable a, and trainloader.dataset.data to the variable b before training my model. Then, I train the model using trainloader.
After the training is finished, I save trainloader.dataset.targets to the variable c, and trainloader.dataset.data to the variable d. Finally, I check a == c and b == d and they both give True, which was expected because the shuffle parameter of the DataLoader is False.

For the second part, I create trainloader with shuffle=True.

I save trainloader.dataset.targets to the variable e, and trainloader.dataset.data to the variable f before training my model. Then, I train the model using trainloader. After the training is finished, I save trainloader.dataset.targets to the variable g, and trainloader.dataset.data to the variable h. I expect e == g and f == h to both be False since shuffle=True, but they give True again. What am I missing about the DataLoader class?

Answer

I believe the data stored directly in trainloader.dataset.data and trainloader.dataset.targets is never shuffled. The shuffling only happens when the DataLoader is iterated over, i.e. when it is used as a generator or iterator.

You can check this by calling next(iter(trainloader)) a few times, with and without shuffling. With shuffle=False every call should return the same first batch, while with shuffle=True the calls should return different batches.

JavaScript

This will give :

JavaScript

However, the data and labels stored in .data and .targets are fixed arrays, and since you are accessing them directly, they will not be shuffled.
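
To make the point concrete, here is a minimal sketch of the same experiment. It is an illustration, not your original code: the toy TensorDataset, the batch size and the variable names are all my own assumptions. It iterates a shuffled loader for one epoch and then checks that the underlying tensors are untouched, while the order in which samples were yielded is decided by the loader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 10 samples, each labelled with its own index.
data = torch.arange(10).float().unsqueeze(1)
targets = torch.arange(10)
dataset = TensorDataset(data, targets)

# Snapshots of the stored tensors before any iteration.
data_before = data.clone()
targets_before = targets.clone()

# shuffle=True: the loader draws a fresh index permutation each epoch.
shuffled_loader = DataLoader(dataset, batch_size=5, shuffle=True)
seen = torch.cat([batch_targets for _, batch_targets in shuffled_loader])

# shuffle=False: samples are yielded in storage order.
ordered_loader = DataLoader(dataset, batch_size=5, shuffle=False)
ordered = torch.cat([batch_targets for _, batch_targets in ordered_loader])

# The stored tensors are identical before and after iterating.
storage_unchanged = (
    torch.equal(data, data_before) and torch.equal(targets, targets_before)
)
# The shuffled epoch still visits every sample exactly once.
same_elements = torch.equal(seen.sort().values, targets)

print("yielded order (shuffle=True): ", seen.tolist())
print("yielded order (shuffle=False):", ordered.tolist())
print("stored tensors unchanged:     ", storage_unchanged)
```

The shuffled order will generally differ from run to run unless you fix the seed, but the stored tensors compare equal every time, which is why your e == g and f == h checks return True.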

User contributions licensed under: CC BY-SA