Skip to content
Advertisement

Data Augmentation in PyTorch

I am a little bit confused about the data augmentation performed in PyTorch. Now, as far as I know, when we are performing data augmentation, we are KEEPING our original dataset, and then adding other versions of it (Flipping, Cropping…etc). But that doesn’t seem like happening in PyTorch. As far as I understood from the references, when we use data.transforms in PyTorch, then it applies them one by one. So for example:

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

Here , for the training, we are first randomly cropping the image and resizing it to shape (224,224). Then we are taking these (224,224) images and horizontally flipping them. Therefore, our dataset is now containing ONLY the horizontally flipped images, so our original images are lost in this case.

Am I right? Is this understanding correct? If not, then where do we tell PyTorch in this code above (taken from Official Documentation) to keep the original images and resize them to the expected shape (224,224)?

Thanks

Advertisement

Answer

The transforms operations are applied to your original images at every batch generation. So your dataset is left unchanged, only the batch images are copied and transformed every iteration.

The confusion may come from the fact that often, like in your example, transforms are used both for data preparation (resizing/cropping to expected dimensions, normalizing values, etc.) and for data augmentation (randomizing the resizing/cropping, randomly flipping the images, etc.).


What your data_transforms['train'] does is:

  • Randomly resize the provided image and randomly crop it to obtain a (224, 224) patch
  • Apply or not a random horizontal flip to this patch, with a 50/50 chance
  • Convert it to a Tensor
  • Normalize the resulting Tensor, given the mean and deviation values you provided

What your data_transforms['val'] does is:

  • Resize your image to (256, 256)
  • Center crop the resized image to obtain a (224, 224) patch
  • Convert it to a Tensor
  • Normalize the resulting Tensor, given the mean and deviation values you provided

(i.e. the random resizing/cropping for the training data is replaced by a fixed operation for the validation one, to have reliable validation results)


If you don’t want your training images to be horizontally flipped with a 50/50 chance, just remove the transforms.RandomHorizontalFlip() line.

Similarly, if you want your images to always be center-cropped, replace transforms.RandomResizedCrop by transforms.Resize and transforms.CenterCrop, as done for data_transforms['val'].

Advertisement