I am trying to replicate a GAN study (StarGAN v2), so I want to train the model (using less data) in Google Colab. But I ran into this problem:
Start training...
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3063: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "main.py", line 182, in <module>
    main(args)
  File "main.py", line 59, in main
    solver.train(loaders)
  File "/content/drive/My Drive/stargan-v2/core/solver.py", line 131, in train
    nets, args, x_real, y_org, y_trg, x_refs=[x_ref, x_ref2], masks=masks)
  File "/content/drive/My Drive/stargan-v2/core/solver.py", line 259, in compute_g_loss
    x_rec = nets.generator(x_fake, s_org, masks=masks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 181, in forward
    x = block(x, s)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 117, in forward
    out = self._residual(x, s)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 109, in _residual
    x = F.interpolate(x, scale_factor=2, mode='nearest')
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 3132, in interpolate
    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.90 GiB total capacity; 14.73 GiB already allocated; 195.88 MiB free; 14.89 GiB reserved in total by PyTorch)
I changed batch_size, but it didn't work for me. Do you have any idea how I can fix this problem?
Paper: StarGAN v2: Diverse Image Synthesis for Multiple Domains
Original GitHub repo: stargan-v2
Answer
If you aren't using the Pro version of Google Colab, you're going to run into fairly restrictive limits on memory allocation. From the Google Colab FAQ:
The amount of memory available in Colab virtual machines varies over time (but is stable for the lifetime of the VM)… You may sometimes be automatically assigned a VM with extra memory when Colab detects that you are likely to need it. Users interested in having more memory available to them in Colab, and more reliably, may be interested in Colab Pro.
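If you want to see what the VM you've actually been assigned has to work with, a quick check using standard PyTorch calls (nothing Colab-specific, just an illustrative snippet) looks like this:

    import torch

    # Report the GPU attached to this Colab VM and its total memory.
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, total memory: {props.total_memory / 1024**3:.1f} GiB")

    # Summarize how much of that memory PyTorch has already allocated/reserved.
    print(torch.cuda.memory_summary(abbreviated=True))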
You already have a good grasp of this issue, since you understand that lowering batch_size is a good way to get around it for a little while. Ultimately, though, if you want to replicate this study, you'll have to switch to a training setup that can accommodate the amount of data (and GPU memory) you seem to need.
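If you do want to squeeze training onto the free-tier GPU for longer, one generic PyTorch pattern (not part of the StarGAN v2 code; model, loss_fn, and loader below are placeholders) is gradient accumulation, which keeps per-step memory proportional to a small micro-batch while the optimizer still sees a larger effective batch:

    import torch

    def train_with_accumulation(model, loss_fn, optimizer, loader,
                                accum_steps=4, device="cuda"):
        model.train()
        optimizer.zero_grad()
        for step, (x, y) in enumerate(loader):
            x, y = x.to(device), y.to(device)
            # Scale the loss so accumulated gradients average over accum_steps.
            loss = loss_fn(model(x), y) / accum_steps
            loss.backward()  # gradients accumulate in the .grad buffers
            if (step + 1) % accum_steps == 0:
                optimizer.step()
                optimizer.zero_grad()

Keep in mind that even with a micro-batch of 1, the activations of a single full-resolution StarGAN v2 forward pass are large, so lowering the image resolution (the repo's main.py takes an img_size argument, if I recall its CLI correctly) alongside batch_size usually frees far more memory than batch size alone.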