I’m trying to change some PyTorch code so that it can run on the CPU.
The model was trained with torch.nn.DataParallel()
so when I load the pre-trained model and try to use it, I have to wrap it in nn.DataParallel() as well,
which I am currently doing like this:
device = torch.device("cuda:0") net = nn.DataParallel(net, device_ids=[0]) net.load_state_dict(torch.load(PATH)) net.to(device)
However, after I switched my torch device to cpu like this:
device = torch.device('cpu')
net = nn.DataParallel(net, device_ids=[0])
net.load_state_dict(torch.load(PATH))
net.to(device)
I got this error:
File "C:MyProgramwin-py362-venvlibsite-packagestorchnnparalleldata_parallel.py", line 156, in forward "them on device: {}".format(self.src_device_obj, t.device)) RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu
I’m assuming that it’s still looking for CUDA because that’s what device_ids
is set to, but is there a way to make it use the CPU? This post from the PyTorch repo makes me think that I can, but it doesn’t explain how.
If not, is there any other way to use a model trained with DataParallel on the CPU?
Answer
When you use torch.nn.DataParallel(),
it implements data parallelism at the module level.
The parallelized module must have its parameters and buffers on device_ids[0] before running this DataParallel module.
So even though you are calling .to(torch.device('cpu')),
DataParallel still expects its parameters, buffers, and inputs to be on a GPU (device_ids[0]).
However, since DataParallel
is just a container, you can bypass it and get the original module back like this:
net = net.module.to(device)
Now it will access the original module you defined before you applied the DataParallel
container.
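Putting it all together, here is a minimal sketch of loading and running the model on a CPU-only machine. It assumes net and PATH are the same objects as in your question, and some_input is just a placeholder for your own data; passing map_location=device to torch.load is what keeps the checkpoint's saved CUDA tensors from being deserialized onto a GPU:

import torch
import torch.nn as nn

device = torch.device('cpu')

# Wrap the model the same way it was wrapped for training, so the
# "module."-prefixed keys in the saved state_dict still match.
net = nn.DataParallel(net)

# map_location tells torch.load to place the checkpoint's tensors on the CPU.
net.load_state_dict(torch.load(PATH, map_location=device))

# Unwrap the DataParallel container and keep only the underlying module.
net = net.module.to(device)
net.eval()

with torch.no_grad():
    output = net(some_input)

An alternative, if you would rather skip the DataParallel wrapper entirely, is to load the checkpoint into a dictionary and strip the "module." prefix from each key before calling load_state_dict on the plain model.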