From PyTorch documentation:
b = torch.rand(10, requires_grad=True).cuda()
b.is_leaf   # False
# b was created by the operation that cast a cpu Tensor into a cuda Tensor

e = torch.rand(10).cuda().requires_grad_()
e.is_leaf   # True
# e requires gradients and has no operations creating it

f = torch.rand(10, requires_grad=True, device="cuda")
f.is_leaf   # True
# f requires grad, has no operation creating it
But why are e and f leaf Tensors, when they were both also cast from a CPU Tensor into a CUDA Tensor (an operation)?
Is it because Tensor e was cast to CUDA before the in-place operation requires_grad_()?
And because f was created directly on the device via the device="cuda" argument rather than cast by the method .cuda()?
Answer
When a tensor is first created, it is a leaf node.
Basically, all inputs and weights of a neural network are leaf nodes of the computational graph.
Once an operation that autograd tracks (one involving a tensor that requires gradients) produces a tensor, that result is not a leaf node anymore: it has a grad_fn pointing back to the operation that created it.
b = torch.rand(10, requires_grad=True)  # create a leaf node
b.is_leaf                               # True
b = b.cuda()                            # perform a casting operation
b.is_leaf                               # False
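The same distinction shows up in a network: layer weights are created directly and stay leaves, while anything a layer computes carries a grad_fn. A minimal sketch (the layer and variable names here are just for illustration):

import torch
import torch.nn as nn

layer = nn.Linear(4, 2)      # weight and bias are created directly -> leaves
x = torch.rand(1, 4)         # an input tensor is a leaf as well
out = layer(x)               # produced by an operation -> not a leaf

layer.weight.is_leaf         # True
out.is_leaf                  # False
out.grad_fn is not None      # True: autograd recorded the operation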
requires_grad_() is not an operation that autograd records the way cuda() is.
It modifies the tensor in place (the trailing underscore marks it as in-place) and returns the same tensor.
The key point is that e did not require gradients when the .cuda() cast ran, so autograd recorded no history for it; a tensor with requires_grad=False is a leaf by convention, and requires_grad_() simply turns gradient tracking on for that leaf.
A tensor which requires gradients (a trainable weight) cannot depend on anything else, which is why PyTorch refuses to change the requires_grad flag of non-leaf tensors.
e = torch.rand(10)       # create a leaf node
e.is_leaf                # True
e = e.cuda()             # cast; the result does not require grad,
e.is_leaf                # True -- so it is still a leaf by convention
e = e.requires_grad_()   # in-place: turn gradient tracking on
e.is_leaf                # True
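Leaf status is what backward() relies on: gradients are accumulated into the .grad attribute of leaf tensors only. A short sketch (w and v are illustrative names):

import torch

w = torch.rand(3, requires_grad=True)  # leaf: think of it as a trainable weight
v = w * 2                              # non-leaf: created by an operation
loss = v.sum()
loss.backward()

w.grad                                 # populated: gradients flow into leaves
v.grad                                 # None (PyTorch warns on access); call
                                       # v.retain_grad() before backward() if needed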
Also, the detach() operation creates a new tensor which does not require gradients:

b = torch.rand(10, requires_grad=True)
b.is_leaf   # True
b = b.detach()
b.is_leaf   # True
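detach() is therefore also the usual way to turn a computed (non-leaf) tensor back into a leaf, for example to reuse its value as a fresh trainable starting point. A sketch with illustrative names; note that the detached tensor shares memory with the original:

import torch

x = torch.rand(5, requires_grad=True)
y = x * 2           # non-leaf: part of the graph
z = y.detach()      # new leaf, cut off from the graph (shares storage with y)
z.is_leaf           # True
z.requires_grad     # False
z.requires_grad_()  # allowed, because z is a leaf
z.is_leaf           # still True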
In the last example, we create a new tensor directly on a CUDA device.
No casting operation is needed, so nothing is recorded in the graph.

f = torch.rand(10, requires_grad=True, device="cuda")  # create a leaf node on cuda
f.is_leaf   # True
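A practical consequence, sketched below with a device fallback so it also runs without a GPU: creating a trainable tensor directly on the target device keeps it a leaf, so backward() fills in its .grad.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

f = torch.rand(10, requires_grad=True, device=device)  # leaf on the target device
loss = (f * 3).sum()
loss.backward()
f.is_leaf   # True
f.grad      # populated, because f is a leaf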