
Manually computing cross-entropy loss in PyTorch

I am trying to compute the cross-entropy loss manually in PyTorch for an encoder-decoder model.

I used the code posted here to compute it: Cross Entropy in PyTorch

I updated the code to discard padded tokens (-100). The final code is this:

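(The original snippet isn't preserved on this page; below is a minimal sketch of a masked manual cross-entropy along those lines. The function name manual_cross_entropy and the explicit loop are illustrative assumptions, with n_batch named as in the answer below.)

```python
import torch
import torch.nn.functional as F

def manual_cross_entropy(logits, targets, ignore_index=-100):
    # logits:  (n_batch, n_classes) raw scores
    # targets: (n_batch,) gold class indices, with ignore_index marking padding
    log_probs = F.log_softmax(logits, dim=-1)
    n_batch = logits.size(0)            # total number of positions, padded ones included
    total = 0.0
    for i in range(n_batch):
        if targets[i] == ignore_index:  # discard padded tokens
            continue
        total = total - log_probs[i, targets[i]]
    return total / n_batch              # average over all n_batch positions
```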

To verify that it works correctly, I tested it on a text generation task and computed the loss both with the torch.nn implementation and with this code.

The loss values are not identical:

using nn.CrossEntropyLoss:

[screenshot of the loss values]

Using the code from the link above:

[screenshot of the loss values]

Am I missing something?

I tried to look at the source code of nn.CrossEntropyLoss but wasn’t able to. In this link, nn/functional.py at line 2955, you will see that the function dispatches to another function, torch._C._nn.cross_entropy_loss; I can’t find that function in the repo.

Edit:

I noticed that the differences appear only when there are -100 tokens in the gold labels.

Demo example:

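(The original demo snippet isn't preserved either; here is a small reproduction in the same spirit, reusing the manual_cross_entropy sketch from above with made-up shapes and labels.)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 5)            # 4 token positions, 5 classes
gold = torch.tensor([1, 3, -100, 2])  # one padded position

print(nn.CrossEntropyLoss(ignore_index=-100)(logits, gold))  # mean over the 3 kept tokens
print(manual_cross_entropy(logits, gold))                    # sum over the 3 kept tokens / 4
```

The two printed values differ, which matches the behaviour described above.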

And when we don’t have -100 tokens:

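(Continuing the same reproduction, with no -100 labels this time:)

```python
gold = torch.tensor([1, 3, 0, 2])     # no padded positions
print(nn.CrossEntropyLoss(ignore_index=-100)(logits, gold))
print(manual_cross_entropy(logits, gold))                    # same value as the line above
```

Here both lines print the same value (up to floating-point rounding).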


Answer

I solved the problem by updating the code. I was already discarding the -100 tokens (the if-statement above), but I forgot to also reduce the token count used for averaging (called n_batch in the code above). After doing that, the loss numbers are identical to the nn.CrossEntropyLoss values. The final code:

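(As before, the exact snippet isn't preserved; below is a sketch of the corrected function, applying the fix just described and keeping the hypothetical names from the earlier sketch.)

```python
import torch
import torch.nn.functional as F

def manual_cross_entropy(logits, targets, ignore_index=-100):
    log_probs = F.log_softmax(logits, dim=-1)
    total = 0.0
    n_batch = 0                          # count only the tokens that are kept
    for i in range(logits.size(0)):
        if targets[i] == ignore_index:   # discard padded tokens
            continue
        total = total - log_probs[i, targets[i]]
        n_batch += 1                     # increment the count for kept tokens only
    return total / n_batch               # average over the kept tokens only
```

With the count reduced to the non-padded tokens, this matches nn.CrossEntropyLoss, whose default reduction='mean' averages over the non-ignored targets only.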