
pytorch custom loss function nn.CrossEntropyLoss

After studying autograd, I tried to write a loss function myself. Here is my loss:

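(The exact code block is not preserved on this page; a minimal sketch of such a hand-written cross-entropy, assuming logits of shape (N, C) and integer class targets, might look like this:)

```python
import torch
import torch.nn.functional as F

def myCEE(outputs, targets):
    # Naive cross-entropy computed directly from the formula, without the
    # log-sum-exp stabilization that nn.CrossEntropyLoss applies internally.
    # outputs: raw logits of shape (N, C); targets: class indices of shape (N,)
    log_sum_exp = torch.log(torch.exp(outputs).sum(dim=1))              # log(sum_j exp(l_j))
    one_hot = F.one_hot(targets, num_classes=outputs.shape[1]).to(outputs.dtype)
    true_logit = (one_hot * outputs).sum(dim=1)                         # logit of the true class
    return (log_sum_exp - true_logit).mean()                            # mean over the batch
```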

and I compared it with torch.nn.CrossEntropyLoss.

Here are the results:

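(The original comparison output is not preserved; a sketch of how such a comparison could be set up, with made-up inputs for illustration:)

```python
logits = torch.randn(4, 10)             # made-up batch: 4 samples, 10 classes
targets = torch.tensor([1, 0, 3, 9])

criterion = torch.nn.CrossEntropyLoss()
print(criterion(logits, targets))       # built-in loss
print(myCEE(logits, targets))           # same value, but a different grad_fn
```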

The values were the same.

I thought that, since these are different functions, their grad_fn would differ, but that this would not cause any problems.

But something happened!

After 4 epochs, the loss values turned to nan.

In contrast to myCEE, training with nn.CrossEntropyLoss went well.

So, I wonder if there is a problem with my function.

After reading some posts about nan problems, I added more convolution layers to the model.

As a result, a 39-epoch training run completed without error.

Nevertheless, I’d like to know the difference between myCEE and nn.CrossEntropyLoss.


Answer

torch.nn.CrossEntropyLoss differs from your implementation because it uses a trick to counter the unstable computation of the exponential when the logit values are numerically large. Given the logits output {l_1, ..., l_j, ..., l_n}, the softmax is defined as:

softmax(l_i) = exp(l_i) / Σ_j exp(l_j)

The trick is to multiply both the numerator and the denominator by exp(-β):

softmax(l_i) = exp(l_i)·exp(-β) / (Σ_j exp(l_j)·exp(-β)) = exp(l_i - β) / Σ_j exp(l_j - β)

Then the log-softmax comes down to:

log-softmax(l_i) = (l_i - β) - log( Σ_j exp(l_j - β) )

In practice, β is chosen as the highest logit value, i.e. β = max_j(l_j).
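To make this concrete, here is a small sketch (with made-up, deliberately huge logits) showing the naive formulation overflowing, while the β-shifted version, which is what nn.CrossEntropyLoss effectively does internally, stays finite:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1000.0, 2000.0, 3000.0]])    # made-up, deliberately huge logits
target = torch.tensor([2])

# Naive formulation: exp(3000.) overflows to inf, so the loss is inf (or nan).
naive_loss = torch.log(torch.exp(logits).sum(dim=1)) - logits[0, target]
print(naive_loss)                                     # tensor([inf])

# Stabilized version: subtract beta = max_j(l_j) before exponentiating.
beta = logits.max(dim=1, keepdim=True).values
log_softmax = (logits - beta) - torch.log(torch.exp(logits - beta).sum(dim=1, keepdim=True))
stable_loss = -log_softmax[0, target]
print(stable_loss)                                    # finite (≈ 0 here, since class 2 dominates)

# nn.CrossEntropyLoss applies the same trick internally.
print(F.cross_entropy(logits, target))                # also finite
```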

You can read more about it on this question: Numerically Stable Softmax.

User contributions licensed under: CC BY-SA