I am trying to understand why one or two parameters in my PyTorch neural network occasionally become NaN after calling optimizer.step().
I have already checked the gradients after calling .backward() and just before calling the optimizer, and they neither contain NaNs nor are excessively large. I am doing gradient clipping, but I don’t think clipping can be responsible, since the gradients still look fine afterwards. I am using single-precision floats everywhere.
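Roughly, the check I run each step looks like the sketch below (simplified: the tiny model, random data, and the 1e4 "very large" threshold are placeholders, not my actual training code):

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and data to make the sketch runnable.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(32, 8)
y = torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()

# Inspect gradients after .backward() and before the optimizer step.
for name, param in model.named_parameters():
    if param.grad is None:
        continue
    if torch.isnan(param.grad).any():
        print(f"NaN gradient in {name}")
    if param.grad.abs().max() > 1e4:  # arbitrary "very large" threshold
        print(f"Large gradient in {name}: {param.grad.abs().max().item():.3e}")

# Clip, then step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```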
This behavior happens randomly every hundred thousand epochs or so, and is proving very difficult to debug. Unfortunately the code is too long to reproduce here and I haven’t been able to replicate the problem in a smaller example.
If anyone can suggest possible issues I haven’t mentioned above, that would be super helpful.
Thanks!
Answer
This ended up being ignorance on my part: there were Infs in the gradients that were evading my diagnostic code, because I didn’t realize that PyTorch’s .isnan() doesn’t detect them.
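For anyone hitting the same thing, the distinction is easy to demonstrate (a small illustration, not my actual diagnostic code): torch.isnan() flags only NaN, while torch.isfinite() flags both NaN and Inf, so checking that all gradient entries are finite catches both cases.

```python
import torch

# A gradient containing a normal value, an Inf, and a NaN.
g = torch.tensor([1.0, float("inf"), float("nan")])

print(torch.isnan(g))     # tensor([False, False,  True])  -- misses the Inf
print(torch.isinf(g))     # tensor([False,  True, False])
print(torch.isfinite(g))  # tensor([ True, False, False])

# A check that catches both NaN and Inf in one pass:
if not torch.isfinite(g).all():
    print("gradient contains NaN or Inf")
```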