I am trying to understand why one or two parameters in my PyTorch neural network occasionally become NaN after calling optimizer.step().
I have already checked the gradients after calling .backward() and just before calling the optimizer, and they neither contain NaNs nor are excessively large. I am doing gradient clipping, but I don’t think clipping can be responsible, since the gradients still look fine afterwards. I am using single-precision floats everywhere.
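Roughly, the check I run each step looks like the sketch below (simplified: the tiny model, random data, and the 1e4 "very large" threshold are placeholders, not my actual training code):

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and data to make the sketch runnable.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(32, 8)
y = torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()

# Inspect gradients after .backward() and before the optimizer step.
for name, param in model.named_parameters():
    if param.grad is None:
        continue
    if torch.isnan(param.grad).any():
        print(f"NaN gradient in {name}")
    if param.grad.abs().max() > 1e4:  # arbitrary "very large" threshold
        print(f"Large gradient in {name}: {param.grad.abs().max().item():.3e}")

# Clip, then step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```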
This behavior happens randomly every hundred thousand epochs or so, and is proving very difficult to debug. Unfortunately the code is too long to reproduce here and I haven’t been able to replicate the problem in a smaller example.
If anyone can suggest possible issues I haven’t mentioned above, that would be super helpful.
Thanks!
Answer
This ended up being ignorance on my part: there were Infs in the gradients that were evading my diagnostic code, because I didn’t realize that PyTorch’s .isnan() doesn’t detect them.
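For anyone hitting the same thing, the distinction is easy to demonstrate (a small illustration, not my actual diagnostic code): torch.isnan() flags only NaN, while torch.isfinite() flags both NaN and Inf, so checking that all gradient entries are finite catches both cases.

```python
import torch

# A gradient containing a normal value, an Inf, and a NaN.
g = torch.tensor([1.0, float("inf"), float("nan")])

print(torch.isnan(g))     # tensor([False, False,  True])  -- misses the Inf
print(torch.isinf(g))     # tensor([False,  True, False])
print(torch.isfinite(g))  # tensor([ True, False, False])

# A check that catches both NaN and Inf in one pass:
if not torch.isfinite(g).all():
    print("gradient contains NaN or Inf")
```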