NaN loss / gradient

NaN Loss

Reasons that a model diverges:

1. The learning rate is too high. The loss often begins to increase and then diverges to infinity.

2. Log loss (e.g. negative log likelihood). The log becomes -infinity when the likelihood tends to zero. Adding a small epsilon solves the problem (see the sketch after this list).

3. Numerical stability issues, e.g. the derivative of a square root can diverge near zero if the expression is not stabilized for finite-precision arithmetic (see the sketch after this list).

4. Issues with the input data. Check np.isnan(x).any() and np.isinf(x).any() on the input data (see the sketch after this list).

5. Issues with the target values. Make sure they are valid labels.

6. Make sure the data is properly normalized.

7. A faulty loss function implementation.
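
For point 2, a minimal sketch of the epsilon fix, assuming class probabilities in a NumPy array (the function name and shapes are illustrative, not from any particular library):

```python
import numpy as np

def nll_loss(probs, labels, eps=1e-12):
    """Negative log likelihood with an epsilon guard.

    probs: (N, C) predicted class probabilities, labels: (N,) integer classes.
    """
    # Clip probabilities away from 0 so log() never returns -inf.
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))
```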
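
For point 3, the classic case is sqrt: d/dx sqrt(x) = 1/(2*sqrt(x)), which is infinite at x = 0, so a norm that underflows to zero produces NaN gradients. A sketch of the usual guard, assuming a PyTorch setup (the epsilon value is illustrative):

```python
import torch

def stable_norm(x, eps=1e-8):
    # sqrt has an unbounded derivative at 0; adding eps inside the root
    # keeps the backward pass finite when the squared sum reaches 0.
    return torch.sqrt((x * x).sum(dim=-1) + eps)

x = torch.zeros(4, 3, requires_grad=True)
stable_norm(x).sum().backward()   # finite gradients instead of NaN
```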
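
For points 4 through 6, a quick preflight check that can run before training. The function name, the integer-class label assumption, and the per-feature standardization (rows = samples, columns = features) are all assumptions for illustration:

```python
import numpy as np

def check_data(x, y, num_classes):
    # Point 4: fail fast if the raw inputs contain NaN or inf.
    assert not np.isnan(x).any() and not np.isinf(x).any(), "bad values in x"
    # Point 5: labels must be valid class indices.
    assert np.issubdtype(y.dtype, np.integer), "labels are not integers"
    assert ((y >= 0) & (y < num_classes)).all(), "labels out of range"
    # Point 6: standardize each feature; the epsilon avoids dividing by
    # zero for constant columns.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8), y
```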