negative log likelihood loss and cross entropy in pytorch

NLLLoss

negative log likelihood loss

Let's say the output of a neural network is a vector of size C, representing C classes.

First, we need to turn it into a probability distribution over the C classes using softmax,

i.e. exp(x_i) / sum_j exp(x_j), so each of the C elements gets a score between 0 and 1, and all the scores add up to 1.
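As a minimal sketch of that formula (the logits here are just made-up numbers):

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])   # made-up raw scores for C = 3 classes

# softmax by hand: exp(x_i) / sum_j exp(x_j)
probs_manual = torch.exp(logits) / torch.exp(logits).sum()

# built-in equivalent
probs = torch.softmax(logits, dim=0)

print(probs)        # tensor([0.6590, 0.2424, 0.0986]) -- each between 0 and 1
print(probs.sum())  # tensor(1.) -- they add up to 1
```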

The scores can be very small, so they can underflow a float variable, especially when you multiply many scores together.
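A quick illustration of the underflow problem, with 1000 arbitrary small probabilities:

```python
import torch

probs = torch.full((1000,), 1e-3)   # 1000 made-up probabilities of 0.001 each

print(probs.prod())        # tensor(0.) -- the product underflows float32
print(probs.log().sum())   # about -6907.76 -- the same quantity in log space is fine
```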

So we apply log to the scores. In pytorch, there is a LogSoftmax function that does both softmax and log.
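A sketch of LogSoftmax on the same made-up logits as above:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])   # batch of 1 prediction, C = 3

log_probs = nn.LogSoftmax(dim=1)(logits)

# same values as log(softmax(...)), but computed in a more stable way
print(log_probs)                               # tensor([[-0.4170, -1.4170, -2.3170]])
print(torch.log(torch.softmax(logits, dim=1)))
```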

The output of the neural network can then be transformed into a vector of log likelihoods of size C.

Say the correct class is #i, so we hope the model gives a large likelihood to the i-th element.

Denote the log likelihood at the i-th element of the output vector (the prediction) as xi.

The bigger xi is, the better.

So we define the loss as negative xi (for a loss, the smaller the better).

for one prediction, loss = -xi

for one batch, loss = sum(-xi) or loss = mean(-xi) over all predictions, as sketched below.
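A sketch of NLLLoss doing exactly this, on a made-up batch of 2 log-likelihood vectors; the reduction argument picks between mean and sum:

```python
import torch
import torch.nn as nn

# made-up log likelihoods for 2 predictions over C = 3 classes
log_probs = torch.tensor([[-0.4170, -1.4170, -2.3170],
                          [-2.0000, -0.2000, -3.0000]])
targets = torch.tensor([0, 1])   # correct class index i for each prediction

# by hand: pick -xi for each prediction's correct class, then average
manual = -log_probs[torch.arange(2), targets]   # tensor([0.4170, 0.2000])
print(manual.mean())                            # tensor(0.3085)

# NLLLoss gives the same number; reduction can be 'mean' (default) or 'sum'
print(nn.NLLLoss(reduction='mean')(log_probs, targets))   # tensor(0.3085)
```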

CrossEntropyLoss combines LogSoftmax and NLLLoss in one single class.

So if you prefer not to have a LogSoftmax layer within your model, use CrossEntropyLoss instead.

They are the same thing.
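A sketch checking the equivalence of the two approaches on random made-up logits:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)             # raw network output: batch of 4, C = 3
targets = torch.tensor([0, 2, 1, 1])   # made-up correct classes

# option 1: LogSoftmax layer in the model, then NLLLoss
loss_a = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# option 2: CrossEntropyLoss straight on the raw logits
loss_b = nn.CrossEntropyLoss()(logits, targets)

print(loss_a, loss_b)   # the two losses match
```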