Suppose a ground-truth probability distribution $[p_1, p_2, \ldots]$ whose entries sum to one, and let $\hat{p}_i$ denote the predicted probability for each class. The cross entropy is calculated as
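$$H(p, \hat{p}) = -\sum_{i} p_i \log \hat{p}_i$$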
For a one-hot probability distribution, i.e. when exactly one entry of the ground-truth probability vector equals one while all others are zero, the categorical cross entropy can be simplified as
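$$L = -\log \hat{p}_c$$

where $c$ is the index of the class whose ground-truth probability equals one.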
For a binary classification problem, the ground-truth label $y$ equals either 1 or 0, which is equivalent to the distribution vector $[y, 1-y]$. Likewise, the prediction $\hat{y}$ is equivalent to the distribution vector $[\hat{y}, 1-\hat{y}]$. Thus the binary cross entropy can be calculated as
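$$L = -\big(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big)$$

To make the formula concrete, below is a minimal NumPy sketch of the binary cross entropy averaged over a batch; the function name, the `eps` clipping guard, and the example numbers are illustrative assumptions rather than part of the original text.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy over a batch of probabilities (illustrative sketch)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Example: ground-truth labels y and predicted probabilities \hat{y}
y_true = np.array([1, 0, 1])
y_pred = np.array([0.9, 0.2, 0.7])
print(binary_cross_entropy(y_true, y_pred))
```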