Beginners to machine learning are sometimes confused by cross entropy error. Cross entropy error is also called log loss. In the general case, cross entropy error is a measure of error between a set of predicted probabilities and a set of actual probabilities.

Cross entropy error is calculated as “the negative of the sum of the log of the predicteds times the associated actuals,” where log is the natural logarithm. For example, if a set of predicted probabilities is (0.20, 0.70, 0.10) and the associated actual probabilities are (0.25, 0.45, 0.30) then CE error is:

CE = - [ log(0.20)*0.25 + log(0.70)*0.45 + log(0.10)*0.30 ] = 1.254
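
The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name cross_entropy is made up for illustration, and it assumes the natural log and that no predicted probability is exactly 0.

```python
import math

def cross_entropy(predicted, actual):
    # Negative of the sum of log(predicted) * actual, using the natural log.
    # Assumes every predicted probability is greater than 0.
    return -sum(math.log(p) * a for p, a in zip(predicted, actual))

ce = cross_entropy([0.20, 0.70, 0.10], [0.25, 0.45, 0.30])
print(round(ce, 3))  # 1.254
```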

But in neural network classification, the actual probabilities are the one-hot encoded target class label, which has a single 1-value and the rest 0-values. So if the predicted probabilities are as before, (0.20, 0.70, 0.10), and the class label is (0, 1, 0), then the CE is:

CE = - [ log(0.20)*0 + log(0.70)*1 + log(0.10)*0 ] = - [ 0 + (-0.357) + 0 ] = 0.357

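Because all of the actual probabilities except one are 0, all but one term drop out. This can seem wrong at first, because the predicted probabilities for the incorrect classes appear to be ignored, but it is correct: the predicted probabilities sum to 1, so a higher probability for the correct class necessarily leaves less for the others.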
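
A short Python sketch makes the drop-out visible. The variable names here are illustrative only, and the example assumes the target has exactly one 1-value.

```python
import math

predicted = [0.20, 0.70, 0.10]
target = [0, 1, 0]  # one-hot encoded class label

# Full sum over all classes; terms with a 0 target contribute nothing.
full = -sum(math.log(p) * t for p, t in zip(predicted, target))

# Shortcut: only the predicted probability of the true class matters.
shortcut = -math.log(predicted[target.index(1)])

print(round(full, 3), round(shortcut, 3))  # 0.357 0.357
```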

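And in the case of binary classification, the CE equation can be further simplified. Suppose the target probabilities are (1, 0) = (y, 1-y) and the predicted probabilities are (0.70, 0.30) = (y’, 1-y’). The CE error is - [ log(y’)*y + log(1-y’)*(1-y) ], which for y = 1 reduces to just -log(y’) = -log(0.70) = 0.357. When y = 0, the other term survives instead and the CE error is -log(1-y’).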
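
The binary case can be sketched the same way. The function name binary_cross_entropy and its parameter names are assumptions for illustration; the sketch assumes the predicted probability is strictly between 0 and 1.

```python
import math

def binary_cross_entropy(y_pred, y):
    # Two-term CE: one term for the target class, one for its complement.
    # Assumes 0 < y_pred < 1 so that both logs are defined.
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))

bce_target_1 = binary_cross_entropy(0.70, 1)  # only -log(y_pred) survives
bce_target_0 = binary_cross_entropy(0.70, 0)  # only -log(1 - y_pred) survives

print(round(bce_target_1, 3))  # 0.357
print(round(bce_target_0, 3))  # 1.204
```

With a target of 1, the second term is multiplied by 0, so the result matches the three-class example where the correct class was predicted at 0.70.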

Like many things on the road to machine learning mastery, this is something that seems surprising at first but quickly becomes wired-in knowledge.