Accuracy, Precision, and Recall for n-class Classification

This is meant to be a concise tutorial; it should take you no more than 10 minutes to go through.

What is the accuracy?

When we have a multi-class classifier, we usually want to evaluate its performance. In this tutorial, we look at approaches to evaluating a multi-class classifier.

The simplest approach is to use the accuracy, which is defined as "the proportion of correctly predicted labels over all predictions". Mathematically, we write it as follows:

Accuracy := (TP + TN) / (TP + FN + FP + TN), where TP stands for true positives, TN stands for true negatives, FN stands for false negatives, and FP stands for false positives.
Observe that TP + TN captures the condition of correctly predicted labels.
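
As a minimal sketch of the formula above, accuracy can be computed either from the four counts or directly from two lists of labels. The function names and the counts used here are illustrative assumptions, not part of the tutorial's own example.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy := (TP + TN) / (TP + FN + FP + TN)."""
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


def accuracy_from_labels(gold, predicted):
    """Proportion of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one prediction is required")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical counts, for illustration only: (40 + 45) / 100
print(accuracy_from_counts(tp=40, tn=45, fp=5, fn=10))  # 0.85
```

The label-based version also works unchanged for more than two classes, since it only counts matches.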

Unfortunately, this measure can sometimes be misleading: a model may achieve 'high' accuracy mainly by correctly predicting the 'not so important' classes. In such cases, it is useful to analyze the performance on each class label individually, e.g. by computing the precision and recall for each class label.
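
To make this concrete, here is a small sketch with made-up data (the class names "common" and "rare" and all counts are hypothetical). A model that always predicts the frequent class scores high accuracy, yet never finds a single instance of the rare class; the last quantity computed is the recall for the rare class, which is defined in the next section.

```python
# Hypothetical data: 95 instances of a "common" class and 5 of a "rare" class.
gold = ["common"] * 95 + ["rare"] * 5
# A degenerate model that predicts "common" for every instance.
predicted = ["common"] * 100

# Overall accuracy: proportion of predictions that match the gold labels.
accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# Share of the rare instances that the model actually identified.
rare_total = sum(g == "rare" for g in gold)
rare_found = sum(g == "rare" and p == "rare" for g, p in zip(gold, predicted))
rare_recall = rare_found / rare_total

print(accuracy)     # 0.95
print(rare_recall)  # 0.0
```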

What are the precision and the recall for binary classification?

The precision is defined as "given all the predicted labels (for a given class X), how many instances are correctly predicted?". Mathematically, we write it as follows:

Precision := TP / (TP + FP), where TP and FP are as defined above.
Observe that TP + FP captures all the predicted labels.

On the other hand, the recall is defined as "for all instances which must be labelled as X, how many of them are correctly labeled?". Mathematically, we write it as follows:

Recall := TP / (TP + FN), where TP and FN are as defined above.
Observe that TP + FN captures all the instances which must be labelled as X.
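
The two formulas can be sketched in a few lines. The function names and the counts below are assumptions made for illustration. Raising an error when the denominator is zero is one possible design choice; other conventions return 0 or leave the value undefined.

```python
def precision(tp, fp):
    """Precision := TP / (TP + FP)."""
    total_predicted = tp + fp  # all instances predicted as class X
    if total_predicted == 0:
        raise ValueError("precision is undefined when nothing is predicted as X")
    return tp / total_predicted


def recall(tp, fn):
    """Recall := TP / (TP + FN)."""
    total_gold = tp + fn  # all instances that must be labelled as X
    if total_gold == 0:
        raise ValueError("recall is undefined when no instance has gold label X")
    return tp / total_gold


# Hypothetical counts, for illustration only.
print(precision(tp=8, fp=2))  # 8 / 10 = 0.8
print(recall(tp=8, fn=8))     # 8 / 16 = 0.5
```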

What are the precision and the recall for multi-class classification?

While it is straightforward to compute both precision and recall in binary classification problems, it may be unclear how to deal with them in multi-class classification problems. In fact, we can adopt the same idea as in binary classification: treat one label at a time as the class of interest. We illustrate this in the following example.

To calculate the precision for label A, we use the definition of precision given above, i.e. "given all the predicted labels (for a given class X), how many instances are correctly predicted?". Hence,

Precision for label A = TP of label A / TotalPredicted_A = 30 / 60 = 0.5

To calculate the recall for label A, we use the definition of recall given above, i.e. "for all instances which must be labelled as X, how many of them are correctly labeled?". Hence,

Recall for label A = TP of label A / TotalGoldLabel_A = 30 / 100 = 0.3

Following the same steps, we have:

Precision for label B = 60 / 120 = 0.5
Recall for label B = 60 / 100 = 0.6
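
The arithmetic above can be checked with a short sketch. The counts are the ones stated in the text (true positives, total predicted, total gold labels for A and B); the function name and the dictionary layout are assumptions made for illustration.

```python
def precision_recall(tp, total_predicted, total_gold):
    """Return (precision, recall) for one label from its three counts."""
    return tp / total_predicted, tp / total_gold


# label -> (TP, TotalPredicted, TotalGoldLabel), taken from the text above
counts = {
    "A": (30, 60, 100),
    "B": (60, 120, 100),
}

for label, (tp, total_predicted, total_gold) in counts.items():
    p, r = precision_recall(tp, total_predicted, total_gold)
    print(f"{label}: precision={p:.1f}, recall={r:.1f}")
# A: precision=0.5, recall=0.3
# B: precision=0.5, recall=0.6
```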

What do these values indicate? Precision = 0.5 for label A shows that, out of the times the model predicts label A, 50% of the predictions are correct. On the other hand, recall = 0.3 for label A means that, out of all the instances whose gold label is A, 30% are correctly predicted as A.
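
The same per-label procedure can be applied to every class at once, starting from the gold and predicted labels. The following is a rough sketch: the function name, the choice of returning None for an undefined value, and the tiny label lists are all assumptions for illustration, not data from this tutorial.

```python
from collections import Counter


def per_class_precision_recall(gold, predicted):
    """Return {label: (precision, recall)}, treating each label in turn as class X."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = Counter(g for g, p in zip(gold, predicted) if g == p)
    total_predicted = Counter(predicted)  # TP + FP per label
    total_gold = Counter(gold)            # TP + FN per label
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        # None marks an undefined value (zero denominator).
        prec = tp[label] / total_predicted[label] if total_predicted[label] else None
        rec = tp[label] / total_gold[label] if total_gold[label] else None
        scores[label] = (prec, rec)
    return scores


# Hypothetical toy labels, for illustration only.
gold = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "B", "A"]
for label, (prec, rec) in per_class_precision_recall(gold, predicted).items():
    print(label, prec, rec)
```

In this toy data label C is never predicted, so its precision is undefined while its recall is 0.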
