This is meant to be a concise tutorial; it should take you no more than 10 minutes to go through.
Once we have a multi-class classifier, we want to evaluate its performance. In this tutorial, we look at approaches to evaluating a multi-class classifier.
The simplest option is accuracy, which is defined as "the proportion of correctly predicted labels over all predictions". Mathematically, we write as follows:
\[ \text{accuracy} = \frac{\text{number of correctly predicted labels}}{\text{number of all predictions}} \]
Unfortunately, this measurement can be misleading: a model can reach 'high' accuracy mostly by predicting the 'not so important' classes correctly. In such cases, it is useful to analyze the performance on each class label individually, e.g. by computing the precision and recall for each class label.
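To make the point concrete, here is a minimal Python sketch of accuracy as defined above. The function name `accuracy` and the label lists are assumptions made up for illustration (they are not data from this tutorial): a model that always predicts the frequent class still scores well.

```python
def accuracy(gold, predicted):
    """Proportion of predictions that match the gold labels."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Hypothetical data: 9 instances of a frequent class "B", 1 of a rare class "A".
gold = ["B"] * 9 + ["A"]
# A model that always predicts "B" never finds the rare class "A".
predicted = ["B"] * 10

print(accuracy(gold, predicted))  # 0.9
```

The score of 0.9 says nothing about class "A", which is never predicted correctly. This is the situation where per-class measures are more informative.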
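The precision is defined as "given all the predicted labels (for a given class X), how many instances are correctly predicted?". Mathematically, we write as follows:
\[ \text{precision}_X = \frac{\text{number of instances correctly predicted as } X}{\text{number of instances predicted as } X} \]
On the other hand, the recall is defined as "for all instances which must be labeled as X, how many of them are correctly labeled?". Mathematically, we write as follows:
\[ \text{recall}_X = \frac{\text{number of instances correctly predicted as } X}{\text{number of instances whose gold label is } X} \]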
While it is straightforward to compute both precision and recall in binary classification problems, it may be unclear how to deal with them in multi-class classification problems. In fact, we can adopt the same idea as in binary classification: for each class label X, treat X as the positive class and all other labels as negative. We illustrate this in the following example.
To calculate the precision for label A, we use the definition of precision given above, i.e. "given all the predicted labels (for a given class X), how many instances are correctly predicted?". Hence,
To calculate the recall for label A, we use the definition of recall given above, i.e. "for all instances which must be labeled as X, how many of them are correctly labeled?". Hence,
Following the same steps, we have:
What do these values indicate? Precision = 0.5 for label A shows that, out of the times the model predicts label A, 50% of the predictions are correct. On the other hand, recall = 0.3 for label A means that, out of all the instances whose gold label is A, 30% are correctly predicted as A.
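The per-class procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `precision_recall` and the label lists are hypothetical (they are not the example used in this tutorial), and returning 0.0 when a class is never predicted or never appears in the gold labels is a convention chosen here, not part of the definitions.

```python
def precision_recall(gold, predicted, label):
    """Treat `label` as the positive class and every other label as negative."""
    pairs = list(zip(gold, predicted))
    # Instances that are labeled `label` in both the gold data and the prediction.
    true_pos = sum(g == label and p == label for g, p in pairs)
    predicted_as_label = sum(p == label for _, p in pairs)
    gold_as_label = sum(g == label for g, _ in pairs)
    # Assumption: report 0.0 when the denominator is zero.
    precision = true_pos / predicted_as_label if predicted_as_label else 0.0
    recall = true_pos / gold_as_label if gold_as_label else 0.0
    return precision, recall

# Hypothetical gold labels and predictions for three classes.
gold      = ["A", "A", "A", "B", "B", "C", "C", "C"]
predicted = ["A", "B", "C", "B", "A", "C", "C", "A"]

for label in sorted(set(gold)):
    p, r = precision_recall(gold, predicted, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

Looping over every label gives one precision and one recall value per class, which is exactly the binary computation repeated once for each label.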