Model measures

Suppose a model predicts the condition of 160 samples. The table below compares the predicted condition with the actual condition.

                                 predicted condition
                          positive                negative
    actual positive       true positive  = 20     false negative = 30
    actual negative       false positive = 10     true negative  = 100

Precision = true positive / (true positive + false positive). Of the samples predicted to be positive, the percentage of them that are actually positive.

Recall      = true positive / (true positive + false negative). Of the samples that are actually positive, the percentage of them that are predicted to be positive.
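As a quick sanity check, here is a minimal Python sketch that applies both formulas to the counts in the table above:

    # Precision and recall from the confusion counts in the table above.
    tp, fp, tn, fn = 20, 10, 100, 30

    precision = tp / (tp + fp)   # 20 / 30 ~= 0.667
    recall    = tp / (tp + fn)   # 20 / 50  = 0.400

    print(f"precision = {precision:.3f}")   # 0.667
    print(f"recall    = {recall:.3f}")      # 0.400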

When tuning model parameters (or a decision threshold) to improve precision, recall will usually drop: applying a stricter rule for selecting positive samples also increases the chance that an actual positive sample is missed.

In practice, model parameters (or thresholds) are tweaked to choose a trade-off between precision and recall, depending on the application.
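To illustrate this trade-off, the following sketch sweeps a decision threshold over model scores using scikit-learn's precision_recall_curve. The labels and scores here are synthetic placeholders, not from a real model:

    # Sketch of the precision-recall trade-off over a decision threshold.
    # y_true / y_score are synthetic; replace them with real labels and
    # model scores to trace the curve for an actual model.
    import numpy as np
    from sklearn.metrics import precision_recall_curve

    rng = np.random.default_rng(0)
    y_true  = rng.integers(0, 2, size=1000)                    # synthetic labels
    y_score = np.clip(y_true * 0.3 + rng.random(1000), 0, 1)   # synthetic scores

    precision, recall, thresholds = precision_recall_curve(y_true, y_score)

    # Raising the threshold (a stricter rule) generally pushes precision up
    # and recall down.
    for t, p, r in zip(thresholds[::200], precision[::200], recall[::200]):
        print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")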

For example, airport security screening may prefer higher recall so that as many potential security threats as possible are picked up, even if precision is low and many passengers end up queuing for further checks. A marketing campaign, on the other hand, may prefer high precision even if recall is low, to avoid spending money on too many customers who are not interested.

Other measures, used more often by statisticians, are:

Sensitivity = true positive / (true positive + false negative), which is exactly the Recall.

Specificity = true negative / (true negative + false positive). Of the samples that are actually negative, the percentage of them that are predicted to be negative.
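Using the same confusion counts as before, a short sketch of both measures:

    # Sensitivity and specificity from the earlier confusion counts.
    tp, fp, tn, fn = 20, 10, 100, 30

    sensitivity = tp / (tp + fn)   # 20 / 50   = 0.400 (same as recall)
    specificity = tn / (tn + fp)   # 100 / 110 ~= 0.909

    print(f"sensitivity = {sensitivity:.3f}")
    print(f"specificity = {specificity:.3f}")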

When a looser rule (or threshold) is applied to pick up more positive samples, it also brings in more false positives, so specificity will drop.

The resulting trade-off curve is similar to the precision-recall trade-off described above.

However, statisticians draw a curve of sensitivity vs (1 - specificity) so that both axes grow in the same direction. The red curve below is called the Receiver Operating Characteristic (ROC) curve.

Again, a trade-off has to be chosen somewhere along the curve, achieving sufficient sensitivity while keeping specificity from dropping too low, depending on the application.

The blue diagonal line is the baseline produced by random guessing; a useful model should not do worse than that.
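Here is a minimal sketch of how such a curve can be drawn with scikit-learn's roc_curve and matplotlib, again on synthetic labels and scores (assumed data, for illustration only):

    # ROC curve: sensitivity (true positive rate) vs 1 - specificity
    # (false positive rate), with the random-guess diagonal as the baseline.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    y_true  = rng.integers(0, 2, size=1000)                    # synthetic labels
    y_score = np.clip(y_true * 0.3 + rng.random(1000), 0, 1)   # synthetic scores

    fpr, tpr, thresholds = roc_curve(y_true, y_score)

    plt.plot(fpr, tpr, color="red", label="ROC curve")
    plt.plot([0, 1], [0, 1], color="blue", linestyle="--", label="random guess")
    plt.xlabel("1 - specificity (false positive rate)")
    plt.ylabel("sensitivity (true positive rate)")
    plt.legend()
    plt.show()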

The sensitivity-specificity view misses a very important measure: precision. Consider the following example:

                                 predicted condition
                          positive                   negative
    actual positive       true positive  = 100       false negative = 0
    actual negative       false positive = 1000      true negative  = a large number (not given)

The sensitivity (recall) is 100%, which is good.

The specificity is above 90%, which is also good.

But the precision is very low, 100 / (100 + 1000) ≈ 9%, which could waste a lot of money and effort working through the 1000 false positives.
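To make the numbers concrete, here is that example computed directly. The true-negative count is not given above, so the 10,000 used here is an assumed value chosen only to keep specificity above 90%:

    # Imbalanced example: sensitivity and specificity look fine,
    # but precision collapses.
    tp, fp, fn = 100, 1000, 0
    tn = 10_000   # assumed value; the only known constraint is specificity > 90%

    sensitivity = tp / (tp + fn)   # 1.000
    specificity = tn / (tn + fp)   # ~= 0.909
    precision   = tp / (tp + fp)   # ~= 0.091

    print(f"sensitivity = {sensitivity:.3f}")
    print(f"specificity = {specificity:.3f}")
    print(f"precision   = {precision:.3f}")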

So when the data is very imbalanced, i.e. the ratio of positives to negatives is far from 1, the ROC might not be a good enough indicator.

The precision-recall view works better when the focus is on the positive samples.

The Area Under the ROC Curve (AUC) is usually a good indicator too, especially for ranking problems.
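A sketch of computing the ROC AUC, together with average precision (a summary of the precision-recall curve) for comparison on an imbalanced dataset; the labels and scores are again synthetic placeholders:

    # Area under the ROC curve, plus average precision for comparison.
    import numpy as np
    from sklearn.metrics import roc_auc_score, average_precision_score

    rng = np.random.default_rng(0)
    y_true  = (rng.random(10_000) < 0.01).astype(int)              # ~1% positives
    y_score = np.clip(y_true * 0.2 + rng.random(10_000), 0, 1)     # weak synthetic scores

    print("ROC AUC           :", round(roc_auc_score(y_true, y_score), 3))
    print("average precision :", round(average_precision_score(y_true, y_score), 3))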