ROC vs Precision-recall
Post date: Jan 07, 2015 7:28:44 AM
This blog post demystifies the two most popular ways to measure detection performance in statistics: the ROC (receiver operating characteristic) curve and the precision-recall curve.
The difference between the two lies in how they treat negative examples (both true and false negatives) [1]: ROC values true negatives, while precision-recall does not. While ROC is easier to comprehend, precision-recall is especially useful for applications where true negatives are dominant but of limited or no interest, i.e. rare-event applications. The precise mathematical definitions used by the two curves are as follows.
ROC
True detection (true positive) = P(D = 1|H = 1)
False alarm (false positive) = P(D = 1| H = 0)
Precision-Recall
Recall = P(D = 1| H = 1) (thus same as true detection)
Precision = P(H = 1| D = 1)
where D is the decision variable and H is the true hypothesis [2]. For a lengthier argument, see [3].
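The rare-event point above can be made concrete with a small numeric sketch. The counts below are assumptions chosen for illustration (a hypothetical detector on an imbalanced test set with 100 positives and 10,000 negatives): the false alarm rate looks excellent on an ROC axis, yet most of the detector's alarms are false, which only precision reveals.

```python
# Hypothetical confusion-matrix counts for a rare-event detector
# evaluated on 100 positives (H = 1) and 10,000 negatives (H = 0).
tp, fn = 90, 10        # decisions on the true positives
fp, tn = 500, 9500     # decisions on the true negatives

recall    = tp / (tp + fn)   # P(D = 1 | H = 1), the true detection rate
fpr       = fp / (fp + tn)   # P(D = 1 | H = 0), ROC's false alarm rate
precision = tp / (tp + fp)   # P(H = 1 | D = 1)

print(f"recall (TPR)      = {recall:.2f}")     # 0.90
print(f"false alarm (FPR) = {fpr:.2f}")        # 0.05 -- looks fine on ROC
print(f"precision         = {precision:.2f}")  # 0.15 -- most alarms are false
```

Because the 10,000 true negatives dominate the denominator of the false alarm rate, the FPR stays tiny even with 500 false alarms; precision, which ignores true negatives entirely, exposes that only about 15% of the alarms are real detections.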
[1] https://www.kaggle.com/forums/f/15/kaggle-forum/t/7517/precision-recall-auc-vs-roc-auc-for-class-imbalance-problems
[2] https://stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves/7210#7210?newreg=86ae1da65a774568add2ce6fab23aca0
[3] https://www.biostat.wisc.edu/~page/rocpr.pdf