ROC vs Precision-recall
Post date: Jan 07, 2015 7:28:44 AM
This blog post demystifies the two most popular ways to measure detection performance in statistics: the ROC (receiver operating characteristic) curve and the precision-recall curve.
The difference between the two lies in how they treat negative examples (both true and false negatives) [1]: ROC values true negatives, while precision-recall does not. While ROC is easier to comprehend, precision-recall is especially useful for applications where true negatives are dominant but of limited or no interest, i.e. rare-event applications. The precise mathematical definitions used by the two curves are as follows.
ROC
True detection (true positive) = P(D = 1|H = 1)
False alarm (false positive) = P(D = 1| H = 0)
Precision-Recall
Recall = P(D = 1| H = 1) (thus same as true detection)
Precision = P(H = 1| D = 1)
where D is the decision variable and H is the true hypothesis [2]. For a lengthier argument, see [3].
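The rare-event point above can be made concrete with a small numeric sketch. The counts below are assumptions chosen for illustration (a hypothetical detector on an imbalanced test set with 100 positives and 10,000 negatives): the false alarm rate looks excellent on an ROC axis, yet most of the detector's alarms are false, which only precision reveals.

```python
# Hypothetical confusion-matrix counts for a rare-event detector
# evaluated on 100 positives (H = 1) and 10,000 negatives (H = 0).
tp, fn = 90, 10        # decisions on the true positives
fp, tn = 500, 9500     # decisions on the true negatives

recall    = tp / (tp + fn)   # P(D = 1 | H = 1), the true detection rate
fpr       = fp / (fp + tn)   # P(D = 1 | H = 0), ROC's false alarm rate
precision = tp / (tp + fp)   # P(H = 1 | D = 1)

print(f"recall (TPR)      = {recall:.2f}")     # 0.90
print(f"false alarm (FPR) = {fpr:.2f}")        # 0.05 -- looks fine on ROC
print(f"precision         = {precision:.2f}")  # 0.15 -- most alarms are false
```

Because the 10,000 true negatives dominate the denominator of the false alarm rate, the FPR stays tiny even with 500 false alarms; precision, which ignores true negatives entirely, exposes that only about 15% of the alarms are real detections.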
[1] https://www.kaggle.com/forums/f/15/kaggle-forum/t/7517/precision-recall-auc-vs-roc-auc-for-class-imbalance-problems
[2] https://stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves/7210#7210?newreg=86ae1da65a774568add2ce6fab23aca0
[3] https://www.biostat.wisc.edu/~page/rocpr.pdf