Binary Classification is when a test returns one of two qualitative results:

Yes/No

Positive/Negative

Present/Absent

With a perfectly accurate diagnostic test for some disease, ALL people with the disease would have positive test results (TRUE POSITIVE rate or SENSITIVITY of 100%), and ALL people without the disease would have negative test results (TRUE NEGATIVE rate or SPECIFICITY of 100%). A perfect test makes no errors, and this perfect accuracy is independent of PREVALENCE (the proportion of people with the disease).

Real-world binary tests are imperfect and make errors of two types: False Alarms and Missed Alarms. If a Fire Alarm goes off without a fire present, that is a False Alarm, or a FALSE POSITIVE test result. If a Fire Alarm fails to detect a real fire, then that is a MISSED alarm or a FALSE NEGATIVE test result.

Imagine 2000 patients being tested for a disease. A Contingency Table for a binary test is presented in graphic form below. The 4 quadrants show the 4 possible outcomes of testing. The left side shows people who actually have the DISEASE. The right side shows HEALTHY people. The top row shows people who have POSITIVE tests and the bottom row shows those with NEGATIVE tests. The size of each colored area is proportional to the number of patients in that classification group:

TP: True Positive, FP: False Positive

FN: False Negative, TN: True Negative.

The starting condition shown in the graphic below is a test with 75% sensitivity and 90% specificity. The Sensitivity and Specificity can be varied by clicking and dragging with your mouse. Likewise, the PREVALENCE can be adjusted by clicking and dragging.
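
As a rough sketch of how the four quadrant counts arise, the Python snippet below splits a cohort into TP, FN, TN and FP from Prevalence, Sensitivity and Specificity. The function name `contingency_counts` and the 10% prevalence are assumptions chosen for illustration; the text does not state the starting prevalence of the graphic.

```python
def contingency_counts(n_patients, prevalence, sensitivity, specificity):
    """Expected number of patients in each quadrant (values may be fractional)."""
    diseased = n_patients * prevalence
    healthy = n_patients - diseased
    tp = diseased * sensitivity      # diseased patients who test positive
    fn = diseased - tp               # diseased patients who test negative
    tn = healthy * specificity       # healthy patients who test negative
    fp = healthy - tn                # healthy patients who test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# 2000 patients, 75% sensitivity, 90% specificity.
# The 10% prevalence is an assumed value, not taken from the graphic.
counts = contingency_counts(2000, prevalence=0.10, sensitivity=0.75, specificity=0.90)
for label, value in counts.items():
    print(f"{label}: {value:.0f}")
```

Note that the four counts always add up to the cohort size, which is why dragging one area in the graphic must change another.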

Sensitivity and specificity are determined in a controlled environment, ideally with gold-standard samples of known disease state and with samples similar to the ACTUAL population of interest. Assuming our Sensitivity, Specificity and Prevalence numbers are accurate, we want to know how to interpret the test result for a PARTICULAR patient. That is, we want to know the PREDICTIVE VALUE.

When a patient tests positive for disease, they want to know how likely it is that they ACTUALLY have the disease. This is the POSITIVE PREDICTIVE VALUE (PPV).

When a patient tests negative for disease, they want to know how likely it is that they ACTUALLY don't have the illness. This is the NEGATIVE PREDICTIVE VALUE (NPV).
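
In terms of the four quadrants, PPV is the share of positive tests that are true positives, and NPV is the share of negative tests that are true negatives. The sketch below computes both; the function name `predictive_values` is hypothetical, and the counts are illustrative values for 2000 patients at 75% sensitivity, 90% specificity and an assumed 10% prevalence.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the four contingency-table counts.

    Assumes at least one positive and one negative test result,
    otherwise the ratios are undefined.
    """
    ppv = tp / (tp + fp)   # of all positive tests, fraction truly diseased
    npv = tn / (tn + fn)   # of all negative tests, fraction truly healthy
    return ppv, npv


# Illustrative counts only (assumed 10% prevalence among 2000 patients).
ppv, npv = predictive_values(tp=150, fp=180, fn=50, tn=1620)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```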

Things to do and notice:

The graphic below shows possible distributions of 2000 patients into the four possible outcomes of testing, based on the Sensitivity and Specificity of a test and the Prevalence of disease. The box sizes are proportional to the number of patients listed in each quadrant's label. Click-drag any of the colored areas below to change their relative sizes. For example, see how changing TP affects FN, or how changing PREVALENCE doesn't affect Sensitivity or Specificity but has an enormous effect on PPV and NPV.

Special cases: For a perfect test (Sensitivity = 100% and Specificity = 100%), the predictive values (PVs) are independent of prevalence.
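
The same experiment can be tried numerically. The sketch below holds Sensitivity and Specificity fixed and sweeps the Prevalence, alongside a perfect test for comparison. The helper `ppv_npv` and the three prevalence values are assumptions made for illustration.

```python
def ppv_npv(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) as fractions; prevalence must be strictly between 0 and 1."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity stay fixed; only prevalence changes.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = ppv_npv(prevalence, sensitivity=0.75, specificity=0.90)
    perfect_ppv, perfect_npv = ppv_npv(prevalence, sensitivity=1.0, specificity=1.0)
    print(
        f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%} | "
        f"perfect test: PPV {perfect_ppv:.0%}, NPV {perfect_npv:.0%}"
    )
```

For the imperfect test, PPV rises and NPV falls as prevalence increases, while the perfect test stays at 100% for both.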

Here's a relationship that can be very confusing. Even with a test that is 99% specific and 99% sensitive (overall accuracy 99%), if the prevalence is very low (say 1%), then fully HALF of the positive tests will be FALSE POSITIVES. Think about that for a moment. It seems bizarre that a test that is 99% accurate will be wrong 50% of the time when it says someone has the disease. How can this be? The key is to realize the importance of PREVALENCE. In this example, the test makes 1% errors in the disease group (because sensitivity is 99%) and 1% errors in the healthy group (because specificity is 99%). So if the vast majority of the patients are in the healthy group, the vast majority of the errors will also be in that group. That is one of the fundamental problems with population screening: applying tests with even slightly imperfect SPECIFICITY to a patient population that has a low prevalence of disease will result in lots of false positive results.
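
Counting heads makes the result less mysterious. The short sketch below works through the 99% / 99% / 1% example with whole people; the cohort size of 10,000 is an assumption chosen so that every count is an integer.

```python
population = 10_000                          # assumed cohort size
diseased = population // 100                 # 1% prevalence -> 100 people
healthy = population - diseased              # 9,900 people

true_positives = diseased * 99 // 100        # 99% sensitivity -> 99 people
false_negatives = diseased - true_positives  # 1 missed case
false_positives = healthy // 100             # 1% of healthy test positive -> 99 people
true_negatives = healthy - false_positives   # 9,801 people

positives = true_positives + false_positives
share_false = false_positives / positives
print(f"{positives} positive tests, {false_positives} of them false ({share_false:.0%})")
```

The 99 true positives from the small diseased group are matched by 99 false positives from the much larger healthy group.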

Two-step screening. Screening tests are challenging because they are applied in low-prevalence populations. If a single test that has both High Sensitivity AND High Specificity is not available, then a High Sensitivity test (low False Negative rate) is used first to identify MOST of the patients with disease in a population. Although this may produce many false positives, it has the effect of increasing the prevalence of illness in the test-positive cohort. Then a High Specificity "confirmatory" test is applied to that group to reduce the False Positives. The role of a good clinician is to perform the "screening test" judiciously so as to reduce false positive diagnoses while not missing patients with true illness. The first "test" that a clinician applies to any patient is the History and Physical Exam. By talking with patients and examining them, we are doing testing that either increases or decreases our suspicion of illness. This has the effect of increasing (or decreasing) the probability that the patient has the illness.
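
A minimal sketch of the two-step idea follows: the PPV of the first test becomes the prevalence seen by the second test. All numbers are hypothetical, and the calculation assumes the two tests err independently of each other given the patient's true disease status, which real tests may not satisfy.

```python
def ppv(prevalence, sensitivity, specificity):
    """Probability of disease given a positive test result."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


population_prevalence = 0.01  # hypothetical low-prevalence screening population

# Step 1: high-sensitivity screening test (hypothetical 99% / 90%).
after_screen = ppv(population_prevalence, sensitivity=0.99, specificity=0.90)

# Step 2: high-specificity confirmatory test (hypothetical 90% / 99%),
# applied only to screen-positive patients, whose prevalence is now after_screen.
after_confirm = ppv(after_screen, sensitivity=0.90, specificity=0.99)

print(f"Prevalence before testing:        {population_prevalence:.1%}")
print(f"After positive screening test:    {after_screen:.1%}")
print(f"After positive confirmatory test: {after_confirm:.1%}")
```

The same chaining applies to the History and Physical Exam: whatever probability of illness it leaves the clinician with acts as the prevalence for the next test.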

Adjust the chart below to show the following relationships:

When a test with 99% sensitivity and 95% specificity is applied to a population with 5% prevalence of disease, approximately HALF of the patients with positive tests will be FALSE POSITIVEs.

When a test with 99% sensitivity and 99% specificity is applied to a population with 1% prevalence of disease, again, HALF of the patients with positive tests will be FALSE POSITIVEs.

When a test with 99% sensitivity and 90% specificity is applied to a population with 1% prevalence of disease, about 91% of the patients with positive tests will be FALSE POSITIVEs.

When a test with 80% sensitivity and 99.5% specificity is applied to a population with 3% prevalence, fewer than 20% of the people with positive tests will be FALSE POSITIVEs. This is what the Stanford study of COVID prevalence in Santa Clara County suggested.
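
For readers who prefer to check these relationships numerically rather than by dragging the chart, here is a small sketch that computes the share of positive tests that are false for each scenario. The function name `false_positive_share` is hypothetical.

```python
def false_positive_share(prevalence, sensitivity, specificity):
    """Fraction of all positive tests that are false positives (1 - PPV)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return false_pos / (true_pos + false_pos)


# (sensitivity, specificity, prevalence) for the four scenarios in the text.
scenarios = [
    (0.99, 0.95, 0.05),
    (0.99, 0.99, 0.01),
    (0.99, 0.90, 0.01),
    (0.80, 0.995, 0.03),
]

shares = [false_positive_share(prev, sens, spec) for sens, spec, prev in scenarios]
for (sens, spec, prev), share in zip(scenarios, shares):
    print(
        f"sensitivity {sens:.1%}, specificity {spec:.1%}, prevalence {prev:.0%}: "
        f"{share:.1%} of positive tests are false"
    )
```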

TestStat

a graphical tool to explore the relationship between Sensitivity, Specificity, Prevalence and Predictive Values for binary classification.

tgmcnaughton@gmail.com

Mike Byington Ph.D. has created an excellent resource for understanding the mathematics behind binary testing.

Sensitive and Specific Detection of Low-Level Antibody Responses in Mild Middle East Respiratory Syndrome Coronavirus Infections. Okba N, Raj V, Widjaja I, et al. Emerging Infectious Diseases. 2019;25(10):1868-1877. doi:10.3201/eid2510.190051. https://wwwnc.cdc.gov/eid/article/25/10/19-0051_article

Serological assays for emerging coronaviruses: challenges and pitfalls. Virus Res. 2014 Dec 19;194:175-83. doi: 10.1016/j.virusres.2014.03.018. Epub 2014 Mar 23. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7114385/

COVID-19 Antibody Seroprevalence in Santa Clara County, California, Eran Bendavid, Bianca Mulaney, Neeraj Sood, Soleil Shah, Emilia Ling, Rebecca Bromley-Dulfano, Cara Lai, Zoe Weissberg, Rodrigo Saavedra, James Tedrow, Dona Tversky, Andrew Bogan, Thomas Kupiec, Daniel Eichner, Ribhav Gupta, John Ioannidis, Jay Bhattacharya medRxiv 2020.04.14.20062463; doi: https://doi.org/10.1101/2020.04.14.20062463 https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1

This model was inspired by the teachings of Jerome Hoffman MD and written by Tim McNaughton M.D. Ph.D. tgmcnaughton@gmail.com