How Many Santa Clara COVID tests were False Positives?

The Santa Clara COVID-19 Prevalence Study estimated the prevalence of COVID-19 infections at between 1.5% and 3%. When real-world tests are applied to low-prevalence populations, there are always errors: false positives and false negatives. This is a graphical tool for exploring the sometimes confusing relationships between Sensitivity, Specificity, and Prevalence, and the way they interact to determine the Predictive Values of a binary classification test.


First - a brief primer on Binary Classification.

skip to "Things To Do and Notice" below if you are already familiar with things like sensitivity and specificity

Binary Diagnostic Tests are tests that return one of two qualitative results:

Yes/No. Positive/Negative. Present/Absent.

With a perfectly accurate diagnostic test, ALL people with the disease would have positive test results (SENSITIVITY, or TRUE POSITIVE rate = 100%) and ALL people without disease would have negative tests (SPECIFICITY, or TRUE NEGATIVE rate = 100%). This perfect accuracy is independent of PREVALENCE (the proportion of people with disease).

But real-world binary tests are NOT perfect. They make errors of two types: Missed Alarms, and False Alarms.

MISSED alarms (called FALSE NEGATIVEs) are related to imperfect SENSITIVITY.

FALSE alarms (called FALSE POSITIVEs) are related to imperfect SPECIFICITY.

A contingency table of 2 rows and 2 columns can be made to show the results when a test is applied to a particular population.


A Contingency Table for a binary test is presented in graphic form below. The 4 quadrants show the 4 possible outcomes of testing. The left-hand column shows people who actually have the DISEASE; the right-hand column shows HEALTHY people. The top row shows people who have POSITIVE tests and the bottom row shows those with NEGATIVE tests. The size of the colored box for each quadrant is proportional to the number of patients in that group: TP: True Positive, FP: False Positive, FN: False Negative, TN: True Negative.
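As a concrete illustration of how the quadrant counts are derived, here is a minimal Python sketch assuming the Santa Clara figures quoted later on this page (3300 people tested, sensitivity 80.3%, specificity 99.5%, adjusted prevalence 3%):

```python
# Sketch: filling in the four quadrants of the 2x2 contingency table from
# sensitivity, specificity, prevalence, and the number of people tested.
# The figures below are the Santa Clara values assumed throughout this page.
N = 3300            # people tested
prevalence = 0.03   # adjusted prevalence
sensitivity = 0.803
specificity = 0.995

diseased = N * prevalence
healthy = N - diseased

TP = diseased * sensitivity   # true positives: disease present, test positive
FN = diseased - TP            # false negatives: disease present, test negative
TN = healthy * specificity    # true negatives: healthy, test negative
FP = healthy - TN             # false positives: healthy, test positive

print(f"TP = {TP:.0f}   FP = {FP:.0f}")
print(f"FN = {FN:.0f}   TN = {TN:.0f}")
```

With these inputs the quadrants come out to approximately TP 80, FP 16, FN 20, and TN 3185, which is the distribution shown in the starting graphic.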

When physicians use diagnostic tests, we want to know how often a patient with a positive test actually has the disease. This is called the POSITIVE PREDICTIVE VALUE (PPV).

Similarly, when a patient tests negative for disease, we want to know how likely it is that they are actually healthy. This is the NEGATIVE PREDICTIVE VALUE (NPV).
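In formula terms, PPV is the fraction of positive results that are true positives, and NPV is the fraction of negative results that are true negatives. A minimal sketch, again assuming the Santa Clara figures:

```python
# Sketch: PPV and NPV computed directly from sensitivity, specificity and
# prevalence (Santa Clara figures assumed).
sens, spec, prev = 0.803, 0.995, 0.03

ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
npv = (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)

print(f"PPV = {ppv:.1%}")   # about 83%: roughly 1 in 6 positives is false
print(f"NPV = {npv:.1%}")   # about 99.4%: negative results are very reliable
```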

Things to do and notice:

The starting graphic below shows the distribution of 3300 patients across the four possible outcomes of testing, based on the Sensitivity (80.3%), Specificity (99.5%), and Adjusted Prevalence (3%) found in the Santa Clara test population. The box sizes are proportional to the number of patients listed in each quadrant's label. The graph is an interactive learning tool: the Sensitivity, Specificity, and Prevalence numbers can be varied by clicking and dragging with your mouse, so you can see how the three basic parameters interact. Click and drag any of the colored areas below to change their relative sizes. For example, see how changing TP affects FN, or how changing PREVALENCE doesn't affect Sensitivity or Specificity but has an enormous effect on PPV and NPV.

For tests with LOW SPECIFICITY applied to LOW PREVALENCE populations, false positives dominate the positive results, so the POSITIVE PREDICTIVE VALUE is low.

For tests with LOW SENSITIVITY applied to HIGH PREVALENCE populations, false negatives dominate the negative results, so the NEGATIVE PREDICTIVE VALUE is low. The sketch below shows both effects.
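A short sketch of this relationship: hold the test's characteristics fixed at the study's reported values and vary only the prevalence.

```python
# Sketch: the effect of prevalence alone on PPV and NPV, for a fixed test
# (sensitivity 80.3% and specificity 99.5% assumed, as in the study).
sens, spec = 0.803, 0.995

for prev in (0.001, 0.01, 0.03, 0.10, 0.50):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    print(f"prevalence {prev:6.1%}   PPV {ppv:6.1%}   NPV {npv:6.2%}")
```

At a prevalence of 0.1% the PPV falls below 15% even though the test itself has not changed, while the NPV only begins to suffer when prevalence is high.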

Click and drag the colored boxes to vary PREVALENCE, SENSITIVITY, and SPECIFICITY around the actual numbers reported in the study. Here's a relationship that can be very confusing: notice that when the PREVALENCE is low, the PPV is VERY dependent upon SPECIFICITY but nearly independent of SENSITIVITY.

How did the authors get their numbers? To measure the specificity of their test, the authors tested 401 samples of blood drawn several years ago (before COVID-19 existed) and found only 2 false positives. These could be due to cross-reactivity with one of the other known coronaviruses that cause mild illness. This is where the estimate of 99.5% for specificity comes from. Sensitivity was a more complicated matter; the table at the bottom of this page shows where the estimates of both SENSITIVITY and SPECIFICITY came from in the study.
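That point estimate is simply 399 negative results out of 401 pre-COVID samples. Below is a sketch of the calculation with an exact (Clopper-Pearson) 95% confidence interval; the exact bounds depend on the interval method used, but they land close to the 98.3% to 99.9% range the authors report.

```python
# Sketch: specificity point estimate and exact (Clopper-Pearson) 95% CI
# from 401 pre-COVID blood samples with 2 false positives.
from scipy.stats import beta

n, false_positives = 401, 2
true_negatives = n - false_positives

specificity = true_negatives / n                              # 399/401, about 99.5%
lower = beta.ppf(0.025, true_negatives, false_positives + 1)  # exact lower bound
upper = beta.ppf(0.975, true_negatives + 1, false_positives)  # exact upper bound

print(f"specificity {specificity:.1%}   95% CI {lower:.1%} to {upper:.1%}")
```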

For Prevalence, the authors found 50 positive tests out of 3300 for an unadjusted estimate of 1.5%.

Two-step screening. A single screening test is challenging because screening is nearly always applied to low prevalence populations, and it is rare to have a single test with both High Sensitivity AND High Specificity. An alternative approach is to use a High Sensitivity test (low False Negative rate) to identify essentially ALL patients with disease in a population. Although this may produce many false positives, it has the effect of increasing the prevalence of illness in the test-positive cohort while missing almost no one with disease. Then a High Specificity "confirmatory" test is applied to eliminate the False Positives. This is the best way to do population screening for rare illnesses, as sketched below.
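A rough sketch of the arithmetic behind two-step screening follows. The screening and confirmatory test characteristics are illustrative assumptions, not values from the Santa Clara study.

```python
# Sketch: two-step screening. A high-sensitivity screen enriches the
# test-positive cohort; a high-specificity confirmatory test then removes
# most of the false positives. All numbers here are hypothetical.
prev = 0.01                               # assumed population prevalence, 1%
screen_sens, screen_spec = 0.99, 0.90     # screen: misses almost no one
confirm_sens, confirm_spec = 0.90, 0.999  # confirmation: very few false alarms

# Step 1: apply the screen to the whole population
tp1 = prev * screen_sens
fp1 = (1 - prev) * (1 - screen_spec)
cohort_prev = tp1 / (tp1 + fp1)           # prevalence among screen-positives
print(f"prevalence in the screen-positive cohort: {cohort_prev:.1%}")

# Step 2: apply the confirmatory test only to the screen-positives
tp2 = cohort_prev * confirm_sens
fp2 = (1 - cohort_prev) * (1 - confirm_spec)
print(f"PPV after confirmation: {tp2 / (tp2 + fp2):.1%}")
```

Under these assumed numbers the screen raises the prevalence in the test-positive cohort from 1% to about 9%, and the confirmatory test then brings the combined PPV up to roughly 99%.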

Ascertainment Bias: Another source of error, especially important when trying to decide how much of our population has been exposed to COVID, has to do with WHO we test. For example, if we only test people who have recently been ill with fever and cough, or who think they might have had COVID illness, then the prevalence in that group would be higher than in the whole county population, and we would overestimate the true prevalence if we simply scaled that number up to the whole population. The only way to accurately measure population prevalence is to test a TRULY RANDOM sample of the population. In the Santa Clara study, more people from wealthy areas and more women than men responded to the Facebook recruitment method. However, the authors did anticipate this and made adjustments by limiting enrollment in the study by household, sex, and ZIP code. That's why they report both "Adjusted" and "Unadjusted" prevalence numbers.

Some criticism has been raised that the results of the Santa Clara Study might overestimate prevalence by not sufficiently accounting for false positives. The concern was that if the TRUE prevalence was actually lower than the researchers found (due to ascertainment bias), then false positives would make up a larger share of the positive results and so over-estimate the county-wide COVID infection rate. Ascertainment bias causes error in two ways. The first is easy to understand: if people who came to be tested were twice as likely to have COVID, then scaling to the whole county would overestimate cases by a factor of 2. But correcting for ascertainment bias also lowers the estimated prevalence, which increases the proportion of false positives, since the Positive Predictive Value gets worse at lower prevalence. This is the tricky point that the model can help with:

So, How Many Santa Clara COVID tests were False Positives? If the SPECIFICITY of the test is really 99.5% AND:

If there was no ascertainment bias, then the TRUE prevalence is 3% and about 20% of positives are False.

If respondents were twice as likely to have COVID as general county residents (50% ascertainment bias), then the TRUE prevalence is 1.5% and about 30% of positives are False.

If respondents were three times as likely to have COVID as general county residents, then the TRUE prevalence is 1% and about 40% of positives are False.

This shows how much additional error is introduced by ascertainment bias; the sketch below reproduces these scenarios from the PPV formula.
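A minimal sketch, assuming sensitivity 80.3% and specificity 99.5%; the rounding differs slightly, but the fractions land close to the approximate 20%, 30%, and 40% figures above.

```python
# Sketch: fraction of positive results that are false positives for the
# three ascertainment-bias scenarios above (sens 80.3%, spec 99.5% assumed).
sens, spec = 0.803, 0.995

for true_prev in (0.03, 0.015, 0.01):
    tp = sens * true_prev
    fp = (1 - spec) * (1 - true_prev)
    false_fraction = fp / (tp + fp)    # equal to 1 - PPV
    print(f"true prevalence {true_prev:.1%}: "
          f"{false_fraction:.0%} of positive tests are false")
```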

These results are especially dependent upon the true SPECIFICITY of the test. Experiment for yourself: see how varying the SPECIFICITY from 98.3% to 99.9% (the 95% Confidence Interval limits calculated by the authors) changes the Positive Predictive Value and the number of false positives. It is interesting to see that SENSITIVITY also affects the PPV, but to a MUCH lesser extent.
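To reproduce that experiment numerically, the sketch below sweeps specificity across the authors' confidence interval while holding sensitivity at 80.3% and prevalence at the unadjusted 1.5%.

```python
# Sketch: how PPV changes as specificity sweeps the authors' 95% CI
# (98.3% to 99.9%), with sensitivity 80.3% and prevalence 1.5% assumed.
sens, prev = 0.803, 0.015

for spec in (0.983, 0.990, 0.995, 0.999):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    print(f"specificity {spec:.1%}   PPV {ppv:.0%}")
```

Across that range the PPV swings from roughly 40% to over 90%, which is why the specificity estimate matters so much.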

** The Matthews Correlation Coefficient is a way to score how well a classifier performs. It ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect). The Santa Clara Study achieved a Matthews score of 0.811, which is excellent. (However, this measure does not account for ascertainment bias.)
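As a rough check, the sketch below computes the Matthews Correlation Coefficient from the contingency-table counts implied by sensitivity 80.3%, specificity 99.5%, prevalence 3%, and 3300 people tested; it lands close to the 0.811 reported.

```python
# Sketch: Matthews Correlation Coefficient from the contingency-table
# counts implied by sens 80.3%, spec 99.5%, prevalence 3%, N = 3300.
from math import sqrt

N, prev, sens, spec = 3300, 0.03, 0.803, 0.995

tp = N * prev * sens
fn = N * prev - tp
tn = N * (1 - prev) * spec
fp = N * (1 - prev) - tn

mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(f"MCC = {mcc:.3f}")   # close to the 0.81 quoted above
```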



This model was inspired by Jerome Hoffman MD and written by Timothy G. McNaughton M.D. Ph.D.




This table shows the source of the sensitivity and specificity numbers from the Santa Clara Study.