Statistics and Risk
This is a reference for all of us who promptly forgot Statistics after taking the test in school. Statistics is the toolbox for scientific research. Skeptical doctors need to have at least a fundamental understanding of the following concepts in order to interpret the medical literature.
Type 1 Error
(notated with the Greek letter alpha, α) - The error made when one accepts something as true that is actually false (a "false positive"). This is the common error made by promoters of pseudoscience, who are overly credulous.
Type 2 Error
(notated with the Greek letter beta, β) - The error made when one rejects something as false that is actually true (a "false negative"). This is the common error made by denialists, who are overly incredulous.
Null Hypothesis
(denoted "H0") - In clinical treatment testing, H0 is the position that the treatment has no effect over the baseline. Hypothesis testing leads to either accepting or rejecting H0.
When one wrongly accepts a proposition (and wrongly rejects the Null Hypothesis), he/she commits a Type 1 Error.
When one wrongly rejects a proposition (and wrongly accepts the Null Hypothesis), he/she commits a Type 2 Error.
Mean (Expected Value)
The average value for a measured entity within a representative sample is often used as the expected value for that entity in other samples, or for individuals within the population as a whole.
Confidence Interval
("CI") - When taking a specific measurement within a test sample, the confidence interval is the range of values that is likely (with a specified degree of certainty) to contain the true value of the entity being measured. The level of certainty is often set at 95%, such that the measured endpoint can be reliably expected to fall between the two limits 95% of the time. The upper and lower values of the interval must be specified. Optimally, a confidence interval should be as narrow as possible.
Ultimately, the sample of the population being studied should be large enough to be meaningful and should represent the population as a whole. The entity being studied is measured within the sample and an average is obtained for that sample. The confidence interval lets the reader know within what range one could expect to find this entity among other random samples of the population.
For instance, if we are investigating the effectiveness of drug X to treat disease Y, we can take a sample of the population. We may find that Y improved on average in 40% of the patients that used drug X, with a 95% confidence interval of 30 to 50%. This means that we can be 95% certain that the true improvement rate in the population lies between 30 and 50%.
Confidence intervals are important when deciding to accept or reject the Null Hypothesis (Ho). In the example above, if the 95% CI of the Null Hypothesis ranged from 25 to 45% (i.e. 25 to 45% of patients improve with no treatment at all) with a mean of 35%, then the CI of the Null Hypothesis would overlap with the CI of the Claim Hypothesis (H1).
This may lead one to conclude that drug X is not really different from no treatment at all, and therefore to accept the Null Hypothesis.
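As a rough illustration, the drug X scenario can be sketched in Python using the normal approximation for a proportion's 95% CI. The sample size of 92 is made up to reproduce the roughly ±10% interval from the example:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate 95% CI for a proportion (normal approximation)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)

# 40% of 92 drug-X patients improved vs. a 35% no-treatment rate:
drug_ci = proportion_ci(0.40, 92)
null_ci = proportion_ci(0.35, 92)
print("drug X 95% CI:", drug_ci)        # roughly 30% to 50%
print("no-treatment 95% CI:", null_ci)  # roughly 25% to 45%

# Overlapping intervals: this sample cannot rule out the Null Hypothesis.
overlap = drug_ci[0] < null_ci[1] and null_ci[0] < drug_ci[1]
print("intervals overlap:", overlap)
```

Note how the overlap drives the conclusion: a larger sample would shrink both intervals and could separate them.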
Variance
(symbolized as V(X)) - The variance is a measure of how scattered around the average (expected) value the measured values of a variable X are. A small variance means that measured values are, on average, closer to the mean of the sample. A large variance means that measured values vary widely from the mean.
In Normal distributions, the expected value and the variance are reported as N(expected, variance). For instance, if the expected value is 10 and the variance is 1, then we write N(10,1).
Standard Deviation
(symbolized by the Greek letter sigma, σ) - This is a convenient way to describe the range of variation. It is defined as the square root of the variance, and is reported in the same units as the data. In a normal distribution, about 68% of the measured values fall within 1 standard deviation of the mean, and about 95% fall within 2 standard deviations above and below the mean.
The upper and lower limits of the 95% Confidence Interval (95% CI) are approximately the values 2 standard deviations (more precisely, 1.96) above and below the mean.
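A quick sketch of these definitions, using Python's standard statistics module on made-up measurements drawn roughly from N(10, 1):

```python
import statistics

# Hypothetical measurements, roughly N(10, 1) as in the example above.
data = [9.2, 10.1, 9.8, 10.5, 9.9, 10.3, 9.6, 10.4, 10.0, 9.7]

mean = statistics.mean(data)
var = statistics.variance(data)  # sample variance
sd = statistics.stdev(data)      # square root of the variance
print(f"mean={mean:.2f}, variance={var:.3f}, sd={sd:.3f}")

# About 68% of values fall within 1 SD of the mean,
# and about 95% within 2 SDs (the limits quoted above).
print("68% range:", (mean - sd, mean + sd))
print("95% range:", (mean - 2 * sd, mean + 2 * sd))
```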
Standard Error
Standard error is the standard deviation of the values of a given function of the data (a statistic, such as the sample mean), over all possible samples of the same size.
T-Test
This helps to determine whether the difference between the means (averages) of two groups is significant.
The test statistic is the difference between the means of the 2 groups (Treatment Group 'T' and Control Group 'C') divided by the variability of the 2 groups.
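As a sketch, this signal-to-noise computation can be written out directly. The data are hypothetical, and the formula below is Welch's form, which measures variability using each group's own variance:

```python
import math
import statistics

# Hypothetical outcome scores for a treatment group and a control group.
treatment = [14.1, 15.3, 13.8, 16.0, 14.7, 15.5, 14.9, 15.8]
control = [12.9, 13.4, 12.2, 14.0, 13.1, 12.7, 13.6, 13.0]

def t_statistic(group_t, group_c):
    """Difference between the group means ("signal") divided by the
    variability of that difference ("noise"), in Welch's form."""
    signal = statistics.mean(group_t) - statistics.mean(group_c)
    noise = math.sqrt(statistics.variance(group_t) / len(group_t)
                      + statistics.variance(group_c) / len(group_c))
    return signal / noise

print(f"t = {t_statistic(treatment, control):.2f}")
```

A large t (well above about 2 for samples this size) suggests the difference in means is unlikely to be due to chance alone.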
P-Value
This is the probability, calculated assuming the null hypothesis is true, of obtaining data at least as extreme as the data actually observed.
** Rejecting the null hypothesis whenever the p-value falls below the significance level keeps the probability of a Type 1 Error at that level.
The p-value is compared with the chosen significance level of our test and, if it is smaller, the result is significant. That is, if the null hypothesis were to be rejected at the 5% significance level, this would be reported as "p < 0.05".
Small p-values suggest that it is unlikely that we have wrongly rejected the null hypothesis. The smaller the p-value, the more convincing the rejection of the null hypothesis. It indicates the strength of the evidence for, say, rejecting the null hypothesis H0, rather than simply concluding "Reject H0" or "Do not reject H0".
The p-value does not mean the probability of the null hypothesis. It is a statement about the probability of the data in light of the null hypothesis.
In Bayesian terminology (see below), the p value is the probability of obtaining the data if the null hypothesis were true.
p value = P(D/Ho)
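For a concrete (hypothetical) case, an exact one-sided p-value for a yes/no outcome can be computed directly from the binomial distribution:

```python
from math import comb

def p_value_at_least(k, n, p0=0.5):
    """P(D | H0): the chance of k or more 'successes' in n trials
    if the null-hypothesis rate p0 were true (one-sided)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i)
               for i in range(k, n + 1))

# Hypothetical data: 16 of 20 patients improve; H0 says the rate is 50%.
p = p_value_at_least(16, 20)
print(f"p = {p:.4f}")  # about 0.006: such data would be rare if H0 were true
```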
Bias
We learned about Cognitive Biases, which influence our psychology. Cognitive biases can lead to biased measurements. Errors due to bias shift the data systematically in the direction of the bias; they are not due to randomness. Random errors tend to average out around the expected value and cancel each other out. Bias errors will not, and are therefore difficult to detect. Care must be taken to eliminate bias with proper blinding and randomization.
Precision
Precision measures how close the measurements are to each other. Bias errors may cause the measurements to be very close to one another, yet far from the actual value. Ideally, if the statistical bias is zero, high precision means that the measured values are all close to the actual value.
Statistical Significance
In hypothesis testing, we can say that a proposition has statistical significance if we can be reasonably sure that we are not wrongly rejecting the null hypothesis.
Significance Level
The significance level of a statistical hypothesis test is a fixed probability of wrongly rejecting the null hypothesis H0, if it is in fact true.
It is the probability of a type I error and is set by the investigator in relation to the consequences of such an error. That is, we want to make the significance level as small as possible in order to protect the null hypothesis and to prevent, as far as possible, the investigator from inadvertently making false claims.
The significance level is usually denoted by the Greek letter alpha (α):
Significance Level = P(type I error) = α
Usually, the significance level is chosen to be 0.05 (or equivalently, 5%). A result is then reported as significant when its p value falls below this level.
Power
The power of a statistical hypothesis test measures the test's ability to reject the null hypothesis when it is actually false.
In other words, the power of a hypothesis test is the probability of not committing a type II error. It is calculated by subtracting the probability of a type II error from 1, usually expressed as:
Power = 1 - P(type II error)
The maximum power a test can have is 1, the minimum is 0. Ideally we want a test to have high power, close to 1.
Power is also known as Sensitivity (see below)
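As a sketch, the power of a simple one-sided binomial test can be computed exactly. The trial size and improvement rates below are made up:

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for a binomial(n, p) count."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def power(n, p0, p_true, alpha=0.05):
    """Probability of rejecting H0 (event rate p0) when the true
    event rate is p_true, for a one-sided binomial test."""
    # Smallest count whose tail probability under H0 is <= alpha:
    k_reject = next(k for k in range(n + 1)
                    if upper_tail(k, n, p0) <= alpha)
    return upper_tail(k_reject, n, p_true)

# Hypothetical trial: 100 patients, 50% improvement rate under the null,
# 65% true improvement rate with the drug.
print(f"power = {power(100, 0.50, 0.65):.2f}")
```

A small true effect or a small sample drives power down, which is why underpowered trials so often "fail" to detect real treatments.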
Relative Risk
(RR) This is the ratio of the risk of an event in an "exposed" group vs. a "non-exposed" (control) group. It may be expressed as a percent over or under the control group. This often causes confusion or can be used deceptively, especially when the event rate is small in both groups.
For instance, if we measure the rate of cancer in the "exposed" group to be 4 in 10,000 over 5 years, and the rate in the "non-exposed" group to be 2 in 10,000 over 5 years, then the relative risk is 2 (4 divided by 2). Technically, the risk is 100% higher than in the control group. This sounds very scary, but the actual rate (absolute risk) is small in both groups.
Absolute Risk
This expresses the risk of an "exposed" group as an absolute number over the risk in a control group. In the above example, one could speak of the absolute risk of cancer being "an extra 2 cases in 10,000 over 5 years" (or, more confusingly, an extra 0.0002 cases per person over 5 years).
Number Needed to Treat
(NNT) This expresses the number of people that would need to receive a treatment in order for one person to benefit. It is the inverse of the Absolute Risk Reduction. For instance, if we find that a new cholesterol drug will reduce heart attacks by 1 case in 60 (absolute risk reduction = 1/60, or about 0.017), then the NNT is 60.
The Absolute Risk and the NNT are easier for most people to understand than the relative risk.
Number Needed to Harm
(NNH) This expresses the number of people that would need to be exposed to a risk in order for one person to be harmed.
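The cancer example above can be worked through in a few lines of Python. The risk_summary helper is hypothetical, not a standard library function:

```python
def risk_summary(events_exposed, n_exposed, events_control, n_control):
    """Relative risk, absolute risk difference, and NNT/NNH from raw
    event counts (hypothetical helper for illustration)."""
    risk_exposed = events_exposed / n_exposed
    risk_control = events_control / n_control
    rr = risk_exposed / risk_control
    ard = abs(risk_exposed - risk_control)  # absolute risk difference
    nn = 1 / ard  # NNT for a benefit, NNH for a harm
    return rr, ard, nn

# The cancer example above: 4 vs. 2 cases per 10,000 over 5 years.
rr, ard, nnh = risk_summary(4, 10_000, 2, 10_000)
print(f"relative risk = {rr:.1f} (a scary-sounding 100% increase)")
print(f"absolute difference = {ard} (an extra 2 cases per 10,000)")
print(f"NNH = {nnh:.0f}")
```

The same numbers yield a dramatic relative risk and a very tame absolute risk, which is exactly how headlines mislead.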
Hazard Rate
This is the chance of an event happening per unit time.
Hazard Ratio
The ratio of the hazard of an event in the treatment arm to the hazard in the control arm. This is sometimes used interchangeably with the Relative Risk, but there are subtle differences.
Law of Large Numbers
This states that the observed rate of an event will approach the expected rate as the number of trials becomes large. For instance, if the NNT is 10 for a cancer treatment, then we can expect to observe roughly 1,000 success stories if we treat 10,000 patients. However, if we treat only 10 patients, we may see 0, 1, or 2 success stories due to random fluctuations. Remember, randomness tends to be clumpy in the short run, but predictability increases with large numbers.
The Law of Large Numbers is important to keep in mind when discussing rare events. If a public health initiative like a vaccine program has a seemingly small effect (small Absolute Risk Reduction and a large NNT), then vaccine opponents may try to state that the vaccine is not helpful. However, if the vaccine is given to a large number of people, then a respectable (and predictable) portion of the population will survive an otherwise devastating condition.
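A small simulation illustrates the clumpiness. The 10% success rate mirrors the NNT-of-10 example above:

```python
import random

random.seed(1)  # make the "random" runs below repeatable

def successes(n_patients, p=0.1):
    """Simulate treating n patients with a true success rate of 10%
    (i.e., an NNT of 10, as in the example above)."""
    return sum(random.random() < p for _ in range(n_patients))

# Ten-patient trials are clumpy: counts bounce around the expected 1.
print([successes(10) for _ in range(5)])

# With 10,000 patients the count settles close to the expected 1,000.
print(successes(10_000))
```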
True Positive
(TP) - This is the number of positive test results associated with individuals that have the condition in question.
True Negative
(TN) - This is the number of negative results associated with individuals that do not have the condition in question.
False Positive
(FP) - This is the number of positive results associated with individuals that do not have the condition.
False Negative
(FN) - This is the number of negative results associated with individuals that have the condition.
Sensitivity
When applying a test to a specified population, this is the number of "true positive" results divided by the total number of individuals who actually have the condition.
Sensitivity = TP / (TP + FN)
** In Bayesian terms, this is the probability of obtaining the data (D) given the hypothesis (H). Sensitivity = P(D/H)
Sensitivity is also expressed as "Power" (see above).
Specificity
This is the number of True Negatives divided by the total number of individuals who do not have the condition.
Specificity = TN / (TN + FP)
** In Bayesian terms, the p-value, a.k.a. P(D/Ho), is actually equal to (1 - Specificity).
Prevalence
This is the total number of individuals in a specified population with the condition in question divided by the total population at risk.
Prevalence = No. with condition (A) / [No. with condition (A) + No. without condition (B)]
** In Bayesian terms, this is the prior probability of the hypothesis. Prevalence = P(H)
Positive Predictive Value
(PPV) - This gives one an idea of how sure we can be that a positive test result is a true positive.
PPV = TP / (TP + FP)
** In Bayesian terms, this is the probability of the hypothesis (H) in light of the new data (D). PPV = P(H/D)
Negative Predictive Value
(NPV) - This gives one an idea of how sure we can be that a negative test result is a true negative.
NPV = TN / (TN + FN)
References and Links
"Guide to Biostatistics - MedPage Today." 2012.
"Clinical Calculator 2 - VassarStats." 2012.
"CEBM > EBM Tools > Critical Appraisal > Explanations and ..." 2007.
"Some Useful Statistics Definitions." 2005. 13 Oct. 2012
"The T-Test - Web Center for Social Research Methods." 2006.
Sterne, Jonathan AC, George Davey Smith, and DR Cox. "Sifting the evidence—what's wrong with significance tests? Another comment on the role of statistical methods." BMJ 322.7280 (2001): 226-231.
"Statistics Glossary - alphabetical index."
"Glossary - Clinical Trials Terminology."
"Publication bias - Wikipedia, the free encyclopedia." 2004.
"Number needed to treat - Wikipedia, the free encyclopedia." 2005.
"Law of large numbers - Wikipedia, the free encyclopedia." 2004.
"Regression toward the mean - Wikipedia, the free encyclopedia." 2003.
"Variance - Wikipedia, the free encyclopedia." 2003.
"www.childrens-mercy.org/stats/journal/oddsratio.asp." 2010.
"Relative risk - Number Needed to Treat - MedCalc." 2010.
"Relative risk - Wikipedia, the free encyclopedia." 2005.
"Absolute risk reduction - Wikipedia, the free encyclopedia." 2006.
"Bayes factor - Wikipedia, the free encyclopedia." 2004.
"Glossary of EBM Terms - KT Clearinghouse." 2010.
"The Low-Carb Lecture on Clinical Trials - Comcast.net." 2007.
Goodman, Steven N. "Toward evidence-based medical statistics. 2: The Bayes factor." Annals of internal medicine 130 (1999): 1005-1013.