Medical Epidemiology
Epidemiology
Defined as the branch of medicine that studies the causes, frequency, distribution, risks, and control of disease in populations. It has its own set of biostatistical terminology.
Incidence
How many people will have a newly acquired condition in a given period, such as 1 in 5 per year.
Number of new cases of a condition occurring in a population over a specified period of time divided by the number of people in the population at risk.
E.g.: Town of 10,000 previously healthy people, where, in 2009, 50 people were affected by a new strain of swine flu, and the flu lasted only about a week:
Incidence of swine flu in 2009 = 50/10,000 = 0.005 or 0.5%.
Incidence can be reported as a percentage of a population, or as a ratio, commonly expressed as "per 100,000."
Good to measure the presence of short-term disease.
If an acute, fleeting 24-hour illness spreads widely through the population in both 2009 and 2010, the prevalence of the disease at any particular moment in either year may be very low, because at the moment the survey is taken many people have already recovered and others have yet to contract the disease. The incidence of the disease in 2009 or in 2010 would be high, though, since there are many new cases of the disease in those years.
Tells you the risk of contracting the disease.
Reported as "per year"
Prevalence
It is the number of cases, new or old, found at a particular moment divided by the number of people in the population at risk.
Often written as a ratio, such as: 1 in 4 persons over the age of 40 years has high cholesterol.
E.g.: Town of 10,000 previously healthy people, where, in 2009, 50 people were affected by a new strain of flu, and the flu lasted only a week:
Number of people who had flu in the town on June 1 of 2009 might be 4 people.
Prevalence rate of flu on June 1 would be = 4/10,000 = 0.0004 or 0.04%, because many people who contracted flu prior to June 1 have already got better.
Good to measure the presence of chronic disease. An increasing prevalence rate may indicate that the disease is lasting longer and not actually spreading.
For example: If a chronic illness widely strikes a population in 2009, but there are no new cases in 2010, the prevalence of the disease for 2010 may be high, even though no one contracted it in 2010, because the disease is chronic, and people who contracted it in 2009 still have it in 2010. The incidence of the disease in 2010 would be 0 because there were no new cases in 2010.
Tells you how widespread the disease is.
Reported as a percentage of a population, or as a ratio, commonly expressed as "per 100,000."
Reported over a short time interval (month, one day).
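The two definitions above can be sketched in a few lines of Python, using the figures from the flu example (50 new cases in 2009, 4 people ill on June 1, population of 10,000). The function names are illustrative choices, not standard terminology.

```python
def incidence(new_cases: int, population_at_risk: int) -> float:
    """New cases during the period divided by the population at risk."""
    return new_cases / population_at_risk


def point_prevalence(existing_cases: int, population_at_risk: int) -> float:
    """All cases (new or old) present at one moment divided by the population at risk."""
    return existing_cases / population_at_risk


# Figures from the flu example in the text.
flu_incidence_2009 = incidence(50, 10_000)            # 0.005, i.e. 0.5% per year
flu_prevalence_june_1 = point_prevalence(4, 10_000)   # 0.0004, i.e. 0.04%

print(f"Incidence in 2009: {flu_incidence_2009:.2%}")
print(f"Prevalence on June 1: {flu_prevalence_june_1:.2%}")
```

The only difference between the two calculations is the numerator: new cases over a period for incidence, all existing cases at one moment for prevalence.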
Bias in incidence and prevalence
Bias occurs when a disease is underreported, for example because of the social stigma attached to the disease or a lack of diligence in record keeping.
Duration
How long a given condition lasts, on average.
Formula: Prevalence and Incidence
The three terms are related by the formula:
Prevalence = incidence x duration
Example: In a given population, 1 in 100 persons acquires a new plantar wart each year (incidence). On average, a wart lasts two years (duration). Survey this population in any given year, and roughly 2 in 100 persons will have a plantar wart (incidence x duration = prevalence): 1/100 per year x 2 years = 2/100.
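For diseases of very short duration, prevalence measured over the whole reporting period is approximately equal to incidence, because nearly every case counted in that period is a new case.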
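As a quick check of the formula, the plantar wart example can be reproduced in Python. This is a minimal sketch that assumes a steady state (incidence and average duration stay constant over time); the function name is an illustrative choice.

```python
def steady_state_prevalence(incidence_per_year: float, duration_years: float) -> float:
    """Prevalence = incidence x duration, assuming both are stable over time."""
    return incidence_per_year * duration_years


# Plantar wart example: 1 new case per 100 persons per year, lasting 2 years on average.
wart_prevalence = steady_state_prevalence(1 / 100, 2)

print(f"Expected prevalence: {wart_prevalence:.0%}")  # 2 in 100 persons
```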
Mortality
It represents incidence of death in the population at risk
May be entire population or just a subset of the population
"Mortality in patients who had MI in 2009 was 15%."
Case fatality rate is used when a preceding illness is responsible for the mortality
Attack rate is used when the population is exposed to a toxin (food poisoning).
Attack rate of food poisoning in people who ate at that restaurant last week was 40%.
Morbidity
It is the same as mortality except that it is the incidence of a particular disease, rather than death. It is expressed as a percentage.
Absolute Risk
It is similar to incidence and is given as a percentage.
For instance, if 1 out of 100 smokers will get lung cancer in a lifetime, the AR (incidence rate) of smokers getting lung cancer in a lifetime is 1%.
For instance, if incidence of COPD amongst smokers is 5%, the absolute risk is 5%.
Relative Risk (risk ratio)
RR = incidence of people exposed to the risk factor divided by incidence in people not exposed to the risk factor.
For instance, if the incidence of COPD amongst smokers is 5% and in non-smokers is 0.5%, then the relative risk of smokers getting COPD is 0.05/0.005 = 10, indicating that smokers are 10 times as likely as non-smokers to get COPD. The absolute risk in this case would be 5%. Relative risk is calculated as a ratio.
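The COPD example can be written as a short Python sketch. It assumes incidences of 5% in smokers and 0.5% in non-smokers, the values that give the ten-fold relative risk stated above; the function name is illustrative.

```python
def relative_risk(incidence_exposed: float, incidence_unexposed: float) -> float:
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    if incidence_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed incidence is zero")
    return incidence_exposed / incidence_unexposed


# Assumed incidences for the COPD example.
absolute_risk_smokers = 0.05       # 5%
absolute_risk_nonsmokers = 0.005   # 0.5%

rr = relative_risk(absolute_risk_smokers, absolute_risk_nonsmokers)
print(f"Absolute risk in smokers: {absolute_risk_smokers:.1%}")
print(f"Relative risk: {rr:.1f}")  # about 10
```

The absolute risk is a property of one group; the relative risk only has meaning as a comparison between two groups.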
Odds and Odds Ratio (Relative Odds)
Odds differ from probability. The probability that a horse will win a race is the fraction of times that you would expect the horse to win. If you expect a horse to win 3 out of 4 times, the probability of winning is 3/4 = 0.75 (75%). Probability can be expressed as a percentage.
The odds of that horse winning are the number of times you would expect the horse to win divided by the number of times it is expected to lose. In this case the odds are 3/1 = 3, a number rather than a percentage.
Whether you compute the probability or the odds, the horse is 3 times as likely to win as to lose. A probability has to lie between 0 and 1 (when expressed as a decimal). Odds can be anywhere from 0 to infinity.
Clinical odds ratio generally refers to the odds that a person with the disease was exposed in the past to the risk factor for that disease divided by the odds that the control group had exposure to similar risk factors.
Even though relative risk and odds ratio are both ratios, they are not the same thing. Say that 4 out of 5 smokers will get a heart attack within a certain period of time, while 1 out of 5 non-smokers will get a heart attack in that time period. The probability of a smoker getting a heart attack is then 4 out of 5, or 0.80 (80%). The probability of a non-smoker getting a heart attack is 1 out of 5, or 0.20 (20%). The odds of a smoker getting a heart attack are 4:1, or 4 (a number, not a percentage). The odds of a non-smoker getting a heart attack are 1:4, or 0.25. The relative risk of a smoker getting a heart attack is then 0.80/0.20 = 4. The odds ratio for a smoker getting a heart attack is 4/0.25 = 16.
Relative risks are more intuitive than odds ratios. In the above example, the relative risk states that a smoker is 4 times as likely to get a heart attack as a non-smoker, something that is relatively intuitive to grasp. Although the probability of a smoker getting a heart attack is 4 times that of a non-smoker, the odds ratio is 16, telling you that the odds of a smoker getting a heart attack are 16 times the odds of a non-smoker getting a heart attack; the number 16 seems too high, and can be confusing.
So why even use odds ratios? Why not just use relative risk? The problem is that while relative risk can be used in prospective studies, it cannot be used in retrospective studies, because, as mentioned: Relative risk = incidence of the illness in people exposed to the risk divided by incidence of the illness in people not exposed to the risk.
In order to calculate a relative risk, you need to know the incidence of the illness in the people exposed to the risk and in the people not exposed. You can know this in a prospective study but not in a retrospective study. Therefore, odds ratios are used in retrospective studies. You can describe the odds of someone with the illness having had past exposure to the risk factors and the odds of someone without the illness having had past exposure to the risk factors, which is what you need for an odds ratio. In practice, when the disease is relatively rare, which is commonly the case, the odds ratio is close to the relative risk, so the terms can be used interchangeably in those cases.
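The horse-race and heart-attack examples can be reproduced with a small Python sketch. It converts probabilities to odds, then compares the relative risk with the odds ratio. The rare-disease probabilities at the end (0.2% and 0.1%) are hypothetical values added only to illustrate why the two measures nearly coincide when a disease is rare.

```python
def odds_from_probability(p: float) -> float:
    """Odds = chance of the event divided by chance of no event."""
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1) to convert to odds")
    return p / (1 - p)


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of the odds in the exposed group to the odds in the unexposed group."""
    return odds_from_probability(p_exposed) / odds_from_probability(p_unexposed)


# Horse example: probability 0.75 corresponds to odds of about 3 (3 to 1).
horse_odds = odds_from_probability(0.75)

# Heart-attack example: 4 of 5 smokers versus 1 of 5 non-smokers.
p_smoker, p_nonsmoker = 0.80, 0.20
rr_common = p_smoker / p_nonsmoker             # about 4
or_common = odds_ratio(p_smoker, p_nonsmoker)  # about 16

# Hypothetical rare disease: the odds ratio is now very close to the relative risk.
p_rare_exposed, p_rare_unexposed = 0.002, 0.001
rr_rare = p_rare_exposed / p_rare_unexposed
or_rare = odds_ratio(p_rare_exposed, p_rare_unexposed)

print(f"Common outcome: RR = {rr_common:.2f}, OR = {or_common:.2f}")
print(f"Rare outcome:   RR = {rr_rare:.3f}, OR = {or_rare:.3f}")
```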
Absolute Risk Reduction (Attributable Risk) vs Relative Risk Reduction
Absolute risk reduction and attributable risk are the same in that both are the difference between two incidences. They differ in that absolute risk reduction refers to getting better, while attributable risk refers to getting sicker.
Absolute risk reduction is the incidence of disease progression in people taking the placebo minus the incidence of disease progression in people taking the treatment. For instance, if the incidence of a smoker who takes the placebo getting lung cancer is 3% and the incidence in a smoker taking a new treatment is 1%, then the absolute risk reduction of the treatment is 3% - 1% = 2% (0.02).
Attributable risk is the incidence of disease in persons exposed to the risk factor minus the incidence of the disease in persons not exposed to the risk factor. For instance, if the incidence of a non-smoker getting lung cancer is 0.5% and the incidence of a smoker getting lung cancer is 3%, the attributable risk of smoking for lung cancer is 3% - 0.5% = 2.5% (0.025).
Some of the attributable risk can be attributed to random factors in the sampling. How much? A computer program can calculate a 95% confidence interval, which indicates what the range of the difference would be in 95% of repeated trials. If the 95% range does not include a difference of 0, the difference is statistically significant.
Recall that the relative risk is the incidence of illness in people exposed to the risk divided by the incidence of the illness in people not exposed to the risk. Relative risk is a ratio, which may also be expressed as a percentage.
Relative risk reduction = (1 - relative risk)
For instance: If the absolute risk (incidence) of death in a person left untreated for a rare pulmonary disease is 0.02 (2%) and the absolute risk (incidence) of death if the person is treated for the disease is 0.002 (0.2%), then the relative risk of death in a treated individual is 0.002/0.02 = 0.1 (10%). The relative risk reduction is 1 - 0.10 = 0.90 (90%). Sounds like a great treatment! Compare this with the absolute risk reduction. As mentioned, the absolute risk (incidence) of death when untreated is 0.02 (2%), and the absolute risk (incidence) of death when treated is 0.002 (0.2%). The absolute risk reduction of the treatment of the rare disease is only 0.02 - 0.002 = 0.018 (1.8%). Sounds like the treatment is hardly effective! So the relative risk reduction is 90%, while the absolute risk reduction is only 1.8%! The reason for the difference is that the relative risk reduction is based on proportions, which do not take into account the rarity of a disease, while absolute risk reduction is based on subtraction of percentages, and can be a very small number with a rare disease. It can be confusing when it is unclear whether the relative risk reduction or absolute risk reduction is presented as a research result.
A drug company that wanted to exaggerate the effects of its treatment of a rare disease might list the relative risk reduction (90% reduction) rather than the absolute risk reduction (a measly 1.8%). It is important to state not only the relative risk (and relative risk reduction) but also the absolute risk (and absolute risk reduction). The more uncommon the disease, the greater the discrepancy between relative risk reduction and absolute risk reduction.
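The contrast between the two measures can be sketched in Python. The rare-disease figures follow the calculation in the text (risk of death 2% untreated, 0.2% treated); the "common disease" figures (20% and 2%) are hypothetical, chosen to have the same relative risk reduction so that only the absolute risk reduction changes.

```python
def absolute_risk_reduction(risk_control: float, risk_treated: float) -> float:
    """Incidence without treatment minus incidence with treatment."""
    return risk_control - risk_treated


def relative_risk_reduction(risk_control: float, risk_treated: float) -> float:
    """1 minus the relative risk (treated incidence / control incidence)."""
    return 1 - risk_treated / risk_control


scenarios = {
    "rare disease (from the text)": (0.02, 0.002),
    "common disease (hypothetical)": (0.20, 0.02),
}

for label, (untreated, treated) in scenarios.items():
    arr = absolute_risk_reduction(untreated, treated)
    rrr = relative_risk_reduction(untreated, treated)
    print(f"{label}: RRR = {rrr:.0%}, ARR = {arr:.1%}")
```

Both scenarios report the same 90% relative risk reduction, yet the absolute benefit differs ten-fold, which is why both figures should be reported together.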
Number needed to treat (NNT)
The number of patients who need to be treated to achieve one additional favorable outcome.
Calculated as 1/absolute risk reduction (ARR), rounded up to the nearest whole number.
How many patients would have to be treated for the disease to be prevented in 1 person. For instance, if the absolute risk reduction is 1.8% (0.018), the number of persons needed to treat to prevent one person from getting the disease is 1/0.018 = 55.6, rounded up to 56 persons. If the absolute risk reduction were small, say 0.2% (0.002), the NNT would be 1/0.002 = 500. You would then have to treat 500 people to prevent the disease in 1 person. If the cost of treating 1 person is $3000 a year, then you would spend $3000 x 500 people = $1,500,000 a year to prevent the disease in 1 person. You have to decide whether the treatment is cost effective. Hence, NNT is valuable information for cost effectiveness evaluation. When the NNT is large, it implies that the therapy is relatively ineffective or the condition is relatively rare.
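The NNT arithmetic and the cost estimate above can be sketched as follows. The function and variable names are illustrative; the small `round` call before rounding up is a guard against floating-point noise, not part of the definition.

```python
import math


def number_needed_to_treat(absolute_risk_reduction: float) -> int:
    """NNT = 1 / ARR, rounded up to the nearest whole patient."""
    if absolute_risk_reduction <= 0:
        raise ValueError("NNT is only defined for a positive absolute risk reduction")
    # Round to 9 decimals first so that e.g. 500.00000000000006 does not become 501.
    return math.ceil(round(1 / absolute_risk_reduction, 9))


nnt_moderate = number_needed_to_treat(0.018)  # 1/0.018 = 55.6 -> 56
nnt_small = number_needed_to_treat(0.002)     # 1/0.002 = 500

# Cost example from the text: $3000 per treated person.
cost_per_person = 3000
cost_to_prevent_one_case = nnt_small * cost_per_person

print(f"NNT at ARR 1.8%: {nnt_moderate}")
print(f"NNT at ARR 0.2%: {nnt_small}")
print(f"Cost to prevent one case: ${cost_to_prevent_one_case:,}")
```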
Number needed to harm (NNH)
The number of patients who, if they received the experimental treatment, would lead to one additional person being harmed compared with patients who receive the control treatment.
Calculated as 1/attributable risk.
For instance, in prescribing antibiotics for strep throat, one study (Newman, '08) found that the NNT was 40,000; namely, 40,000 patients would have to be treated to prevent a single case of rheumatic fever. The NNH was 5000; namely, 1 of 5000 treated patients would experience a severe allergic reaction to the treatment. Should antibiotics be prescribed so liberally for sore throats? NNT and NNH are important considerations in therapeutic decisions.
In contrast with the high NNT for the use of antibiotics for strep throat, NNT for antibiotic therapy to eradicate H. pylori gastric ulcers is about 1.1; 10 of 11 people with H. pylori will be cured.
Advertisements for the cholesterol-lowering drug Lipitor indicate that it results in about a one-third reduction in heart attacks (Carey, '08; Lenzer et al., '10). Sounds good. However, the NNT shows that 100 men would have to be treated for 5 years to prevent a heart attack or stroke in 2 of them (98 of 100 men would receive no benefit). The NNH is important, too, since all the people being treated would be subject to the potential side effects of the drug, including serious muscle and liver problems. It is unethical for a drug company to mention the percentage improvement but avoid mentioning the NNT. This policy is particularly devious when the company conversely mentions the NNH without indicating the percentage of harm. For instance, the company may point to an NNH of 100 (only 1 out of 100 people taking the drug is harmed) but neglect to mention that the same data mean a 50% increase in harmful effects when compared with controls.
While these considerations at first glance may raise the question as to whether or not the cholesterol-lowering drug should be used, there are other considerations:
The greater the number of risk factors that the patient has, the greater the potential usefulness of the drug and the lower may be the NNT for people in that risk group.
Since the NNT figure applies to only 5 years on the drug, the NNT may be much lower if the patient is on the drug for a longer time, say 30 years. But so may be the NNH. You would like to have a low NNT and a high NNH.
If only 2 of 100 people would benefit from the drug, it means that 20,000 out of 1 million people would benefit. Is it worthwhile to give a drug to 1 million people to benefit 20,000 people? This is in a sense akin to purchasing a raffle ticket for a charity. It benefits the charity, even though you have only a small chance of winning.
NNT, NNH, and clinical judgment are together important to properly assess the need for a given drug, particularly a lifelong drug, especially in this day of increasing medical expenses.
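The point about an NNH of 100 coexisting with a 50% increase in harm can be checked with a short sketch. The harm rates used here (2% in controls, 3% in treated patients) are hypothetical values chosen only because they reproduce those two figures; they are not taken from any study.

```python
def number_needed_to_harm(risk_treated: float, risk_control: float) -> float:
    """NNH = 1 / attributable risk (harm rate in treated minus harm rate in controls)."""
    attributable_risk = risk_treated - risk_control
    if attributable_risk <= 0:
        raise ValueError("NNH is only defined when treatment increases the harm rate")
    return 1 / attributable_risk


def relative_increase_in_harm(risk_treated: float, risk_control: float) -> float:
    """Extra harm in the treated group as a fraction of the control harm rate."""
    return (risk_treated - risk_control) / risk_control


# Hypothetical harm rates, for illustration only.
harm_control = 0.02
harm_treated = 0.03

nnh = number_needed_to_harm(harm_treated, harm_control)
increase = relative_increase_in_harm(harm_treated, harm_control)

print(f"NNH: about {nnh:.0f}")
print(f"Relative increase in harm: {increase:.0%}")
```

The same pair of rates yields both numbers, so quoting only one of them gives an incomplete picture, just as with relative and absolute risk reduction.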
Sensitivity
The percentage of patients with the disorder who have a positive result (TP). Chance of having a positive finding, given that a disease is present.
Number of people who have the disease and test positive/the number of people who have the disease
TP/(TP + FN), or A/(A + C)
The greater the sensitivity, the more likely the test will detect patients with the disease.
High sensitivity tests are useful clinically to rule out a disease (SnOUT). In other words, a negative result on a highly sensitive test would virtually exclude the possibility of the disease, because the FN (false negative) rate for the test is very low.
e.g., BNP (B-type natriuretic peptide), nuclear medicine bone scan, D-dimer, single-fiber EMG (SFEMG) for myasthenia gravis (MG).
Specificity
The percentage of patients without the disorder who have a negative test result (TN). Chance of having a negative finding, given that a disease is absent.
Number of people who do not have the disease and test negative/the number of people who do not have the disease.
TN/(TN + FP)
Specific tests are used to rule in conditions (SpIN). In other words, a positive result on a highly specific test would virtually confirm the presence of the disease, because the FP (false positive) rate for the test is very low.
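The two formulas can be written as small Python functions. The counts used in the example (9 true positives, 1 false negative, 792 true negatives, 198 false positives) are illustrative values for a test with 90% sensitivity and 80% specificity.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """TP / (TP + FN): fraction of people with the disease who test positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """TN / (TN + FP): fraction of people without the disease who test negative."""
    return true_negatives / (true_negatives + false_positives)


# Illustrative counts.
print(f"Sensitivity: {sensitivity(9, 1):.0%}")
print(f"Specificity: {specificity(792, 198):.0%}")
```

Note that each formula uses only one column of the population: sensitivity looks only at people who have the disease, specificity only at people who do not.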
Ideally, a test should be very sensitive and very specific so that there are no false positives or negatives. Sometimes, to deal with false positives and negatives, 2 tests are used, 1 very sensitive and the other very specific.
If the very sensitive test is negative, the patient almost certainly does not have the disease, so go no further.
But if the sensitive test is positive, you might be dealing with a false positive. To check, the sensitive test is followed by a (possibly more expensive) test with high specificity, which, if positive, will confirm that the positive is real.
So why not just do the specific test? The specific test may not be very sensitive, and you might miss the diagnosis (get a false negative) with just a specific test. As an example, the ELISA and Western blot tests are both used for the detection of HIV. ELISA is used initially, since it is relatively sensitive, but there could be a false positive (e.g., in patients with allergies and recent acute illnesses). If positive, ELISA is followed by a more specific confirmatory Western blot test.
If a test is designed to detect an illness, for example diabetes, based on the blood level of glucose, then the wider the range of blood levels that is considered diabetic, the greater the chance of a false positive, i.e., some people without diabetes will fall within the range and the test will be less specific for diabetes. The narrower the range that is considered diabetic, the greater the chance of a false negative, i.e., some people with diabetes will be missed and the test will be less sensitive for diabetes. Thus, there is a trade-off between sensitivity and specificity in the way test value ranges are set.
Commonly, laboratory diagnostic tests list a range of normal values, those found in clinically normal people; values outside the range should raise red flags as to a possible illness. The question is where to draw the line between what is clinically normal and what is not. If the listed normal range is too narrow, some people outside the range will be declared clinically abnormal even if they are not; there will be false positives. If the range of the test is too wide, some people inside the range will be declared clinically normal even if they are not; there will be false negatives.
It is important to remember that someone who falls outside the statistically normal range of a particular lab test is not necessarily clinically abnormal. The person may just be unusual, but nonetheless clinically normal. It is important for the clinician to not just look at one lab test result, but the constellation of lab tests, physical exam, and history to determine whether or not the patient is clinically abnormal.
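The trade-off between sensitivity and specificity can be seen by moving a cutoff across a small data set. The glucose values below are invented for illustration only (they are not clinical data), and the rule "positive when the value is at or above the cutoff" is an assumption of this sketch.

```python
# Hypothetical fasting glucose values (mg/dL), invented for illustration only.
DIABETIC = [118, 124, 130, 135, 142, 150, 160, 175]
NON_DIABETIC = [82, 88, 90, 95, 99, 104, 110, 121, 128]


def sensitivity_specificity(cutoff: float) -> tuple[float, float]:
    """Call the test positive when the value is at or above `cutoff`."""
    true_positives = sum(value >= cutoff for value in DIABETIC)
    true_negatives = sum(value < cutoff for value in NON_DIABETIC)
    return true_positives / len(DIABETIC), true_negatives / len(NON_DIABETIC)


# A low cutoff means a wide "diabetic" range; a high cutoff means a narrow one.
for cutoff in (100, 126, 140):
    sens, spec = sensitivity_specificity(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the cutoff catches every diabetic value but mislabels more healthy ones; raising it does the reverse. No cutoff in this toy data set achieves both perfect sensitivity and perfect specificity.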
Positive predictive value
It is the likelihood that a positive test result truly means that the patient has the disease being tested.
It is the number of True Positive (TP) results out of a total number of positive test results.
TP/(TP + FP)
If the disease has high prevalence the PPV of the test will be high. Thus, the PPV is dependent on the prevalence of the disease in the population.
Question 1: "Doc, I have tested positive for HIV. How likely is it that I have it?" This crucial question is not answered by the specificity or sensitivity of the test. Unless the test is 100% specific, the specificity of the test does not tell you the chance that the patient has HIV. What we want to know is the positive predictive value of the test, namely:
PPV = number who test positive and who have the disease/number who test positive who may or may not have the disease OR
PPV = those who tested positive and turned out to have the disease/total who tested positive.
Negative predictive value
It is the likelihood that a negative test result truly means that the patient does not have the disease being tested.
It is the number of True Negative (TN) test results out of a total number of negative results.
TN/(TN + FN)
Question 2: "Doc, I have tested negative for HIV. Does this mean I do not have it?" This also is not answered by the specificity or sensitivity of the test. We want to know the negative predictive value of the test namely:
NPV = number of people who test negative and do not have the disease/the number of people who test negative.
The more prevalent the disease is in a population, the higher the positive predictive value and the lower the negative predictive value.
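The dependence of the predictive values on prevalence can be illustrated with a short sketch. The test characteristics (90% sensitivity, 80% specificity) and the three prevalence values are assumptions chosen for illustration; the function works with population fractions rather than counts.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    tp = prevalence * sensitivity                # diseased and test positive
    fn = prevalence * (1 - sensitivity)          # diseased but test negative
    tn = (1 - prevalence) * specificity          # healthy and test negative
    fp = (1 - prevalence) * (1 - specificity)    # healthy but test positive
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test, three different populations.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(prevalence, sensitivity=0.90, specificity=0.80)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The test itself never changes, yet a positive result means far more in a high-prevalence population than in a low-prevalence one.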
To solve sensitivity, specificity, PPV, NPV problems:
Pick an easy number. E.g. Sample size: 1000
Use prevalence to calculate how many in the sample will have and how many will not have the disease. Prevalence is given. E.g. 1%
1% = 1 in 100.
1% of 1000 is 1000 x 0.01 = 10. So 10 people are expected to have the disease and 990 are expected not to have it.
For those who have the disease, use the test's sensitivity to determine how many will test positive (TP) and how many will test negative (FN).
E.g. Use a test with sensitivity of 90%.
90% of 10 is 10 x 0.9 = 9. So 9 of the 10 people who have the disease will test positive (TP); the remaining 1 person who has the disease will test negative (FN).
For those who do not have the disease, use specificity to calculate how many will test negative (TN) and how many will test positive (FP).
E.g. Use a test with specificity of 80%.
80% of 990 is 990 x 0.8 = 792. So 792 of the 990 people who do not have the disease will test negative (TN); the remaining 198 people who do not have the disease will test positive (FP).
Calculate the positive predictive value by dividing the TP by all positive results. Express it as a percentage.
PPV = TP/(TP + FP) x 100
9/(9 + 198) = 0.043 x 100 = 4.3%
Calculate the negative predictive value by dividing the TN by all negative results. Express it as a percentage.
NPV = TN/(TN + FN) x 100
792/(792 + 1) = 0.999 x 100 = 99.9%
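The steps above can be sketched as a single Python function that turns a sample size, prevalence, sensitivity, and specificity into the four cell counts, from which PPV and NPV follow. The function name and return order are illustrative choices.

```python
def confusion_counts(sample_size: int, prevalence: float,
                     sensitivity: float, specificity: float) -> tuple[float, float, float, float]:
    """Return expected (TP, FP, TN, FN) counts for the given sample."""
    diseased = sample_size * prevalence   # step 2: how many have the disease
    healthy = sample_size - diseased      #         and how many do not
    tp = diseased * sensitivity           # step 3: diseased who test positive
    fn = diseased - tp                    #         diseased who test negative
    tn = healthy * specificity            # step 4: healthy who test negative
    fp = healthy - tn                     #         healthy who test positive
    return tp, fp, tn, fn


# Worked example: 1000 people, 1% prevalence, 90% sensitivity, 80% specificity.
tp, fp, tn, fn = confusion_counts(1000, 0.01, 0.90, 0.80)
ppv = tp / (tp + fp)   # 9 / 207, about 0.043
npv = tn / (tn + fn)   # 792 / 793, about 0.9987

print(f"TP={tp:.0f} FP={fp:.0f} TN={tn:.0f} FN={fn:.0f}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Even with a fairly accurate test, most positive results are false positives when the disease is present in only 1% of the population.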
True positive
It is a positive test result in a person who has the condition.
False positive
It is a positive test result in a person who does not have the condition.
True negative
It is a negative test result in a person who does not have the condition
False negative
It is a negative test result in a person who has the condition.
True positives and false negatives actually have the condition being tested for.
True negatives and false positives do not have the condition.
Sensitivity, specificity, positive predictive value and negative predictive value can be determined by TP, FP, TN, FN.
P-Value
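Probability that a result at least as extreme as the one observed would occur by chance alone, i.e., if there were no real difference between the groups (the null hypothesis).
The cutoff values are arbitrary conventions.
p value <0.05 means less than a 1 in 20 probability of occurring by chance. Statistically significant.
p value <0.01 means less than a 1 in 100 probability of occurring by chance. Highly significant.
p <0.05 or p <0.01 = the null hypothesis (that there is no real difference between the two groups) is rejected.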
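One way to make "probability of occurring by chance" concrete is an exact calculation on a 2 x 2 table. The sketch below uses a one-sided Fisher exact test, which is a choice made here for illustration (the text does not name a specific test), and the trial counts are hypothetical.

```python
from math import comb


def fisher_exact_one_sided(a: int, b: int, c: int, d: int) -> float:
    """
    One-sided Fisher exact p-value for the table
        group 1: a improved, b not improved
        group 2: c improved, d not improved
    i.e. the probability of seeing `a` or more improvements in group 1
    if improvement were unrelated to group membership (null hypothesis).
    """
    row1, row2 = a + b, c + d
    improved = a + c
    total = row1 + row2
    all_tables = comb(total, improved)
    most_extreme = min(row1, improved)
    favorable = sum(
        comb(row1, k) * comb(row2, improved - k)
        for k in range(a, most_extreme + 1)
    )
    return favorable / all_tables


# Hypothetical trial: 9 of 10 treated patients improve versus 3 of 10 on placebo.
p_value = fisher_exact_one_sided(9, 1, 3, 7)
print(f"p = {p_value:.4f}")  # about 0.0099, below the 0.01 convention
```

With these counts, a split at least this lopsided would arise by chance in fewer than 1 in 100 random assignments, so the null hypothesis of no difference would be rejected at the conventional thresholds.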
Confirmation bias
It occurs when a person selectively seeks out information that supports a belief or idea that they already hold, thus "confirming" their existing beliefs. Information that supports the contrary is not taken into consideration, is dismissed, or is selectively ignored. These beliefs are largely derived from stereotypes and overgeneralizations that are combined with faulty deductive logic, most commonly about particular demographic groups.
Anchoring bias
When people are trying to make a decision, they often use an anchor or focal point as a reference or starting point. Psychologists have found that people have a tendency to rely too heavily on the very first piece of information they learn, which can have a serious impact on the decision they end up making. In psychology, this type of cognitive bias is known as the anchoring bias or anchoring effect.
Randomised controlled trials (RCT)
In an ideal setting in an RCT, participants are randomly allocated by a process equivalent to the flip of a coin to either one intervention (such as drug treatment) or another (such as placebo treatment). Then both groups are followed up for a specified period of time and analyzed in terms of specific outcomes defined at the outset of the study (e.g. death, MI, stroke, serum cholesterol levels, etc.). The premise here is that since, on average, the groups are identical apart from the intervention, any differences in outcome are, in theory, likely to be from the intervention.
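The coin-flip allocation described above can be sketched with Python's standard `random` module. This is simple (unrestricted) randomization only; real trials often use blocked or stratified schemes to keep the arms balanced. The function name, arm labels, and participant IDs are illustrative assumptions.

```python
import random


def randomize(participant_ids, seed=None):
    """Assign each participant to 'treatment' or 'placebo' by a fair coin flip."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    return {
        participant: "treatment" if rng.random() < 0.5 else "placebo"
        for participant in participant_ids
    }


# Hypothetical participant IDs.
allocation = randomize([f"P{n:03d}" for n in range(1, 11)], seed=2024)
for participant, arm in allocation.items():
    print(participant, arm)
```

Because each assignment is an independent coin flip, the two arms will not necessarily be the same size, especially in small trials.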
Advantages of the RCT design:
Allows rigorous evaluation of a single variable (e.g. effect of drug treatment vs placebo) in a precisely defined patient group (e.g. post-menopausal women aged 50 - 60 years).
Prospective design (i.e. data are collected on events which happen after you decide to do the study)
Uses hypotheticodeductive reasoning (i.e. seeks to falsify, rather than confirm, its own hypothesis)
Potentially eradicates bias by comparing two otherwise identical groups.
Allows for meta-analysis (combining the numerical results of several similar trials).
Disadvantages of the RCT design:
Expensive and time consuming
Many RCTs are not completed, are performed on too few patients, or are run for too short a period of time.
Most RCTs are funded by large research bodies (university, government sponsored) or drug companies, who ultimately dictate the research agenda.
Surrogate endpoints may not reflect outcomes that are important to patients.
May introduce "hidden bias", especially through:
Imperfect randomization
Failure to randomize all eligible patients ("cherry pick patients")
Failure to blind assessors to randomization status of patients.
RCTs are unnecessary:
When a clearly successful intervention for an otherwise fatal condition is discovered
When a previous RCT or meta-analysis has given a definitive result (either positive or negative and no in-betweens).
It may be that a thorough search of the literature needs to be performed before deciding whether an RCT needs to be done.
RCTs are impractical:
When it would be unethical to seek consent to randomize.
Where the number of participants needed to demonstrate a significant difference between the groups is prohibitively high.