Schmidt & Brown (2021). Evidence-Based Practice for Nurses:
Chapter 9: Epidemiologic Designs: Using Data to Understand Populations
Define epidemiology
Describe the use of epidemiology in nursing practice
Describe count data, ratios, proportions, and rates
Define and compute prevalence
Define and compute incidence
Explain descriptive characteristics of person, place, and time when examining the distribution of disease in a population
Describe descriptive study designs, including case reports and series, ecologic studies, and cross-sectional studies
Construct a 2 × 2 data table
Calculate a prevalence ratio for a cross-sectional study design
Describe analytic study designs, including case-control studies, cohort studies, and intervention studies
Calculate an odds ratio for a case-control study design
Calculate relative risk for a cohort study design
Epidemiology
= The study of the distribution and determinants of disease in human populations
The Greek prefix, root, and suffix:
epi = "upon"
the root demos = "the people"
logos = "the study of"
John Snow is considered the father of modern epidemiology as a result of his investigation of the 1854 cholera epidemic in London
5 Examples of Data Visualizations:
London Cholera Map by John Snow
Gapminder by Hans Rosling
March on Moscow by Charles Minard
War Mortality by Florence Nightingale
Chart of Biography by Joseph Priestley
Nurses use CSI-like skills to investigate patterns of disease to best determine contributing factors with the goal of improving health outcomes:
Example 1: A nurse in the pediatric intensive care unit might ask, "What factors are contributing to the increase in gunshot wounds in June as compared with February?"
Example 2: A family nurse practitioner might ask, "Why are more children affected by asthma in one county than in a neighboring county?"
Example 3: A nurse scientist might ask, "Why are women who cook over open flames at greater risk for upper respiratory infections than women who cook with gas or electricity?"
Epidemiologic investigations can be descriptive or analytic:
Descriptive epidemiology
= Examination of the distribution of disease in a population in terms of person, place, and time
distribution = the pattern of disease occurrence in and among populations or subgroups
Analytic epidemiology
= Investigation of the determinants of disease
determinants = factors that are capable of bringing about a change in health
Count Data
= the raw number of health phenomena under investigation in epidemiology
Examples of health events:
births
cases of a disease
deaths
Not particularly useful when comparing populations of different sizes
Example: A country with a large population will have more births than a country with a small population, regardless of other factors associated with the birth rate
When populations differ in size, epidemiologists use ratios to compare and contrast health outcomes across populations
Ratios
= In research, the ratio level is the highest level of measurement, which involves numeric values that begin with an absolute zero and have equal intervals; in epidemiology, a ratio is a mathematical relationship between two numbers
Formula = a/b
Example: of 1,000 patients with acute myocardial infarction (AMI), 600 were male and 400 were female. The sex ratio for AMIs is:
Number of male cases / Number of female cases = 600 / 400 = 1.5 : 1 male to female
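A minimal Python sketch of the same calculation. The helper name `ratio` is hypothetical (not from the textbook); it simply divides one count by another, where the two counts describe separate groups:

```python
def ratio(a: float, b: float) -> float:
    """Return a/b, where a is NOT included in b (two separate groups)."""
    if b == 0:
        raise ValueError("denominator must be nonzero")
    return a / b


male_cases = 600
female_cases = 400
sex_ratio = ratio(male_cases, female_cases)
print(f"Sex ratio for AMI = {sex_ratio:.1f} : 1 (male to female)")
```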
Proportions
= A type of ratio in which the numerator is included in the denominator
Formula = a/(a + b), often multiplied by a base such as 1,000
The numerator and denominator have an association
Numerator = the number of cases, deaths, or events
Denominator = the population being studied
Used in epidemiology to describe measures, such as:
prevalence
cumulative incidence
case fatality rates
attack rates
Example: of 300,000 children, 3,800 were diagnosed with an autism spectrum disorder and 296,200 were not. The proportion of children diagnosed with autism spectrum disorder is:
a/(a + b) * 1,000 = 3,800 / (3,800 + 296,200) * 1,000
= 3,800 / 300,000 * 1,000
= 0.01267 * 1,000
= 12.67 per 1,000 children
Meaning: the proportion of children diagnosed with autism spectrum disorder in this population is 12.67 per 1,000 children
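A short Python sketch of the proportion formula, using the autism spectrum disorder counts above. The function name `proportion` and its `base` parameter are illustrative assumptions; the point the code makes is that the numerator `a` is also part of the denominator `a + b`:

```python
def proportion(a: int, b: int, base: int = 1_000) -> float:
    """Proportion a/(a + b): the numerator a is included in the denominator."""
    return a / (a + b) * base


with_asd = 3_800       # children diagnosed with autism spectrum disorder
without_asd = 296_200  # children not diagnosed
p = proportion(with_asd, without_asd)
print(f"{p:.2f} per 1,000 children")
```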
Rates
A measure of disease frequency in a defined population over a specified period of time
When calculating a rate:
Numerator = the number of people affected
Denominator = the entire population
Used in epidemiology to describe measures, such as:
incidence density
crude mortality rates
fertility rates
Example: Of 5,610 babies born in Lake County in 2017, 59 died before reaching their first birthday. The infant mortality rate in 2017 was:
Number of infant deaths during time period / Number of live births during time period * 1,000 = 59 / 5,610 * 1,000
= 10.5 per 1,000 live births
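A minimal Python sketch of the infant mortality calculation; the helper `rate` is a hypothetical name. Both counts refer to the same specified time period (calendar year 2017), which is what makes the result a rate rather than a simple count:

```python
def rate(events: int, population: int, base: int = 1_000) -> float:
    """Events in a defined population during a specified period, per `base` people."""
    return events / population * base


infant_deaths_2017 = 59
live_births_2017 = 5_610
imr = rate(infant_deaths_2017, live_births_2017)
print(f"Infant mortality rate (2017): {imr:.1f} per 1,000 live births")
```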
Prevalence
= the number of existing cases of disease present in the population
Indicates the extent of a health problem in a population and is used to plan for the healthcare needs of communities
May be expressed as:
a number
percentage
proportion
rate
Formula for prevalence rate per 100,000 at a specified time:
Prevalence = Number of existing cases of a disease / total population * 100,000
Example: In 2018, of the 75,877,700 people younger than age 19 years, 210,000 people were estimated to have been diagnosed with diabetes. The prevalence of diabetes in the United States in 2018 in this age group was:
Number of existing cases / Total population * 100,000 = 210,000 / 75,877,700 * 100,000
= 0.0027676 * 100,000
≈ 276.8 per 100,000
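A Python sketch of the prevalence formula with the 2018 diabetes counts. The helper name `prevalence` is hypothetical. Keeping full precision until the final step gives about 276.8 per 100,000; if the intermediate proportion were rounded to 0.0027 first, the result would be 270, which understates the prevalence:

```python
def prevalence(existing_cases: int, total_population: int, base: int = 100_000) -> float:
    """Existing cases / total population, expressed per `base` people."""
    return existing_cases / total_population * base


diabetes_under_19 = prevalence(210_000, 75_877_700)
print(f"Full precision: {diabetes_under_19:.1f} per 100,000")

# For comparison: rounding the intermediate proportion too early
print(f"Early rounding: {0.0027 * 100_000:.1f} per 100,000")
```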
Two different terms used to describe prevalence:
Point Prevalence
= The number of existing cases of disease in a population at a particular point in time
Example: Calculate the point prevalence of flu on a college campus today and then compare that to the point prevalence of flu on the same day in the previous year
Period Prevalence
= The number of existing cases of disease in a population during a specified period of time
The time period might be a week, a month, a year, or a number of years
Example: Calculate the period prevalence of flu from the previous year's flu season to make decisions about how many flu shots to purchase for the current year
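To make the distinction concrete, here is a Python sketch with invented data: hypothetical flu episodes on a hypothetical campus of 10,000 students, each recorded as onset and recovery dates. Point prevalence counts the cases that exist on one day; period prevalence counts every case that exists at any time during the window, including cases that begin during it:

```python
from datetime import date

# Hypothetical (onset, recovery) dates for flu episodes; invented for illustration only.
episodes = [
    (date(2023, 1, 2), date(2023, 1, 9)),
    (date(2023, 1, 8), date(2023, 1, 15)),
    (date(2023, 1, 20), date(2023, 1, 27)),
    (date(2023, 2, 1), date(2023, 2, 6)),
]
population = 10_000


def point_prevalence(day: date) -> float:
    """Proportion of the population that is ill on a single day."""
    existing = sum(1 for onset, recovery in episodes if onset <= day <= recovery)
    return existing / population


def period_prevalence(start: date, stop: date) -> float:
    """Proportion of the population that is ill at any time during [start, stop]."""
    existing = sum(1 for onset, recovery in episodes if onset <= stop and recovery >= start)
    return existing / population


jan_8 = point_prevalence(date(2023, 1, 8))
january = period_prevalence(date(2023, 1, 1), date(2023, 1, 31))
print(f"Point prevalence on Jan 8: {jan_8 * 1_000:.1f} per 1,000")
print(f"Period prevalence for January: {january * 1_000:.1f} per 1,000")
```

With these made-up episodes, 2 students are ill on January 8 but 3 are ill at some point in January, so the period prevalence is higher than the point prevalence.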
Incidence
= The number of new cases of a disease in a population during a specified period of time
The incidence rate is a measure of disease occurrence and is used for investigating the causes of disease
Use incidence rates to investigate risk of disease in populations or subgroups related to many factors, such as age, gender, occupation, and exposures
Formula: Divide the number of new cases that occur during a time period by the number of individuals in the population
Incidence = Number of NEW cases of a disease / total population * multiplier
Example: In 2017, of the 6,666,818 people living in Indiana, there were 143 confirmed new cases of Lyme disease. The incidence of Lyme disease in Indiana in 2017 was:
Number of new cases / total population * 100,000 = 143 / 6,666,818 * 100,000
= 0.000021 * 100,000
= 2.14 per 100,000
The example that the 4th edition textbook used was: Between 2000 and 2016, the prevalence of persons living with Lyme disease was 970. In 2016, of the 6,483,802 people living in Indiana, there were 81 confirmed new cases of Lyme disease. The incidence of Lyme disease was:
Number of new cases / Total population at risk * 100,000 = 81 / (6,483,802 – 907) * 100,000
= 0.0000125 * 100,000
= 1.25 per 100,000
You may notice that the formula is different. I asked Dr. Schmidt, the author of the textbook, about this, and here is her response (N. Schmidt, personal communication, October 22, 2021):
Hello, Dr. Niitsu: Thanks so much for using our textbook. You have asked an interesting question about something that the chapter author and I have discussed over the years.
In the literature, there seems to be two formulas for calculating incidence. In the past edition, the denominator reflected the "individuals at risk." This is calculated by subtracting the current prevalence from the population in the denominator. The rationale for using this formula was that it provided a more accurate number for incidence because the number of people who already have a disease would not be at risk because they already have the disease. This is also the formula that is more commonly used in epidemiology practice and at the graduate level.
In the current edition, we decided to go with a formula that uses the "total population" (rather than the population at risk). We based this decision on a few reasons. First, it is an easier formula for students to use. It is more like the formula for calculating prevalence. Second, it is the formula that is usually found in more general books about epidemiology, such as global health books. For example, the introductory global health textbook by Richard Skolnik, a respected author, uses the simplified formula. But a third reason we considered is the fact that, regarding COVID-19, it appears that those who have already had the infection can become reinfected. Thus, they are still at risk and it probably is a bit misleading to remove them from the denominator.
From what I see in the literature, either formula can be used. For some diseases, like heart disease, once you have it you are no longer at risk because you now have a chronic disease, thus using the 4th edition formula may be more appropriate. But for other diseases, such as COVID-19, people can become reinfected. Having the disease doesn't necessarily mean that one is not at risk for getting it again at a later time. In light of what we are learning about COVID-19, perhaps that makes the 5th edition formula more appropriate.
Dr. Buckenmeyer and I have had interesting conversations about which formula to use. And if you go back to other editions, you'll see that we have used both. In the end, we chose the formula for this edition because we wanted to keep things simple. This is a chapter that just introduces epidemiology, and the simpler formula is sufficient for our purposes. For your purposes, you could use whichever you prefer. I think that professors are able to say they disagree with something in a text and teach students other content.
I hope this explanation helps. If you have any more questions, please let me know.
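A Python sketch comparing the two denominators with the Indiana Lyme disease numbers. The function name `incidence` and its `existing_cases` parameter are hypothetical; `existing_cases=0` gives the total-population (5th edition) formula, and passing the 907 existing cases from the 4th-edition formula gives the population-at-risk version. For a disease this rare, the two results differ only in the fourth decimal place and both round to 1.25 per 100,000:

```python
def incidence(new_cases: int, population: int, existing_cases: int = 0,
              base: int = 100_000) -> float:
    """New cases per `base` people during a time period.

    existing_cases = 0 -> denominator is the total population (5th edition)
    existing_cases > 0 -> denominator is the population at risk:
                          total population minus people who already have the disease
    """
    at_risk = population - existing_cases
    return new_cases / at_risk * base


# 2017 example, total-population denominator
lyme_2017 = incidence(143, 6_666_818)

# 2016 example, both denominators
lyme_2016_total = incidence(81, 6_483_802)
lyme_2016_at_risk = incidence(81, 6_483_802, existing_cases=907)

print(f"2017, total population:   {lyme_2017:.2f} per 100,000")
print(f"2016, total population:   {lyme_2016_total:.4f} per 100,000")
print(f"2016, population at risk: {lyme_2016_at_risk:.4f} per 100,000")
```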
Purpose
To identify subgroups in populations that may have the highest risk for a specific disease or outcome
Used to:
Find clues about potential causes of disease so that nurses can generate hypotheses about the relationship between exposures and diseases or other health-related outcomes
Measure disease frequency by person, place, and time to determine why disease occurs more frequently under certain conditions
Person
See Box 9-2
Example: the average life expectancy in the United States for 2016
All races and both sexes = 78.6 years
Hispanic females = 84.2 years
Non-Hispanic White females = 81.0 years
Non-Hispanic Black females = 77.9 years
Place
See Box 9-3
Example: obesity in the United States
The frequency of obesity can be examined by state to identify patterns (= spatial clustering)
The prevalence of adult obesity is greater than 35% in the South and in some Midwestern states
Time
3 types of trends
1. Secular
= Trends over years
Example: the global prevalence of diabetes in adults over 18 years of age
4.7% in 1980 --> 8.5% in 2014
2. Cyclical
= Seasonal trends
Example: influenza
See Figure on right: During the 2019-2020 influenza season, positive influenza test results peaked during the 6th week of 2020
3. Short-term changes
= Brief, unexpected changes in disease distribution
3 key terms
Endemic
The expected occurrence of a particular disease within a community or population
Epidemic
A widespread occurrence of a disease in a community or population that is in excess of what is expected
Pandemic
An epidemic that has spread worldwide
Example: rubella in Japan
Endemic until the early 2000s
The number of reported rubella cases remained at record low levels until 2010 (n = 87)
A few outbreaks were reported in the workplace among adult males in 2011 (n = 378)
The number of rubella cases sharply increased in 2012 (n = 2,392)
A total of 5,442 rubella cases were reported from 01/01/2013 to 05/01/2013
Health officials deemed that Japan was experiencing a nationwide rubella epidemic
Example: Coronavirus
Descriptive study designs are hypothesis-generating and are used to examine different types of phenomena by using three study designs: case reports or series, ecologic studies, and cross-sectional studies:
1. Case Series Studies
= Used to describe rare diseases or outcomes
Purpose: to describe new diseases, explain a change in disease patterns, or alert the healthcare community to unusual signs and symptoms in:
An individual patient (case report)
Rare findings among a few patients (case series)
Example: The first official report of what later became known as the AIDS epidemic in the Morbidity and Mortality Weekly Report
In the period October 1980 - May 1981, 5 young men, all active homosexuals, were treated for biopsy-confirmed Pneumocystis carinii pneumonia at 3 different hospitals in Los Angeles, California. Two of the patients died. All 5 patients had laboratory-confirmed previous or current cytomegalovirus (CMV) infection and candidal mucosal infection... The diagnosis of Pneumocystis pneumonia was confirmed for all 5 patients antemortem by closed or open lung biopsy. The patients did not know each other and had no known contacts or knowledge of sexual partners who had had similar illnesses. Two of the 5 reported having frequent homosexual contacts with various partners. All 5 reported using inhalant drugs, and 1 reported parenteral drug abuse. Three patients had profoundly depressed in vitro proliferative responses to mitogens and antigens. Lymphocyte studies were not performed on the other 2 patients.
Advantage
Describe unusual signs and symptoms so that healthcare providers may be able to identify commonalities among patients when a new disease appears in a population
Helpful in identifying when a current disease mutates
Disadvantage
The lack of a comparison group
Without a comparison group, there is no mechanism to test hypotheses
2. Ecologic Studies
= Correlational studies that are population-based rather than individual-based
To compare a summary measure of disease frequency across summary measures of exposure
Exposure
= Contact with a disease or disease-producing agent
Aggregate data
Data collected from individuals who are grouped to represent a population
Example: nurse administrator has to report salaries for staff nurses
Aggregate the data and report the average salary by unit or by shift instead of telling each individual's salary
Can be used to compare the distribution and determinants of diseases across many different population units (e.g., states, counties, zip codes)
Advantage:
Expedient and relatively inexpensive because they often rely on secondary data
secondary data = data that have already been collected
Can be used to examine a broad range of exposures and diseases
Useful when generating hypotheses
May be used to evaluate the effectiveness of population-level interventions (e.g. immunizations, smoking bans, seat belts)
Disadvantage
Unable to link exposure to disease with specific individuals
Ecologic fallacy
When false assumptions are made about individuals based on aggregated data and associations from populations
See YouTube video on right for more details
Temporal ambiguity
The inability to determine whether the exposure truly occurred before the disease
Inability to control for confounding variables
3. Cross-Sectional Studies
= Nonexperimental design used to gather data from a group of participants at only one point in time; epidemiologic study design used to measure exposure and disease as each exists in a population or representative sample at one specific point in time
Prevalence Ratio (PR)
= The measure of association for cross-sectional studies; it compares the prevalence of disease among the exposed with the prevalence of disease among the nonexposed
Formula to calculate PR
[A/(A+B)] / [C/(C+D)]
See the table on right for a 2 x 2 table
How to interpret PR
PR = 1 means the probability of disease among the exposed and the nonexposed is identical
There is no association between the exposure and the disease
PR > 1 means there is a greater probability of disease among the exposed
There is an association between the exposure and the disease
PR < 1 means there is decreased probability of disease among the exposed
Indicating a protective effect
2 x 2 table, which is also known as contingency table
A = the number of people with both the exposure and disease
B = the number of people with the exposure, but not the disease
C = the number of people with the disease, but not the exposure
D = the number of people with neither the disease nor the exposure
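A minimal Python sketch of the 2 × 2 table and the prevalence ratio. The class name `TwoByTwo`, the helper `interpret`, and the counts in the usage line are hypothetical. `interpret` only compares the point estimate with 1; it ignores confidence intervals and statistical significance, which these notes do not cover:

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2 x 2 (contingency) table in the layout used in these notes.

                    Disease   No disease
    Exposed            A          B
    Not exposed        C          D
    """
    a: int  # exposure and disease
    b: int  # exposure, no disease
    c: int  # disease, no exposure
    d: int  # neither disease nor exposure

    def prevalence_ratio(self) -> float:
        """PR = [A/(A+B)] / [C/(C+D)]."""
        prevalence_exposed = self.a / (self.a + self.b)
        prevalence_nonexposed = self.c / (self.c + self.d)
        return prevalence_exposed / prevalence_nonexposed


def interpret(ratio: float) -> str:
    """Read a point estimate against the null value of 1."""
    if ratio > 1:
        return "association: greater probability of disease among the exposed"
    if ratio < 1:
        return "protective effect: decreased probability of disease among the exposed"
    return "no association: identical probability among exposed and nonexposed"


# Hypothetical counts, for illustration only
table = TwoByTwo(a=20, b=80, c=10, d=90)
pr = table.prevalence_ratio()
print(f"PR = {pr:.2f} -> {interpret(pr)}")
```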
Advantage
Efficient and relatively inexpensive
Can be used to examine a number of different phenomena, including behaviors, symptoms, diseases, and health status
Can examine not only a single exposure and disease but also multiple exposures and diseases simultaneously
Disadvantage
Temporal ambiguity
Unable to ascertain if the exposure preceded the disease because data about exposure and disease status are collected at the same time
Unable to determine whether the specific exposure caused the specific disease
Because prevalence is a function of both incidence and the duration of disease, it is difficult to distinguish determinants of developing the disease from determinants of survival with the disease
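A Python sketch of why duration matters, using the common steady-state approximation that prevalence is roughly incidence multiplied by average disease duration (P ≈ I × D). The approximation assumes a stable population and a fairly rare disease, and the two diseases below are hypothetical. With the same incidence, the disease that people live with for years has a far higher prevalence, so a cross-sectional study cannot tell whether a factor linked to high prevalence causes the disease or simply prolongs survival with it:

```python
def approx_prevalence(incidence_per_year: float, mean_duration_years: float) -> float:
    """Steady-state approximation: prevalence ~ incidence x average duration.

    Assumes a stable population and a relatively rare disease.
    """
    return incidence_per_year * mean_duration_years


# Two hypothetical diseases with the SAME incidence: 5 new cases per 1,000 people per year
incidence = 5 / 1_000
short_course = approx_prevalence(incidence, mean_duration_years=0.1)  # resolves quickly
long_course = approx_prevalence(incidence, mean_duration_years=10)    # lasts for years

print(f"Short course: {short_course * 1_000:.1f} per 1,000")
print(f"Long course: {long_course * 1_000:.1f} per 1,000")
```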
Example 1:
Texas Department of Health sent a questionnaire to 600 women to inquire about their smoking and preterm labor
Of the 175 women with preterm labor, 150 reported that they smoked, and 25 reported no smoking
Of the 425 women without preterm labor, 150 reported that they smoked, and 275 reported no smoking
To calculate PR:
(150/(150 + 150)) / (25/(25 + 275)) = (150/300) / (25/300) = 0.5 / 0.0833 = 6.0
Interpretation:
The prevalence of preterm labor is 6 times greater among women who smoke than among women who do not smoke
Example 2:
The Texas Department of Health sent a questionnaire to 1,000 adults to inquire about their blood pressure and exercise patterns
Of the 500 cases with hypertension (HTN), 100 reported they exercise more than three times per week, and 400 reported no exercise
Of the 500 cases without HTN, 300 reported exercising more than three times per week and 200 reported no exercise
To calculate PR:
(100/(100 + 300)) / (400/(400 + 200)) = (100/400) / (400/600) = 0.25 / 0.667 = 0.375 ≈ 0.38
Interpretation:
The exposure of exercising more than three times per week provided a protective effect for HTN
People who exercised more than three times per week were 62% (1 - 0.38 = 0.62) less likely to have HTN as compared to those people who did not exercise at all
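The usual pitfall in both examples is filling the 2 × 2 table: the narrative groups respondents by disease status, but the table's rows are exposure groups, so A + B is the total number of exposed people (300 smokers in Example 1; 400 frequent exercisers in Example 2), not the number with the disease. A Python sketch with a hypothetical helper `prevalence_ratio`, carrying full precision, gives PR = 6.0 and PR = 0.375 (so 1 − PR = 0.625):

```python
def prevalence_ratio(a: int, b: int, c: int, d: int) -> float:
    """PR = [A/(A+B)] / [C/(C+D)], with rows = exposure and columns = disease."""
    return (a / (a + b)) / (c / (c + d))


# Example 1: exposure = smoking, disease = preterm labor
#   A = smoked, preterm (150)        B = smoked, no preterm (150)
#   C = no smoking, preterm (25)     D = no smoking, no preterm (275)
pr_smoking = prevalence_ratio(a=150, b=150, c=25, d=275)

# Example 2: exposure = exercising more than 3 times/week, disease = HTN
#   A = exercise, HTN (100)          B = exercise, no HTN (300)
#   C = no exercise, HTN (400)       D = no exercise, no HTN (200)
pr_exercise = prevalence_ratio(a=100, b=300, c=400, d=200)

print(f"Example 1: PR = {pr_smoking:.2f}")
print(f"Example 2: PR = {pr_exercise:.3f}, 1 - PR = {1 - pr_exercise:.3f}")
```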
Analytic study designs are hypothesis-testing designs and are used to test the association between exposure and disease. The figure below gives an intuitive overview of the epidemiologic study designs (slide credit: Professor Wade):
1. Case-Control Studies
= A type of retrospective study in which researchers begin with a group of people who already have the disease; studies that compare two groups:
those who have a specific condition
those who do not have the condition
Retrospective because individuals are recalling their past
Key terms
Case
= Individuals who have the disease
Often asked to recall exposures up to 1 year prior to diagnosis
Control
= Individuals who do not have the disease of interest but who are at risk for developing the disease
Asked about their history of exposure up to the point of entry into the study
Odds Ratio (OR)
= The statistic reported when epidemiologists conduct a case-control study
Formula
OR = AD / BC
Interpretation
OR = 1 means the odds of disease among the exposed and the nonexposed are identical
There is no association between the exposure and the disease
OR > 1 means there are greater odds of disease among the exposed
There is an association between the exposure and the disease
OR < 1 means there are decreased odds of disease among the exposed
A protective effect
Advantage
Expedient
Require a small sample size
Relatively inexpensive
Examine rare diseases and situations that involve individuals who have had many exposures
Low risk of attrition
because participants are asked to recall exposures and thus do not leave the study
Disadvantage
Cannot measure incidence
Able to examine only one disease rather than many
Not conducive to measuring rare exposures
Recall bias is a threat because of the retrospective design
Example of OR
The Mississippi Department of Health was interested in examining the relationship between smoking and heart disease
Investigators examined medical records at Jackson County Hospital and invited 500 patients who had experienced an Acute Myocardial Infarction (AMI) to join the case-control study
Asked the cases to recall their exposures 1 year prior to diagnosis
400 of the cases were current smokers
Investigators then contacted 500 people from Jackson County who did not have a history of heart disease but who were similar to the cases in all other demographic areas
Asked the controls to recall their exposures
150 of the controls were smokers
To calculate OR:
(400x350) / (150x100) = 140,000 / 15,000 = 9.33
To interpret:
There is an association between smoking and having an AMI
Individuals who smoked had 9.33 times the odds of having an AMI compared with those who did not smoke
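A Python sketch of the cross-product formula using the Mississippi counts. The nonsmoking counts (C = 500 - 400 = 100 cases, D = 500 - 150 = 350 controls) are derived from the totals rather than stated directly, and the helper name `odds_ratio` is hypothetical. The last lines check that AD/BC equals the odds of smoking among cases (400/100) divided by the odds of smoking among controls (150/350):

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio AD / BC from a 2 x 2 table (rows = exposure)."""
    return (a * d) / (b * c)


cases, controls = 500, 500
a = 400            # cases who smoked
c = cases - a      # cases who did not smoke (100)
b = 150            # controls who smoked
d = controls - b   # controls who did not smoke (350)

or_smoking = odds_ratio(a, b, c, d)
print(f"OR = {or_smoking:.2f}")

# Same result viewed as a ratio of exposure odds
odds_cases = a / c
odds_controls = b / d
print(f"Odds of smoking: cases {odds_cases:.2f}, controls {odds_controls:.2f}")
```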
2. Cohort Studies
= Studies using two or more groups; epidemiologic designs in which participants are selected based on their exposure to a determinant
Sometimes described as a "natural experiment"
Example: Nurses' Health Study
Goal = to examine diet and lifestyle risk factors and their associations with cancer, heart disease, and other chronic diseases
Initiated in 1976 with 120,000 female nurses between the ages of 30 and 55 years
Nurses continue to be followed and outcomes have been added
Case-control studies vs. Cohort studies
Case-control studies
Individuals are selected based on the presence or absence of disease
Cohort studies
Individuals are selected based on their exposure
Select a sample group that is representative of the target population, then assign individuals to groups based on their exposure status
Prospective designs
Follow individuals from the time they enroll in the study until they develop the disease
Retrospective designs
Determine a historical exposure and then follow the sample forward to the present time to determine whether the disease is present
Relative Risk (RR)
= The statistic reported by epidemiologists when they conduct a cohort study
Formula
RR = [A/(A+B)] / [C/(C+D)]
Interpretation
RR = 1 means the probability of disease among the exposed and the nonexposed is identical
There is no association
RR > 1 means there is a greater probability of disease among the exposed
There is an association
RR < 1 means there is a decreased probability of disease among the exposed
A protective effect
Advantages
Measure incidence and study many outcomes
Large sample size --> a good design to study rare exposures and can readily establish that an exposure preceded the disease
Less vulnerable to recall bias
Disadvantages
Tend to be expensive because of the need for a large sample size and their longitudinal nature
Require a significant amount of time depending on the disease of interest
Impractical for rare outcomes
Threat of mortality (attrition) because participants may drop out, move, or even die during the course of the study
Exposures may change during the course of a cohort study
Example of RR
See the example for OR by the Mississippi Department of Health
Case-control design: Sought individuals who had AMIs
Cohort design: Select individuals based on whether or not they smoke
Medical records at Jackson County Hospital were examined and 1,000 patients were invited to join the cohort study
Of the 1,000 patients, investigators found 550 patients who smoked, and 450 patients who did not have a history of smoking but who were similar to the smokers in all other demographic areas
Followed individuals to see who did and did not have an AMI
Of the 550 patients who smoked, 400 developed an AMI, compared with only 100 of the 450 patients who did not smoke
To calculate RR
RR = (400/550) / (100/450) = 0.727 / 0.222 = 3.27
To interpret
Smokers were 3.27 times more likely to have an AMI compared to nonsmokers
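The cohort counts above (400 and 150 smokers with and without an AMI; 100 and 350 nonsmokers with and without an AMI) happen to fill the same four cells as the case-control example, so a quick Python sketch can compute both measures from one table. The helper names are hypothetical. Because an AMI is common in this cohort (500 of 1,000 participants), the OR (9.33) is far larger than the RR (3.27); the OR approximates the RR only when the outcome is rare:

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [A/(A+B)] / [C/(C+D)]: risk among exposed / risk among nonexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = AD / BC."""
    return (a * d) / (b * c)


# Cohort example: exposure = smoking, outcome = AMI
a, b = 400, 550 - 400  # smokers with / without an AMI
c, d = 100, 450 - 100  # nonsmokers with / without an AMI

rr = relative_risk(a, b, c, d)
or_ = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")
```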
3. Intervention Studies
= In epidemiology, a study that has a treatment that can be manipulated by the researcher
The measure of association reported for intervention studies is the RR
Advantage
Control over the exposure
Randomization to control for potential confounding variables
Blinding to reduce bias
Disadvantage
Cost
Time-consuming
Participants may become noncompliant or withdraw from the study, which can bias outcomes
Hawthorne effect
This effect occurs when participants in an intervention study alter their behavior because they know that they are being observed
For ethical reasons, not all exposures can be manipulated
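As an illustration of the randomization advantage listed above, here is a Python sketch of simple 1:1 random allocation using the standard-library `random` module. The participant IDs and the `randomize` helper are hypothetical; real trials often use block or stratified randomization with allocation concealment rather than this simple shuffle:

```python
import random


def randomize(participants, seed=None):
    """Simple 1:1 random allocation to an intervention group and a control group.

    A seed makes the allocation reproducible (useful for auditing the assignment).
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


ids = [f"P{n:03d}" for n in range(1, 21)]  # 20 hypothetical participant IDs
groups = randomize(ids, seed=2021)
print("Intervention:", groups["intervention"])
print("Control:", groups["control"])
```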