Understanding Statistical Arguments
We’ve learned that anecdotal evidence, the stories that we hear from friends and other sources, is not a reliable form of evidence when we are trying to form many types of reasonable conclusions. If a few friends tell me that a particular phone or a brand of tire is a good one, my sampling of data is so limited that they could easily be recommending one of the worst products, or there could be other problems with the choice that such a small body of evidence would not reveal.
We’ve also learned about the virtues of the scientific method for forming reasonable and better justified conclusions about what is real and true. One of the most important forms of scientific reasoning, and the foundation of causal reasoning, is the statistical argument.
A good statistical argument is the way to move from the anecdote to some rational grounds for believing in a larger, objective phenomenon. Suppose Smith sees a case of a teenager who is poor at reading. It comports well with Smith’s view that teenagers are on their phones too much and that literacy is dropping. But an anecdote isn’t enough evidence. A statistical argument, however, could tell us about the real rate of literacy in the population, and whether it’s going up, going down, or holding stable. For all Smith knows, teen literacy is going up, and the case he saw is one of the minority. That is, Smith’s evidence is actually consistent with several different probabilities.
Some Basics of Statistical Arguments
The standard form for a statistical claim is a sentence of the form:
X% of the target population has the target property.
A target population is the group of things we are interested in.
The target property is the condition or feature that some percentage of them have.
Consider these examples:
32% of high school seniors are not literate.
48% of Americans believe in the power of psychic or spiritual healing.
90% of lung cancer cases occur in people with a history of smoking.
35% of cancers are preventable through known lifestyle changes.
70% of African elephants live outside formally protected areas.
99% of Earth’s atmosphere is composed of just two elements: nitrogen and oxygen.
50% of all stars in the Milky Way are in binary or multiple-star systems.
Our goal here will be to 1) understand statistical arguments and what separates a strong one from a weak one, and 2) increase our scientific literacy so that when we encounter statistical reasoning in news reporting, science accounts, social media, diet and exercise information, and other cases, we can recognize the elements of the arguments.
Elements of a Statistical Argument
How would we determine the rate of high school seniors who aren’t literate? The problem is that very often the target population is too big for us to conduct a careful study of all of its members. There are 3.5 million high school seniors in the U.S. There are half a million African elephants living in the wild. And there are about 340 million Americans. We don’t have the time or the resources to investigate every member of the target population. Researchers conducting investigations have limits on time, funding, and staffing. They need to use a smaller population, a sample population, that is tractable for research, that is representative of the larger target population, and that they can extrapolate conclusions from. To find out about literacy rates among 3.5 million American high school seniors, beliefs in psychic healing, or African elephants, researchers look at a smaller group.
The sample population in a statistical argument is the smaller group of things that the research was actually conducted on, and that was used to extrapolate to the target population that the statistical conclusion is about.
This tool is designed to help you master the 8 elements of statistical arguments and the template for a valid statistical argument.
Go to the tutor and ask it to quiz you on the elements of statistical arguments. It will give you examples like the ones in this chapter and have you answer questions about the 8 elements. It will give you feedback and explanations.
To get the full benefit:
Practice until you are consistently correct with the questions about the examples.
Aim for at least ~80% correctness across multiple sessions.
If you can do that, you have the skills the course is testing.
There are two major conceptual questions in arguing from the sample population to the target population in cases like this.
Representation. Is the sample population representative of the target population? The issue of representation isn’t just about whether the group was "big enough"; it is about whether the sample is a miniature, honest mirror of the whole. If the mirror is warped, the argument fails. If the sample is representative, then what we discover to be true of it will also be true of the target population to some level of precision. If the composition of the sample population strays from the composition of the target population in significant ways, then the sample won’t represent the target. Suppose we only checked the literacy rates of rural, poor high school students, or suppose the sample population was primarily from affluent, private high schools; what is true of those samples won’t represent American high school students well. To get a good result, we would want to consider who might be missing from the sample. If, however, the sample population is balanced for race/ethnicity so that the rates of White, Black, Hispanic, Asian, and Native American students in the sample are roughly the same as those rates in the national high school senior population, that would help representation. The same is true for socioeconomic status, school type (public, private, Catholic…), and urban vs rural schools. If the sample population captures those demographics roughly in proportion to their presence in the national population, then it will be representative.
There are two ways that we will consider to achieve representativeness. Random sampling of the target population is the first. Random sampling methods ensure that every member of the target population has an equal chance of being selected for the sample population. If, when we are selecting students from American high school seniors, no person or group is left out of our selection process, and the chance of selecting any one student isn’t higher or lower than the chance of selecting any other, then the representation of the sample improves. If we omitted private school students, or didn’t select any students from rural high schools, for example, then those groups would be missing from the sample, and we would trust the sample less to inform us about the whole. Stratified sampling is the other common method for composing a sample population. Instead of relying on blind randomness, this approach deliberately chooses members of different key demographics in the target population to ensure that the sample is representative.
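The two sampling approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any actual study: the population of 10,000 students, the school-type labels and their shares, and the function names simple_random_sample and stratified_sample are all made up for the example. The stratified version here fills each group's quota in proportion to its share of the population and picks randomly within each group, which is one common way to do it.

```python
import random
from collections import Counter

def simple_random_sample(population, k, seed=0):
    """Pick k members so that every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(population, k, key, seed=0):
    """Pick members group by group, in proportion to each group's share."""
    rng = random.Random(seed)
    groups = {}
    for member in population:
        groups.setdefault(key(member), []).append(member)
    sample = []
    for members in groups.values():
        # Quota for this group: its share of the population times k.
        quota = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical target population of (student id, school type) pairs.
population = (
    [(i, "public") for i in range(8000)]
    + [(i, "private") for i in range(8000, 9000)]
    + [(i, "Catholic") for i in range(9000, 10000)]
)

random_sample = simple_random_sample(population, 500)
strat_sample = stratified_sample(population, 500, key=lambda m: m[1])

# Compare the school-type make-up of each sample with the population's.
print("random:    ", Counter(school for _, school in random_sample))
print("stratified:", Counter(school for _, school in strat_sample))
```

The stratified sample matches the population's proportions exactly by construction, while the simple random sample will usually land close to them, with some chance variation.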
Accuracy: Representation concerns the composition of the sample compared to the nature of the target population. The second conceptual question for statistical arguments concerns the accuracy of what we actually measured. Is what we measured in the sample population an accurate measure of the property we are interested in? The accuracy issue is more subtle. In this case, we must distinguish between literacy and how we measured literacy, which, in the NAEP study we’ve been considering, was a score on a battery of questions about literary and informational texts. On a 500 point scale, students who scored below 268 were considered to be “below basic literacy.” That is, we took “score below 268 on the NAEP test” to be an accurate measurement of “are not literate.” Whatever measurement device we use to check for a property in a population, there is always a question about accuracy because measurements are never perfect. Literacy tests cannot capture every nuance of reading comprehension, pregnancy tests have error rates, Covid tests can be administered poorly, questionnaires can be biased, and researchers can make mistakes. Ask people about socially desirable properties like “are you a good driver,” or “are you above average in intelligence,” and their answers will skew in the positive direction, giving an inaccurate picture of the property we are interested in. So the accuracy issue concerns the measurement tool, while the representation issue concerns the composition of the sample population compared to the target population. The two issues should not be confused.
8 Elements of a Statistical Argument.
Sample population: 125,000 high school seniors in the NAEP study.
Target population: American high school seniors
Measured property: scored below a 268 out of 500 on the NAEP test
Target property: are not literate.
Accuracy: The NAEP test is an accurate measure of literacy. Details about the accuracy of the test will typically be included in the study’s reporting, analysis, and explanation of methods. Often when study results like these are reported in the media, the details are left out. But reputable researchers will address it with well designed work.
Representativeness: The 125,000 high school seniors in the NAEP study are representative of American high school seniors. Details about how the sample population was compiled will typically be included in the study’s explanation of how the study was conducted. Media reports may omit these details, but careful researchers and organizations will be aware and take efforts to address representativeness.
Random sampling: The report or study will typically report their methods in the published article.
Margin of error: +/- 3%. We have stipulated this as the margin of error in our example. Researchers will often report these details in the study.
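To see what the margin of error does to the conclusion, here is a minimal Python sketch. It uses the 32% figure and the stipulated +/- 3% from the example above; the function name interval_from_margin is invented for the illustration.

```python
def interval_from_margin(reported_pct, margin_pct):
    """Return the (low, high) range implied by a reported rate and its margin of error."""
    return (reported_pct - margin_pct, reported_pct + margin_pct)

# The literacy example: 32% reported, with a stipulated margin of +/- 3%.
low, high = interval_from_margin(32, 3)
print(f"Estimated rate in the target population: between {low}% and {high}%")
```

In other words, the study supports the claim that somewhere between 29% and 35% of American high school seniors are not literate, rather than exactly 32%.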
Template for a Statistical Argument
These issues and concepts can be captured in a template for a statistical argument that is deductively valid:
Preliminary conclusion:
1. X% of the sample population has the measured property. (EP or IP)
Accuracy premise:
2. If X% of the sample population has the measured property, then X% of the sample population has the target property. (EP or IP)
____________________________
Measurement Conclusion:
3. Therefore, X% of the sample population has the target property. (1, 2, MP)
Representativeness Premise:
4. If X% of the sample population has the target property, then X% of the target population has the target property. (EP or IP)
_____________________________
Statistical Conclusion:
5. Therefore, X % of the target population has the target property. (3, 4, MP)
Comments: First, this template is deductively valid. It makes two modus ponens inferences to give a valid conclusion for a statistical argument. Second, this template captures the measurement/accuracy issue in premise 2: it asserts, in effect, that what was measured was an accurate indicator of the target property. Every statistical argument, either implicitly (IP) or explicitly (EP), will need to provide some evidence that the polling, the questionnaire, the blood test, the survey, the literacy test, or whatever method was used gives us grounds to believe that the target property is present in the sample population. Third, this template separates the representation issue into premise 4: it says that the sample population, the group of things that were actually studied, represents, mirrors, or is a good stand-in for the target population. As we saw, this might be achieved with a random sampling method or deliberate sample composition.
Here is the example about literacy rates from above put into the template form:
1. 32% of high school seniors in the NAEP study scored below a 268 on a 500 point scale. (IP)
2. If 32% of high school seniors in the NAEP study scored below a 268 on a 500 point scale, then 32% of high school seniors in the NAEP study are not literate. (IP)
_______________________
3. Therefore, 32% of high school seniors in the NAEP study are not literate. (1, 2, MP)
4. If 32% of high school seniors in the NAEP study are not literate, then 32% of American high school seniors are not literate. (IP)
_____________________
5. Therefore, 32% of American high school seniors are not literate. (3, 4, MP)
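Because the template is mechanical, filling it in can be sketched as a short Python function. This is only an illustration: the function name statistical_argument and its parameter names are invented here, and the sketch leaves out the EP/IP labels, which depend on what the source actually states.

```python
def statistical_argument(pct, sample, target, measured, target_property):
    """Fill in the five-line template; lines 3 and 5 follow by modus ponens (MP)."""
    sample_measured = f"{pct}% of {sample} {measured}"
    sample_target = f"{pct}% of {sample} {target_property}"
    target_target = f"{pct}% of {target} {target_property}"
    return [
        f"1. {sample_measured}.",
        f"2. If {sample_measured}, then {sample_target}.",  # accuracy premise
        f"3. Therefore, {sample_target}. (1, 2, MP)",
        f"4. If {sample_target}, then {target_target}.",  # representativeness premise
        f"5. Therefore, {target_target}. (3, 4, MP)",
    ]

# The literacy example, built from its elements.
lines = statistical_argument(
    pct=32,
    sample="high school seniors in the NAEP study",
    target="American high school seniors",
    measured="scored below a 268 on a 500 point scale",
    target_property="are not literate",
)
print("\n".join(lines))
```

The point of the sketch is that once the sample population, target population, measured property, and target property are identified correctly, the rest of the argument follows the same pattern every time.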
Expectations: Being scientifically literate and being able to decide whether it’s rational to accept the results of research purporting to support a statistical conclusion will require first, being able to correctly identify the 8 elements of a statistical argument as they are listed above, and second, being able to put these arguments into the template for statistical arguments.
Here are some more examples.
Consider this media report about American attitudes about legalizing marijuana:
From the headline, the text, and the diagram, we can answer this question: What is the statistical conclusion, in standard form, where that is stated as: X % of a target population has a target property.
57% of Americans favor legalizing marijuana for medical and recreational use.
The conclusion allows us to answer more questions: What’s the target population in this study?
Americans
What’s the target property?
Favor legalizing marijuana for medical and recreational use.
What is the sample population? What is the actual group that was studied?
5,140 adults in the PEW study
What is the measured property? “Favor” is difficult to measure. What was measured and taken to be the indicator of favoring?
Answered yes to “should marijuana be legal for medical and recreational use?”
What is the measured statistical claim? Put in the form: X% of the sample population has the measured property.
57% of the 5,140 adults in the PEW survey answered yes to “should marijuana be legal for medical and recreational use?”
Is this study accurate? One concern we have in a statistical argument is whether the measured property is an accurate indicator of the target property. The argument, to be strong, should establish that the property measured in the sample population does indicate that the target property is present at the same rate in the sample population. In this study, the accuracy question is: Is answering “yes” to the question, “should marijuana be legal for medical and recreational use?” an accurate indicator of favoring legalizing marijuana for medical and recreational use? Can we use that “yes” answer to stand for actually favoring? That is, are there any reasons to think that people might answer inaccurately to a question like this? Would they say “yes,” when they actually disfavor it, or would they say “no,” when they actually favor it? There is some reason to suspect that people might not be completely forthcoming about their attitudes about something like marijuana use where there is some social and legal stigma attached. There might be people in the study who favor legalizing it but fear being candid because their real answer might reflect poorly on them. When we answer the accuracy question, we can look at the information we have, reflect on the measurement tool and the accuracy issues we can think of, and note those as concerns. Ultimately, each of us will have to consider what we know and decide whether to believe the accuracy premise given our background beliefs. If we have concerns about accuracy, this question is the place to voice them for our analyses in this class. But be careful: the accuracy question is not about representation. Concerns about whether the 5,140 adults in the study demographically represent Americans do not belong under this analysis. The accuracy issue is separate from the representation issue.
Is this study representative? Another challenge in making statistical arguments strong is representativeness. The sample population must represent the target population so that what’s true of the sample population is, for the most part, true of the target population. With humans, for example, the two groups should not differ seriously with regard to economic class, age, sex, religion, and other factors. That way, what we discover to be true about the sample population is more likely to be true of the target population. In this study, the representativeness question is: Are the 5,140 adults in the PEW study representative of Americans?
Again, individual answers may vary. Good, reputable polling organizations like PEW are aware of the representation issue and they will typically take measures to ensure that their samples are representative.
Sleep Study Example: Consider this fictional example. The critical thinking goal of scientific literacy is to be able to correctly identify the elements of a statistical argument and reconstruct it:
Americans Falling Short on Sleep, National Survey Finds
A new nationwide study by the National Sleep Institute suggests that nearly half of American adults are not getting enough sleep. Researchers conducted interviews and digital surveys with 8,000 randomly selected adults from across the United States, chosen to reflect the country’s diversity in region, age, income, and education level. Participants were asked about their typical weekday and weekend sleep routines, caffeine intake, screen time, and work hours.
After compiling the results, the research team reported that 42 percent of respondents said they sleep fewer than seven hours a night, the minimum recommended by most health organizations. The finding aligns with a growing body of evidence linking insufficient sleep to higher rates of obesity, cardiovascular disease, and anxiety. The study’s margin of error was plus or minus two percentage points, meaning the true figure could be slightly higher or lower in the broader population.
Dr. Meena Patel, a sleep medicine specialist at Stanford University and one of the study’s co-authors, noted that self-reported sleep data tend to be reliable when compared with readings from wearable devices, though many people underestimate their sleep time by 15 to 30 minutes. “That means the real number of short sleepers might be a bit smaller than what people say,” she explained, “but the overall trend is clear—Americans are chronically tired.”
To improve accuracy, the researchers conducted follow-up phone interviews with a smaller subsample of 500 respondents who also wore sleep-tracking wristbands for two weeks. The results were consistent with the survey findings, lending confidence to the overall estimate.
Dr. Patel emphasized the importance of random sampling and national coverage for studies like this: “We didn’t just survey college students or people who already visit sleep clinics. By reaching people from every region and background, we can generalize our findings to the U.S. adult population with reasonable confidence.”
Elements of this statistical argument:
Conclusion: X% of the target population has the target property
42% of American adults are not sleeping enough.
Sample population? What things were actually studied?
8,000 adults in the NSI study
Target population? What things is the conclusion about?
American adults
Measured property? What property was actually measured in the study?
said they sleep fewer than seven hours a night
Target property? Property in the conclusion?
Are not sleeping enough.
Accuracy? Is what they measured accurate for the target prop?
Is “said they sleep fewer than seven hours a night” an accurate measurement of “not sleeping enough”? Your answer may vary. People are probably prone to exaggerate how little sleep they get, whether for sympathy, from availability bias, or from bad recall, but sleep loss is a real problem and the result seems plausible. The study says, “Participants were asked about their typical weekday and weekend sleep routines, caffeine intake, screen time, and work hours.” These questions probably prompted reflection and better answers overall.
Representativeness? Is the sample pop representative?
Do the 8,000 people in the NSI study represent American adults? The piece says that they were randomly selected adults from across the United States, chosen to reflect the country’s diversity in region, age, income, and education level. These are good efforts to improve representativeness.
Margin of error? +/- 2 percentage points, as reported in the article.
Random sampling? The subjects were randomly sampled.
Belief in God Example:
Consider this front page report from a Gallup study:
Elements of this statistical argument:
Conclusion: X% of the target population has the target property
81% of Americans believe in God.
Sample population? What things were actually studied?
People in Gallup poll. No more information about them is given.
Target population? What things is the conclusion about?
Americans
Measured property? What property was actually measured in the study?
Reported believing in God.
Target property? Property in the conclusion?
Believe in God.
Accuracy? Is what they measured accurate for the target prop?
Was reporting believing in God an accurate indicator of believing in God? Maybe, but many people believe in belief and over-report. That is, believing in God is a socially desirable and commended trait. There is a stigma attached to saying you don’t believe. So people might be prone to say they do when they don’t. This will tend to make simply asking inaccurate. The real number might be lower.
Representativeness? Is the sample pop representative?
Were the people in the Gallup Poll representative of people in the U.S.? Unclear, probably. Gallup is a reputable polling organization with very good methods.
Margin of error? Not shown
Random sampling? Not shown
Free Speech Example
Conclusion:
47% of students think free speech rights are secure.
Sample population?
1,023 students in Knight-Ipsos poll
Target population?
Students (in America)
Measured property?
Reported free speech rights are secure
Target property?
Think free speech rights are secure
Accuracy?
Was reporting in the study an accurate indicator of what they think? Probably
Representativeness?
Were the students in the study representative of U.S. students? Probably
Margin of error? Not shown
Random sampling? Not shown
Deporting Illegal Immigrants Example
Conclusion: X% of the target population has the target property
32% of U.S. adults say all illegal immigrants should be deported.
Sample population? What things were actually studied?
People polled in PEW study
Target population? What things is the conclusion about?
U.S. adults
Measured property? What property was actually measured in the study?
Say all immigrants in the U.S. illegally should be deported.
Target property? Property in the conclusion?
Believe all immigrants in the U.S. illegally should be deported. While the headline says “say,” presumably PEW is concluding that U.S. adults “believe.”
Accuracy? Is what they measured accurate for the target prop?
What the people polled say is taken as a measure of what they believe. There is some social stigma associated with opposing illegal immigration, so there may be underreporting.
Representativeness? Is the sample pop representative? Are the people in the study representative of U.S. adults?
There’s no information given, but PEW researchers are generally reputable with their methodologies.
Margin of error? Not given
Random sampling? Not given.
Summary
Strong scientific reasoning requires, among other things, strong statistical arguments.
Strong statistical arguments will contain these 8 elements:
Sample population: the smaller group of things that the research was actually conducted on, and that was used to extrapolate to the target population that the statistical conclusion is about.
Target population: the population that the conclusion of the argument is about.
Measured property: the property that was actually studied in the sample population.
Target property: the property that we are interested in among the target population. The measured property is taken to be an accurate indicator of it.
Accuracy: strong statistical arguments should provide evidence that the measured property is an accurate indicator of the presence of the target property. Is the measurement accurate?
Representativeness: strong statistical arguments should compile their sample populations so that they are representative of the target population, usually through:
Random sampling: methods that ensure that every member of the target population has an equal chance of being selected for the sample population.
Margin of error: the range above or below the reported rate within which the real rate of the target property in the target population is likely to fall.
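As a rough illustration of where a margin of error comes from, the sketch below uses a common textbook approximation for a simple random sample: about 1.96 times the square root of p(1 - p)/n at the 95% confidence level. That formula, the function name margin_of_error, and the poll of 1,000 people are assumptions brought in for this example, not details from the studies in this chapter. Real studies often use more elaborate methods and report the figure themselves.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 randomly sampled people, half of whom have the property.
moe = margin_of_error(0.5, 1000)
print(f"+/- {moe * 100:.1f} percentage points")
```

For this hypothetical poll the result is roughly 3 percentage points. Under this approximation the margin shrinks as the sample grows, which is one reason larger samples support more precise conclusions.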
Template: Strong statistical arguments can be reconstructed with this template:
Preliminary conclusion:
1. X% of the sample population has the measured property. (EP or IP)
Accuracy premise:
2. If X% of the sample population has the measured property, then X% of the sample population has the target property. (EP or IP)
____________________________
Measurement Conclusion:
3. Therefore, X% of the sample population has the target property. (1, 2, MP)
Representativeness Premise:
4. If X% of the sample population has the target property, then X% of the target population has the target property. (EP or IP)
_____________________________
Statistical Conclusion:
5. Therefore, X % of the target population has the target property. (3, 4, MP)