Understanding Statistical Arguments

We’ve learned that anecdotal evidence, the stories we hear from friends and other sources, is not a reliable form of evidence for many of the conclusions we want to draw. If a few friends tell me that a particular phone or brand of tire is a good one, my sample of data is so limited that they could easily be recommending one of the worst products, or the choice could have other problems that such a small body of evidence would not reveal.

We’ve also learned about the virtues of the scientific method for forming reasonable and better justified conclusions about what is real and true. One of the most important forms of scientific reasoning, and the foundation of causal reasoning, is the statistical argument. We often need to know how prevalent a property is in a population. Perhaps I want to know how often AT&T phones break down, how many oak trees in California are diseased, or what the income distribution of people in the U.S. is. The best way to find out these facts, and many others like them that we must know to understand the world, is through a well-constructed statistical argument.

Consider this example:

Researchers wanted to know whether 3-D movies cause motion sickness or headaches in a significant number of people who watch them. In ten major cities, at randomly selected movie theaters that were showing 3-D movies, they interviewed people after viewings. Of the 1,253 people they spoke to, 371 people, or about 30%, reported experiencing some discomfort, motion sickness, or headache during the movie. On those grounds, they concluded that 30% of the people who see 3-D movies experience some physical discomfort from them.
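The percentage in this example is simple arithmetic, and a short Python sketch makes the calculation explicit. The variable names are only illustrative; they are not part of the study.

```python
# Figures reported in the 3-D movie example above.
interviewed = 1253          # people the researchers spoke to
reported_discomfort = 371   # people who reported discomfort, motion sickness, or headache

# Proportion of the interviewed group that reported discomfort.
sample_proportion = reported_discomfort / interviewed

print(f"{sample_proportion:.1%}")  # prints 29.6%
```

The exact figure is a little under 30%, which is why the example says “about 30%.”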

In this case, the larger group of things that the study is interested in is people who watch 3-D movies. That is, the target population in this case is people who watch 3-D movies. It’s not possible to study all of them, however, so a sample population is used. A sample population is the smaller group of things that are actually studied and that are used to generalize to the target population.

The property that this study focuses on is “experiences some physical discomfort from watching a 3-D movie.” That’s the property in the target population that is the central question of the study. This is the target property. But in this case, as with many others, actually measuring the presence or absence of “experiences some physical discomfort from watching a 3-D movie” can be difficult. The property that a statistical investigation actually measures in the sample population, and that the study takes as an indicator of the target property in the target population, is the measured property. We can suppose that the researchers asked exiting movie goers if they experienced any discomfort from the movie and the movie goers responded either affirmatively or negatively. (We don’t have much more detail in this short example.) So the measured property in this study is “said they experienced some physical discomfort from watching a 3-D movie.” And the researchers take the presence of this property to be an accurate indicator of the presence of the target property. That is, if a movie goer says that the movie gave her some physical discomfort, then she actually did feel some physical discomfort.

This difference may not seem significant in this case, but there are many instances where the difference between the measured property and the target property is important and can make the difference between a study we should accept and one that we should reject. Suppose that the researchers had asked the exiting movie goers “are you having fun on your evening out with the person you saw the movie with” and then took their answers to be an accurate indicator of whether they actually had fun with that person. If the question were asked and answered out loud, in front of the other person, then we would not expect to get an accurate answer. How a question is asked or how a property is measured can be very important. The conceptual issue here is known as accuracy. For any statistical argument, one of the major issues is how it can answer this question: how accurate is the measured property as an indicator of the target property? Here are some other examples of measured property/target property pairs:

report on a census form having a household income of less than $200k a year/have a household income of less than $200k a year;

has brown, withered leaves on more than 30% of the tree/has Dutch Elm disease;

answered yes to the question “have you ever cheated on your taxes”/cheated on their taxes

reported agreeing with the statement, “The evolution of life on Earth was assisted by God”/believes that evolution on Earth was assisted by God.

When a study is about what people believe, some of the methods that can help improve accuracy are anonymous questionnaires, carefully worded questions, diverse and multiple cross referenced questions, research into different measurement techniques, and so on.

The other major issue for a statistical argument involves something known as representativeness. If the sample population is going to be used to generalize and project onto the target population, then the sample population must be representative. That is, the sample population must be composed so that, in ways relevant to the possession of the target property, the sample population resembles the target population. Roughly, we want the sample population to look enough like the target population that if we observe the presence of some property in the sample, we can reasonably infer that it will be present in the target population. If I am wondering about the extent to which Dutch Elm disease has infected American Elm trees, it would not be representative to look only at one small, highly infected grove in a park in Pennsylvania. The trees in that park are probably not representative of all the trees in the country. So generalizing from them is likely to give me a skewed view of the bigger picture.

In one 1994 study, researchers set out to test whether drinking alcohol on an empty stomach makes you drunker than drinking with a full stomach. In their tests, they had 10 people consume alcohol on different days, once with a full stomach and once with an empty one. They concluded that if you have an empty stomach you will get drunker faster than if you have just eaten.

What’s troubling about the study is that the sample population is a mere 10 people, and the results from those 10 are used to generalize about all people. With only 10 subjects, it is possible that they found an isolated effect that was pronounced in a few people in their study, but that is not widely present in the population at large. The conclusion of the study may be correct, but the very small sample population raises serious questions about its representativeness. In general, one way to improve representativeness is to make the sample population larger. And if the sample population is being composed from the target population, another way to improve representativeness is to use a random sampling method. A random sampling method gives every individual in the target population an equal chance of being chosen for the sample population. If a survey about American political attitudes is conducted and only Republicans are called from a Republican voter registration list, then Democrats and third party voters do not have any chance of being selected. So the study would not be representative. In the 3-D movie study above, movie goers were chosen from 10 different cities and the theaters were chosen at random. Those two facts improve representativeness far more than, say, a casual conversation with a few friends about their experience in 3-D movies.
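The contrast between a random sample and a sample drawn from only one list can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the population of 100,000, the “contact list,” and the rates in each group are invented numbers, not data from any study mentioned above.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical target population of 100,000 people (all numbers invented).
# Each entry is (on_contact_list, has_property).
population = (
    [(True, True)] * 12_000 + [(True, False)] * 8_000
    + [(False, True)] * 18_000 + [(False, False)] * 62_000
)

def rate(sample):
    """Fraction of a sample that has the property."""
    return sum(has_property for _, has_property in sample) / len(sample)

true_rate = rate(population)  # 0.30 by construction

# Random sampling: every member has an equal chance of being chosen.
random_sample = random.sample(population, 1_000)

# Biased sampling: only people on the contact list can be chosen.
listed_only = [person for person in population if person[0]]
biased_sample = random.sample(listed_only, 1_000)

random_rate = rate(random_sample)
biased_rate = rate(biased_sample)

print(f"true rate:     {true_rate:.3f}")
print(f"random sample: {random_rate:.3f}")
print(f"biased sample: {biased_rate:.3f}")
```

In this made-up population the random sample should land close to the true rate of 30%, while the list-only sample should land near 60%, because people who are not on the list never had a chance of being selected.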

So a statistical argument will have these eight elements. It will be valuable to be able to pick them out:

Sample population: the group of objects that the study actually measures.

Target population: the larger or largest group of objects that the study seeks to draw a conclusion about.

Measured property: the property or feature in the sample population that is actually measured in the study in question.

Target property: the property or feature in the target population that is central to the overall argument.

Accuracy: in order to be strong, the measured property in a statistical argument must be an accurate indicator of the presence of the target property in the target population.

Representativeness: in order to be strong, the sample population must represent or resemble the target population, with regard to other properties that are connected to the property in question. If the sample is going to be used to generalize about the target population, the sample must look like the target in the relevant ways.

Margin of error: the extent to which researchers believe the presence of the target property in the target population may vary from the presence of the measured property in the sample population. In the study above, they might conclude that 30% of people who see 3-D movies, plus or minus 3%, are made physically uncomfortable by watching them. That means that they expect that the actual rate of the target property in the target population may be as high as 33% or as low as 27%. Usually, increasing the sample population, or composing it very carefully, brings the margin of error down.

Random sampling: if a sample population is randomly sampled from the target population, that means that every member of the target population had an equal chance of being chosen for the sample.
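To make the margin of error less abstract, here is a minimal Python sketch using a common textbook approximation for a proportion at roughly 95% confidence. This is an assumption on my part: the example above does not say how its “plus or minus 3%” was obtained, and the formula assumes simple random sampling, which a survey conducted theater by theater may not satisfy. The function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p with n subjects.

    Uses the normal approximation z * sqrt(p * (1 - p) / n) and assumes
    simple random sampling; z = 1.96 corresponds to about 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Proportion from the 3-D movie example: 371 of 1,253 people interviewed.
p = 371 / 1253

# Compare a small, the actual, and a large sample size at the same proportion.
for n in (100, 1253, 10_000):
    print(n, f"{margin_of_error(p, n):.3f}")
```

Under these assumptions the 1,253 interviews give a margin of roughly 2.5 percentage points. The same proportion from only 100 people would carry a margin of about 9 points, which shows why a larger sample population usually brings the margin of error down.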

Here’s an example of how the movie survey above might be reconstructed as a strong, deductive argument:

1. Measured property in sample population: 30% of the 1,253 people interviewed in the 10 city survey claimed that watching a 3-D movie gave them physical discomfort. [EP]

2. Measured property in sample population to target property in sample population: If 30% of the 1,253 people interviewed in the 10 city survey claimed that watching a 3-D movie gave them physical discomfort, then 30% of the 1,253 people interviewed in the 10 city survey were given physical discomfort from watching a 3-D movie. [IP]

3. Preliminary conclusion about the sample: 30% of the 1,253 people interviewed in the 10 city survey were given physical discomfort from watching a 3-D movie. [1,2]

4. Target property in the sample to target property in the target population: If 30% of the 1,253 people interviewed in the 10 city survey were given physical discomfort from watching a 3-D movie, then 30% of the people who see 3-D movies experience some physical discomfort from them. [IP]

5. Conclusion about the target population: 30% of the people who see 3-D movies experience some physical discomfort from them. [3,4]

Practice Examples:

1. Lung Cancer Survival

In a broad survey of the medical records of more than 10,000 cancer patients across the country, researchers at Johns Hopkins have concluded that 82% of stage 0 lung cancer patients whose cancer is detected early survive. Over the course of several years, the records of 10,452 patients were carefully reviewed to determine at what stage the cancer was detected, how it was treated, and what the survival rates were. “Early detection” was defined as cancer that was confined to a small area of one lung. “Survival” was defined to mean surviving 5 or more years after diagnosis.

Sample population: 10,452 patients in the Johns Hopkins study

Measured property: had medical files that indicated survival from lung cancer 5 or more years.

Target population: people with lung cancer that is detected early

Target property: survived five or more years.

Measured Property in the Sample Population: 82% of the 10,452 patients in the Johns Hopkins study had medical files that indicated survival from lung cancer 5 or more years.

Target property in the Target Population: 82% of people with lung cancer that is detected early survive for five or more years.

2. Americans and Smoking

In a recent Gallup poll, it was concluded that 21% of Americans smoke. Interviews with more than 75,000 individuals across the United States indicated that about 16,000 of them smoke at least one cigarette or cigar a day. Gallup interviewed no fewer than 1,000 U.S. adults nationwide each day during 2008. These results are based on 75,073 surveys conducted from Jan. 2, 2008, to March 17, 2008. For results based on this sample, the maximum margin of sampling error is ±1 percentage point. In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.

Sample population: _____________________________________________________

Measured property:_____________________________________________________

Target population: ______________________________________________________

Target property: ________________________________________________________

Measured Property in the Sample Population: ________________________________

Target property in the Target Population: ___________________________________

3. California Oak Trees and Sudden Oak Death

A strange malady that quickly turns the leaves of California Oak trees brown and then kills the tree has been sweeping through northern California. In preliminary results of a study on Sudden Oak Death, which is caused by the pathogen Phytophthora ramorum, 26% of oak trees in four study locations in Marin and Sonoma counties had the disease. Observations were done by people in the field looking for brown or dead leaves, and “bleeding” of viscous sap from intact bark.

Sample population: _____________________________________________________

Measured property:_____________________________________________________

Target population: ______________________________________________________

Target property: ________________________________________________________

Measured Property in the Sample Population: ________________________________

Target property in the Target Population: ___________________________________