Correlations

I. What is a correlation?

When a correlation is present in a population, two properties are connected in such a way that a member of the population that has one property is more likely (for a positive correlation) or less likely (for a negative correlation) to have the other. Consider this example:

Exercising regularly is negatively correlated with obesity among humans.

This means that if we consider the population of humans and divide them into the ones that exercise regularly and the ones who do not, we are less likely to find obesity present among the exercisers than the non-exercisers.

When we recognize a positive or negative correlation being argued for in a passage, it will help our analysis if we can put it into a regular form for correlation statements:

Property A is pos./neg. correlated with Property B among population P.

Swimming is positively correlated with drowning among humans.

This claim means that if we divide humans into the ones who are swimming and the ones who are not, we will find that a higher percentage of the swimmers drown than of the non-swimmers.

Notice that "higher" does not mean "most." "Most" means "greater than 50%." The claim is not that most swimmers drown, obviously. Most do not. But the percentage of swimmers who drown is higher than the percentage of non-swimmers who drown. The claim also does not mean that people do not drown in other circumstances. There are people who drown in bathtubs and hot tubs too. But the rates of drowning in the swimmer and non-swimmer groups are different.
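The comparison of rates can be made concrete with a little arithmetic. The sketch below is only an illustration: the counts are invented for the example (they are not real drowning statistics), and the names rate, swimmers_total, and so on are assumptions introduced here.

```python
def rate(members_with_b: int, group_size: int) -> float:
    """Fraction of a group that has property B."""
    return members_with_b / group_size

# Hypothetical counts, invented purely for illustration.
swimmers_total = 10_000
swimmers_drowned = 5
non_swimmers_total = 90_000
non_swimmers_drowned = 9

swimmer_rate = rate(swimmers_drowned, swimmers_total)
non_swimmer_rate = rate(non_swimmers_drowned, non_swimmers_total)

# A positive correlation only requires that one rate be higher than the other.
positively_correlated = swimmer_rate > non_swimmer_rate

# "Most" would require the rate among swimmers to exceed 50%.
most_swimmers_drown = swimmer_rate > 0.5

print(positively_correlated, most_swimmers_drown)
```

With these made-up numbers the swimmer rate is higher than the non-swimmer rate, so the correlation is positive, even though only a tiny fraction of swimmers drown and some non-swimmers drown as well.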

And it is this difference in rates that is a correlation, and that suggests more investigation. When we find correlations in the world, the most important question we can ask is why? Why is the rate of property B higher among the things that have property A than in the non-A group? What is the connection? If the connection is causal because A causes B, B causes A, or some other third factor causes them both, then we want to identify that causal relationship and understand it. Knowing what causes what in the world is at the center of the scientific enterprise.

II. Examples of Correlations.

Here are some correlations. See if you can recognize the two properties being correlated and the population. Think about what they are claiming and what is not being claimed. Do not make any assumption of a causal connection.

1. School children have a higher chance of having asthma if their school is near a hazardous waste site.

2. Babies born to affluent families tend to have higher birth weights.

3. People with efficient public sanitation systems live longer than people without.

4. Being in the NBA is positively correlated with having big shoes among humans.

5. Having a car wreck is positively correlated with using your windshield wipers among humans.

6. People at the doctor’s office are sick at a much higher rate than others.

7. People who listen to rap music are more likely to smoke marijuana.

8. People who smoke pot are more likely to try heroin.

9. People with a college degree are less likely to be religious.

10. White Americans are more likely to be Republican.

11. Black Americans are more likely to be Democrat.

12. In the recent presidential election, former Civil War slave states voted Republican more.

13. New gun buyers are 57 times more likely to commit suicide.

14. Women who buy guns are much more likely to be murdered.


III. Recognizing a correlation

Our goal is to recognize the formal correlation by putting these expressions into a sentence of the form: Property A is positively/negatively correlated with Property B among Population P. In the first case, "School children have a higher chance of having asthma if their school is near a hazardous waste site," we are considering school children, and we are comparing the rates of asthma between the ones who go to school near hazardous waste and the ones who do not. That is, having asthma and going to school near a hazardous waste site are positively correlated among school children.

What does example 2 mean? "Babies born into affluent families are more likely to have a high birth weight than babies who are not born into affluent families." It means that the percentage of affluent babies who have a high birth weight is higher than the percentage of non-affluent babies who have a high birth weight. Put another way, for any particular baby, the odds of its having a high birth weight are higher if it is from an affluent family than if it is not. Yet another way to put it is, for any particular baby, the odds of its being affluent are higher if it has a high birth weight than if it does not. And stated as a formal correlation, according to our pattern, it is:

Affluence is positively correlated with high birth weight among babies.

Among the members of population P, the members that have property A are more likely to have property B than the members that do not have property A. To say that there is a positive correlation between two properties means only that if something has one of the properties then it is more likely to have the other property.

We might be tempted to think about or cite the causal connection: We might jump to the conclusion that being rich causes those babies to be heavier. And that might be true. But the claim, "Being affluent causes high birth weight among babies" is a different claim, requiring a different sort of argument. The assertion we are considering here is only the correlation claim and it only states that the percentage or the rate of high birth weights is higher among those babies, not why.

IV. Other Features of Correlations

It should also be noted that correlations are symmetrical. That is, if property A is pos./neg. correlated with B in P, then it is also true that property B is pos./neg. correlated with A in P. The sentences are equivalent in meaning when the positions of the two properties are switched:

Ice cream consumption is positively correlated with murder among Americans.

means the same thing as

Murder is positively correlated with ice cream consumption among Americans.

(Note: For a variety of reasons, murder rates go up during the summer, at the same time that ice cream consumption does.)

Poverty is negatively correlated with health among humans.

means the same thing as

Health is negatively correlated with poverty among humans.

That is, if we were to randomly sample some humans in poverty and some humans who are not impoverished, we would find that the health rates are lower among the impoverished ones. And if we were to sample some healthy people and some unhealthy people, we would find that the healthy people were less likely to be impoverished.
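The symmetry can be checked with a small two-by-two table of counts. This is a sketch under stated assumptions: the counts are hypothetical (not real poverty or health data), and the function name direction and its parameter names are introduced here only for illustration.

```python
from fractions import Fraction

def direction(both: int, a_only: int, b_only: int, neither: int) -> str:
    """Compare the rate of B among A-members with the rate of B among non-A members."""
    rate_in_a = Fraction(both, both + a_only)
    rate_in_not_a = Fraction(b_only, b_only + neither)
    if rate_in_a > rate_in_not_a:
        return "positive"
    if rate_in_a < rate_in_not_a:
        return "negative"
    return "none"

# Hypothetical counts, invented for illustration.
poor_healthy, poor_unhealthy = 30, 70
not_poor_healthy, not_poor_unhealthy = 240, 60

# A = impoverished, B = healthy.
poverty_vs_health = direction(
    poor_healthy, poor_unhealthy, not_poor_healthy, not_poor_unhealthy
)

# Roles switched: A = healthy, B = impoverished.
health_vs_poverty = direction(
    poor_healthy, not_poor_healthy, poor_unhealthy, not_poor_unhealthy
)

print(poverty_vs_health, health_vs_poverty)
```

Both comparisons come out the same way because each one reduces to asking whether (both x neither) is larger or smaller than (a_only x b_only), and that question does not change when the two properties trade places.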

V. Negative Correlations

To say that there is a negative correlation between properties A and B means that the members of the population with property A are less likely to have property B than the members of the population that do not have property A. Example 9 above is a negative correlation:

People with a college degree are less likely to be religious.

That is, among people, if we sample the ones with college degrees and the ones without, we will find a lower rate of religiousness among those with college degrees. And conversely, if we consider religious people and non-religious people, we will find the rate of college degrees to be lower among the religious.

That means that every negative correlation can be expressed as a positive correlation too, and vice versa. Having a college degree is positively correlated with being non-religious among people. Or, being non-religious is positively correlated with having a college degree among people. Notice that the negative correlation has been converted to a positive one by adding a negative; the new correlation adds "non-" to "religious." The conversion is not unlike what you can do in math when you add a negative to convert to a positive: - ( -4) = 4.
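The conversion can be seen directly in the numbers. The following sketch uses hypothetical counts (not survey data) and variable names chosen here for illustration: within each group, the rate of being non-religious is just one minus the rate of being religious, so whichever group has the lower rate of one has the higher rate of the other.

```python
# Hypothetical counts, invented for illustration.
degree_religious, degree_non_religious = 40, 60
no_degree_religious, no_degree_non_religious = 70, 30

degree_total = degree_religious + degree_non_religious
no_degree_total = no_degree_religious + no_degree_non_religious

# Rate of being religious in each group.
religious_rate_degree = degree_religious / degree_total
religious_rate_no_degree = no_degree_religious / no_degree_total

# Rate of being non-religious in each group.
non_religious_rate_degree = degree_non_religious / degree_total
non_religious_rate_no_degree = no_degree_non_religious / no_degree_total

# Degree vs. religious: lower rate among degree holders, so negative.
negative_with_religious = religious_rate_degree < religious_rate_no_degree

# Degree vs. non-religious: higher rate among degree holders, so positive.
positive_with_non_religious = non_religious_rate_degree > non_religious_rate_no_degree

print(negative_with_religious, positive_with_non_religious)
```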

VI. Examples converted to Correlation Statements

Here are all of the examples above put into standard form for correlation statements. Think about these and what they are claiming and what is not implied.

1. School children have a higher chance of having asthma if their school is near a hazardous waste site.

Having asthma is positively correlated with going to school near a hazardous waste site among school children.


2. Babies born to affluent families tend to have higher birth weights.

Affluence is positively correlated with high birth weight among babies.

3. People with efficient public sanitation systems live longer than people without.

Longevity is positively correlated with having efficient public sanitation systems among people.

4. Being in the NBA is positively correlated with having big shoes among humans.

Being in the NBA is positively correlated with having big shoes among humans.

5. Having a car wreck is positively correlated with using your windshield wipers among humans.

Having a car wreck is positively correlated with using your windshield wipers among humans.

6. People at the doctor’s office are sick at a much higher rate than others.

Being at the doctor’s office is positively correlated with being sick among people.

7. People who listen to rap music are more likely to smoke marijuana.

Listening to rap music is positively correlated with smoking marijuana among people.

8. People who smoke pot are more likely to try heroin.

Smoking pot is positively correlated with trying heroin among people.

9. People with a college degree are less likely to be religious.

Having a college degree is negatively correlated with being religious among people.

10. White Americans are more likely to be Republican.

Being white is positively correlated with being Republican among Americans.

11. Black Americans are more likely to be Democrat.

Being black is positively correlated with being a Democrat among Americans.

12. In the recent presidential election, former Civil War slave states voted Republican more.

Voting Republican is positively correlated with being a former Civil War slave state among American states (or with living in one, among Americans).

13. New gun buyers are 57 times more likely to commit suicide.

Being a new gun buyer is positively correlated with committing suicide among gun buyers.

14. Women who buy guns are much more likely to be murdered.

Buying guns is positively correlated with being murdered among women.

VII. Correlation and Causation

Sometimes the presence of a correlation indicates causation; sometimes it does not. A very common mistake is to assume that the presence of a correlation indicates the presence of a causal connection. Correlation does not imply causation. Consider these true correlations that violate your assumptions about the causal connection:

1. Faster computers are positively correlated with asthma among humans.

2. The presence of blue jeans is negatively correlated with the presence of unicorns.

3. Consumption of ice cream is positively correlated with murders among humans.

4. Ice cream sales are positively correlated with drowning among humans.

5. Sleeping with one's shoes on is strongly correlated with waking up with a headache.

6. The absence of pirates is positively correlated with global warming.

7. Young children who sleep with the light on are much more likely to develop myopia in later life. (It turns out that parents with myopia are more likely to leave the light on, and they are more likely to have children with myopia.)

When a correlation is discovered, it may exist because there is some direct causal connection between the two properties, but it could be because some other causal factor is causing both of them. Much more argument is needed to infer a causal connection from a correlation. We will discuss those arguments in the next module in the course. In the example from above:

Having a car wreck is positively correlated with using your windshield wipers among humans.

turning your windshield wipers on didn't cause the driver to wreck. It's a third event, the rain, that leads to or causes both wrecks and windshield wiper use. Likewise, ice cream sales go up in the summer when it's hot, and drownings go up in the summer when it's hot, but ice cream and drowning aren't directly causally related.
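A small worked table shows how a third factor can produce a correlation by itself. This is a sketch with invented trip counts (not traffic data); the table layout and the function name wreck_rate are assumptions made for the illustration. The counts are built so that, within rainy trips and within dry trips, wiper use makes no difference to the wreck rate.

```python
from fractions import Fraction

# (weather, wipers_on) -> (number of trips, number of wrecks). Hypothetical counts.
trips = {
    ("rain", True): (900, 45),
    ("rain", False): (100, 5),
    ("dry", True): (100, 1),
    ("dry", False): (8900, 89),
}

def wreck_rate(wipers_on, weather=None):
    """Wreck rate for trips with the given wiper setting, optionally within one weather type."""
    total_trips = 0
    total_wrecks = 0
    for (trip_weather, trip_wipers), (count, wrecks) in trips.items():
        if trip_wipers == wipers_on and (weather is None or trip_weather == weather):
            total_trips += count
            total_wrecks += wrecks
    return Fraction(total_wrecks, total_trips)

# Pooled over all trips, wiper use goes with a higher wreck rate.
pooled_positive = wreck_rate(True) > wreck_rate(False)

# Holding the weather fixed, the difference disappears.
same_in_rain = wreck_rate(True, "rain") == wreck_rate(False, "rain")
same_when_dry = wreck_rate(True, "dry") == wreck_rate(False, "dry")

print(pooled_positive, same_in_rain, same_when_dry)
```

In this made-up table the wipers are mostly on in the rain and mostly off when it is dry, and wrecks are more common in the rain. That alone is enough to make wiper use and wrecks positively correlated overall, even though the wipers change nothing once the weather is taken into account.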

VIII. In general, it is a mistake to reason merely from the presence of a correlation to a causal conclusion:

  1. Teenage boys eat lots of chocolate.

  2. Teenage boys have acne.

  3. Therefore, chocolate causes acne.


  1. Ice-cream sales are strongly (and robustly) correlated with crime rates.

  2. Therefore, ice-cream causes crime.


  1. Gun ownership is correlated with crime.

  2. Therefore, gun ownership leads to crime.

The Homer Fallacy:

Homer Simpson famously makes the mistake of confusing correlation with causation:

Homer: Not a bear in sight. The "Bear Patrol" must be working like a charm!

Lisa: That's specious reasoning, Dad.

Homer: Thank you, dear.

Lisa: By your logic I could claim that this rock keeps tigers away.

Homer: Oh, how does it work?

Lisa: It doesn't work.

Homer: Uh-huh.

Lisa: It's just a stupid rock. But I don't see any tigers around, do you?

Homer: Lisa, I want to buy your rock.

Some correlations are also causal. In some cases, the reason that property A is correlated with property B is that A causes B. So smoking is positively correlated with cancer, AND smoking causes cancer. But many correlations are not causal, like ice cream consumption and murder, and windshield wiper use and car wrecks. There will be a correlation behind every causal connection in the world, but not all correlations are causal.

There is much more to learn about causal relationships in the next module in the course.

IX. Here are some more examples of correlations and correlation studies.

1. Being a single woman is positively correlated with miscarriage among women.

http://www.newscientist.com/channel/health/dn10717-single-women-may-face-higher-risk-of-miscarriage.html

The researchers say that more studies are needed to establish these links and to explore the following additional findings of their study:

• The widely held belief that morning sickness is a sign that the pregnancy is progressing well was supported – nausea and sickness were linked to a 70% reduced risk of miscarriage.

• Daily consumption of chocolate reduced the chances of miscarriage by a modest 20%.

• The risk of miscarriage did not vary by social class or employment status, contrary to the findings of previous research.

• The results confirmed that older women and those with a history of fertility problems have a higher chance of miscarriage.

• Women who had previously had abortions were more likely to miscarry, contrary to some previous research, which found no such link. Some scientists speculate that abortions might raise the risk of infections that could complicate future pregnancies.

2. Water and Health: http://www.nytimes.com/2008/04/29/health/research/29perc.html?_r=1&ref=science&oref=slogin

3. Depression and Alzheimer’s: http://www.nytimes.com/2008/04/29/health/research/29agin.html?adxnnl=1&ref=science&adxnnlx=1209661233-IwXGbqA40QxNFjT4FoO4PQ

4. Gymnastics and injuries: http://www.nytimes.com/2008/04/22/health/research/22haza.html?scp=1&sq=vital+signs&st=nyt

5. Lack of sleep and obesity in infants:

http://www.nytimes.com/2008/04/08/health/08patt.html?scp=12&sq=vital+signs&st=nyt

6. Potbelly and Dementia: http://query.nytimes.com/gst/fullpage.html?res=9501E2DD1E3CF932A35757C0A96E9C8B63&scp=14&sq=vital+signs&st=nyt

7. Dow Corning and Breast Implants

http://www.pbs.org/wgbh/pages/frontline/implants/cron.html