Let's say that the number of ice cream sales and the number of deaths by drowning are correlated. What does that mean? Can a correlation between two variables PROVE anything?
READ: Values of Pearson Correlation - to learn more about the different correlations
READ: Correlation Regression and Linearity - for more information on curvilinear correlations
If we took a survey of children’s eating habits and grades in school we would likely find that those who eat a healthy breakfast have higher grades than those who do not. In fact, Cheerios has mentioned research findings in their commercials implying that if you feed your child their cereal, he or she will do better in school. But is that really what the data tells us? The distinction between correlations and causal conclusions is essential, and the following article presents a great explanation:
So in our example, with a statistically significant correlation between two things, like (X) nutritional breakfast and (Y) grades, several things are hypothetically possible:
Changes in X cause changes in Y - Eating a better breakfast does, in fact, cause better grades.
Changes in Y cause changes in X - Learning more causes students to eat a better breakfast.
Both of the above are true and the two variables are reciprocally determined... eating better causes academic improvements AND learning more causes better eating habits.
There is something else, like socioeconomic status (S.E.S.), that influences both breakfast consumption and academic performance. Healthy food tends to be more expensive than junk food, and the families that can afford a lot of healthy food can also afford other resources that help students (books, computers, tutors, etc.). Therefore, the two are technically related, but not because one directly affects the other.
Because we can not be sure which of the three it is by just measuring breakfast and grades, all we can say is that there is a correlation between the two -- we can NOT say that eating a good breakfast will cause better grades (no matter how reasonable that claim might seem).
When we look at two variables that are mathematically correlated but are not related to each other in a meaningful way we call this a spurious correlation. For example, it is true that there is a positive correlation between ice cream sales and the number of people that drown on a given day. However, does it really make sense that ice cream sales are causing people to drown? Could it be that people drowning is causing others to buy more ice cream? The more reasonable explanation is that the relationship is being caused by something else... in this case, temperature. On hotter days more people buy ice cream and more people go swimming. On colder days there is less of each. Thus, the relationship between ice cream sales and drownings is a spurious correlation because one variable is not truly influencing the other. To stress one point: in the case, ice cream sales and drownings are in fact correlated, and you can predict one from knowing the other. The spurious nature of the correlation does not change the mathematical relationship between the two variables... only our understanding of why they are correlated in the first place.
Optional: Check out this site for more examples of spurious correlations.
OPTIONAL: The Tiger Rock
Just for fun... Homer falling victim to a spurious correlation.