Correlation does not imply causation
Correlation means there is a relationship or pattern between the values of two variables.
A scatterplot displays data about two variables as a set of points in the xy-plane and is a useful tool for determining if there is a correlation between the variables.
If there is a correlation between two variables, a pattern can be seen when the variables are plotted on a scatterplot. If this pattern can be approximated by a line, the correlation is linear. Otherwise, the correlation is non-linear.
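One common way to measure how closely the points on a scatterplot follow a line is the Pearson correlation coefficient, r, which ranges from -1 to 1. This lesson does not use r, so the sketch below is an illustrative assumption. The helper name `pearson_r` and the data are hypothetical. Points that lie exactly on a line give r = 1. Points on a parabola give r = 0, even though y is completely determined by x. In other words, r detects only linear correlation.

```python
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (hypothetical helper)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # points fall exactly on a line
curved = [x * x for x in xs]      # points fall on a parabola (non-linear)

print(pearson_r(xs, linear))  # 1.0
print(pearson_r(xs, curved))  # 0.0
```

The parabola has a perfectly clear non-linear pattern, yet r = 0. This is why a scatterplot is worth looking at before relying on a single number.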
There are three ways to describe correlations between variables.
A positive correlation is a relationship between two variables in which both variables move in the same direction. That is, as one variable increases, the other tends to increase, and as one decreases, the other tends to decrease.
An example of positive correlation would be height and weight. Taller people tend to be heavier.
A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other.
An example of negative correlation would be height above sea level and temperature. As you climb a mountain (an increase in height), it gets colder (a decrease in temperature).
If changes in one variable show no consistent pattern of change in the other, the variables are said to have "no correlation" or "zero correlation."
An example could be the relationship between
Causation means that one event causes another event to occur.
Causation can only be determined from an appropriately designed experiment.
In such experiments, similar groups receive different treatments, and the outcomes of each group are studied.
We can only conclude that a treatment causes an effect if the groups have noticeably different outcomes.
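The logic of such an experiment can be simulated. This is a hypothetical sketch with invented numbers. It assumes a made-up treatment that adds 5 points to an outcome. Subjects are randomly split into two groups, only one group receives the treatment, and the group averages are then compared. Random assignment tends to balance out every other factor between the groups, so a clear difference in averages can be attributed to the treatment.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical subjects: each has a baseline outcome the experimenter cannot control.
baselines = [random.gauss(70, 10) for _ in range(1000)]

# Random assignment: shuffle the subjects, then split them into two similar groups.
random.shuffle(baselines)
treatment_group = baselines[:500]
control_group = baselines[500:]

TREATMENT_EFFECT = 5  # assumed true effect, unknown to the experimenter

treated_outcomes = [b + TREATMENT_EFFECT for b in treatment_group]
control_outcomes = control_group  # the control group receives no treatment

difference = fmean(treated_outcomes) - fmean(control_outcomes)
print(f"Estimated effect: {difference:.2f}")
```

The estimate lands close to the assumed effect of 5 but not exactly on it, because the two random groups differ slightly by chance. Larger groups shrink that chance difference.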
A causal relation between two events exists if the occurrence of the first causes the other. The first event is called the cause and the second event is called the effect. A correlation between two variables does not imply causation. On the other hand, if there is a causal relationship between two variables, they must be correlated.
While causation and correlation can exist simultaneously, correlation does not imply causation.
Causation means one thing causes another—in other words, action A causes outcome B.
On the other hand, correlation is simply a relationship where action A relates to action B—but one event doesn’t necessarily cause the other event to happen.
Even if there is a correlation between two variables, we cannot conclude that one variable causes a change in the other. This relationship could be coincidental, or a third factor may be causing both variables to change.
For example, Liam collected data on the sales of ice cream cones and instances of people getting sunburned in his hometown. He found that when ice cream sales were low, the instances of sunburn tended to be low and that when ice cream sales were high, instances of sunburn tended to be high as well.
So does eating ice cream cause sunburns, or does getting sunburned cause an increase in ice cream consumption?
We cannot simply assume causation even if we see two events happening, seemingly together, before our eyes. Why? First, our observations are purely anecdotal. Second, there are several other possibilities for an association, including:
The opposite is true: B actually causes A.
The two are correlated, but there’s more to it: A and B are both actually caused by C.
There’s another variable involved: A does cause B—as long as D happens.
There is a chain reaction: A causes E, which in turn causes B (but you only observed that A causes B).
The phrase "correlation does not imply causation" refers to the inability to legitimately deduce a cause-and-effect relationship between two events or variables solely on the basis of an observed association or correlation between them.
Example:
A study shows that there is a negative correlation between a student's anxiety before a test and the student's score on the test. But we cannot say that the anxiety causes a lower score on the test; there could be other reasons. For example, a student who has not studied well may be both more anxious and more likely to score lower. So the correlation here does not imply causation.
However, consider the positive correlation between the number of hours you spend studying for a test and the grade you get on the test. Here, there is plausibly causation as well: spending more time studying tends to result in a higher grade.
An outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error.
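The definition above does not set a cutoff for "differs significantly." One widely used rule of thumb, assumed here and not taken from this lesson, flags a value as an outlier if it lies more than 1.5 interquartile ranges (IQR) below the first quartile or above the third quartile. Below is a minimal Python sketch. The function name `iqr_outliers` and the scores are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (a common rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


test_scores = [72, 75, 78, 80, 81, 83, 85, 88, 90, 12]
print(iqr_outliers(test_scores))  # [12]
```

A flagged value is only a candidate. Whether it reflects a recording error or genuine variability still has to be judged from how the data were collected.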
Complete the assignment:
#4: Correlation does not imply causation
Two variables can have a positive correlation:
As x increases, y tends to increase.
Two variables can have a negative correlation:
As x increases, y tends to decrease.
Two variables can have no correlation:
As x increases, either:
y tends to stay about the same, or
y has no clear pattern.
Example #1
Vivek notices that students in his class with larger shoe sizes tend to have higher grade point averages. Based on this observation, what is the best description of the relationship between shoe size and grade point average?
a) Causal relationship
b) No correlation
c) Positive correlation
d) Negative correlation
e) Cannot be determined from the information provided
Example #2
A principal collected data on all students at her high school and concluded that there is no correlation between the number of absences and grade point average. Which of the following statements are consistent with the principal's findings?
Choose all answers that apply:
a) The number of absences a student has is not a predictor of their grade point average.
b) Students with fewer absences tend to have higher grade point averages because they are present for more of their academic classes.
c) There is a linear relationship between the number of absences and grade point average.
Example #3
The scatterplot below shows the price of a hot dog and a small drink at seventeen different baseball stadiums.
Based on the scatterplot, which of the following statements is true?
Choose 1 answer:
a) There is a positive linear correlation between the prices of hot dogs and soft drinks.
b) There is a negative linear correlation between the prices of hot dogs and soft drinks.
c) There is no correlation between the prices of hot dogs and soft drinks.
d) An increase in the price of hot dogs causes an increase in the price of soft drinks.
e) The ballpark with the most expensive hot dog has the most expensive soft drink.
Example #4
Data from a certain city shows that the size of an individual's home is positively correlated with the individual's life expectancy. Which of the following factors would best explain why this correlation does not necessarily imply that the size of an individual's home is the main cause of increased life expectancy?
Choose 1 answer:
a) Larger homes have more safety features and amenities, which lead to increased life expectancy.
b) The ability to afford a larger home and better healthcare is a direct effect of having more wealth.
c) The citizens were not selected at random for the study.
d) There are more people living in small homes than large homes in the city.
e) Some responses may have been lost during the data collection process.
If there is a correlation between two variables, a pattern will be seen when the variables are plotted on a scatterplot.
There are three ways to describe the correlation between variables.
Positive correlation: As x increases, y tends to increase.
Negative correlation: As x increases, y tends to decrease.
No correlation: As x increases, y tends to stay about the same or shows no clear pattern.
Causation can only be determined from an appropriately designed experiment.
Sometimes when two variables are correlated, the relationship is coincidental or a third factor is causing them both to change.