For some fun introductions to scatter graphs, try these activities. The human scatter graph is a whole-class activity.
Previously, we looked at correlation and described it as strong, moderate or weak by eye. Because that judgement is subjective, it is difficult to apply consistently. A more precise measure of the linear relationship between two variables is Pearson's product-moment correlation coefficient, denoted r.
The value of r ranges from -1 to 1.
The sign indicates the direction of the correlation:
A positive value for r indicates a positive correlation.
A negative value for r indicates a negative correlation.
A value of 0 indicates there is no linear correlation.
The size of r indicates the strength of the correlation:
A value close to +1 or -1 indicates a strong correlation.
A value close to 0 indicates a weak correlation.
To find the correlation coefficient, we can use technology such as Google Sheets or Excel, which makes it very easy.
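If you would like to see what the technology is doing behind the scenes, here is a minimal sketch in Python that calculates r from the standard formula (the sum of the products of the deviations from the means, divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the data are made up for illustration; they are not from any activity on this page.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of practice and a test score
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 2))  # positive value, so a positive correlation
```

For this made-up data r is about 0.77, which we would describe as a fairly strong positive correlation.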
Do you think that if two variables are correlated there is always a relationship based on causation?
Explore the images below and have a discussion.
If two variables have a strong relationship when graphed on a scatter plot, the linear relationship can be approximated by drawing a line of best fit and finding its equation.
A line of best fit:
represents most or all of the points as closely as possible
passes through or close to as many points as possible
has roughly the same number of points above and below it
is drawn so that the distances from all the points to the line are as small as possible.
We can also use the line of best fit to predict what might happen within the data range (interpolation) or outside of the data range (extrapolation). Further explore the line of best fit and scatter plots using the Desmos activity below.
Inquiry: Desmos activity
You can use technology such as Excel or Google Sheets to fit the line of best fit, otherwise known as the regression line, more accurately. When you do this, you can also find the equation of the line, which we can then interpret in the context of the question.
Once we have a line of best fit, we can use it to predict other values. If we are predicting within the range of the data, it is called interpolation; if we are predicting outside the range of the data, it is called extrapolation. Interpolation is generally more reliable than extrapolation, because we cannot be sure the relationship continues beyond the data we have.
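As a rough illustration of how a regression line can be found, the sketch below uses the least squares method, which is the usual way a straight-line fit is calculated. The function name fit_line and the data are hypothetical and only there to show the idea.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are the same, so no line can be fitted")
    slope = sxy / sxx
    # The least squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours of practice (x) and a test score (y)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
slope, intercept = fit_line(hours, scores)
print(f"y = {slope:.1f}x + {intercept:.1f}")
```

In context, a slope of about 0.6 would mean that each extra hour of practice goes with an increase of about 0.6 in the score, and the intercept of about 2.2 is the predicted score for zero hours.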
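The short sketch below shows one way to make a prediction and check which kind it is. It assumes a hypothetical line of best fit y = 0.6x + 2.2 that was fitted to data with x values from 1 to 5; the function name predict and all of the numbers are invented for illustration.

```python
def predict(x, slope, intercept, x_min, x_max):
    """Return the predicted y value and whether it is interpolation or extrapolation."""
    y = slope * x + intercept
    # Inside the range of the original x data counts as interpolation
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical line y = 0.6x + 2.2, fitted to x values between 1 and 5
slope, intercept = 0.6, 2.2
x_min, x_max = 1, 5

inside = predict(3, slope, intercept, x_min, x_max)
outside = predict(10, slope, intercept, x_min, x_max)
print(inside)
print(outside)
```

The prediction at x = 10 is well beyond the data, so it should be treated with much more caution than the prediction at x = 3.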
Select two components of fitness that you wish to see if there is a relationship between. Plot the results using a scatter plot and use technology to find the line of best fit.
Interpret what this implies about the relationship between the variables. Which variable did you choose as the independent variable and which as the dependent variable, and why? Can you deduce causation between your variables?
How can you use any determined relationships between variables to inform your fitness plan?
Can you find the equation of the line of best fit? What does this tell you, and how could you use it?