Correlation Coefficient
By looking at a scatter plot we can see whether there is a strong or weak correlation between two variables, and whether that correlation is positive or negative. However, there is also a mathematical way of measuring this: calculating the correlation coefficient.
The correlation coefficient 'r' is a measure of the strength and direction of the linear association between two quantitative variables.
This is also known as Pearson's correlation coefficient (after Karl Pearson, 1857-1936). It is represented by the letter r and is a single number ranging from -1 (perfect negative linear correlation) to +1 (perfect positive linear correlation).
Correlation coefficients close to -1 or +1 indicate a strong linear correlation. Values close to 0 indicate a weak linear correlation, with 0 itself indicating no linear correlation at all.
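For reference, the usual textbook formula for Pearson's r, for n data pairs (x_i, y_i) with means x̄ and ȳ, is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The units of x appear in both the numerator and the denominator (and likewise for y), so they cancel; this is why r has no units.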
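As a minimal sketch, r can be calculated directly from its definition: the sum of the products of the deviations from the means, divided by the square root of the product of the sums of squared deviations. The function name pearson_r and the data below are hypothetical, made up for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired data (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator parts: sums of squared deviations for each variable
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)

# Made-up example data
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]), 3))  # 1.0  (perfect positive)
print(round(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]), 3))  # -1.0 (perfect negative)
print(round(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))   # 0.8  (fairly strong positive)
```

In practice iNZight, Excel (its CORREL worksheet function) or a statistics package does this calculation for you; the sketch only shows what is being computed.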
Regression by eye: a scatter plot is displayed and you can draw in regression lines by hand, then compare your lines to the least-squares line of best fit. You can also try to guess the correlation coefficient, r. (www.ruf.rice.edu; other versions of this activity are also available.)
Guess the correlation coefficient competition (Java app): four scatter plots and four correlation coefficients are shown, and your task is to match each coefficient to its plot. New plots can be generated and a running score is kept.
Line of best fit
If appropriate, a line of best fit can be drawn through the points on a scatter plot.
Linear regression (least-squares regression) is the process of fitting this line. Technology fits the line of best fit easily.
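The least-squares line is the line that minimises the sum of the squared vertical distances from the points to the line. Its gradient is the sum of products of deviations divided by the sum of squared x-deviations, and it always passes through the point of means (x̄, ȳ). A minimal Python sketch, where the helper least_squares_line and the data are hypothetical:

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    gradient = sxy / sxx                # change in y for each 1-unit increase in x
    intercept = y_bar - gradient * x_bar  # forces the line through (x_bar, y_bar)
    return gradient, intercept

# Made-up example data
gradient, intercept = least_squares_line([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(gradient, 3), round(intercept, 3))  # 0.8 0.6
```

Excel's SLOPE and INTERCEPT worksheet functions return the same two quantities.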
How to do this in iNZight or Excel
Visually judge the fit of the line to the data (Discuss in context)
Discuss what the linear model represents and what the gradient indicates (Discuss in context)
For example:
"The regression model equation indicates that energy content is predicted to increase by 64.4 kJ for every 1 g increase in fat content.
The model predicts: energy content (kJ) = 64.4 × fat content (g) + 545"
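The fitted model in this example can be used to make predictions. A minimal Python sketch using the coefficients quoted in the example; the function name predict_energy_kj and the fat values are hypothetical. Predictions are only trustworthy within the range of fat contents in the original data, which is not given here.

```python
def predict_energy_kj(fat_g):
    """Predicted energy content (kJ) from the example model: 64.4 * fat (g) + 545."""
    return 64.4 * fat_g + 545

for fat in (0, 10, 20):
    print(fat, "g fat ->", round(predict_energy_kj(fat), 1), "kJ")
# 0 g fat -> 545.0 kJ
# 10 g fat -> 1189.0 kJ
# 20 g fat -> 1833.0 kJ

# The gradient is the predicted change for each extra gram of fat
print(round(predict_energy_kj(11) - predict_energy_kj(10), 1))  # 64.4
```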
Key Concepts:
Correlation Coefficient 'r' has no units
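It is only designed to measure linear relationships (it is NOT appropriate for curved relationships/models)
Changing the scale of the data (e.g. converting units, adding a constant or multiplying by a positive constant) has no effect on 'r'
The order in which the data pairs are listed does not affect 'r'
Swapping the variables (which one is x and which is y) has no effect on 'r'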
Both variables must be quantitative
The correlation coefficient is NOT resistant to outliers: a single unusual point can change 'r' a lot (see outliers)
Always plot the data and judge the relationship VISUALLY before rushing into a linear model and calculating 'r'!
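The following Python sketch checks several of these key concepts on a small made-up data set (the data and the helper pearson_r are hypothetical): rescaling x, reordering the pairs and swapping the variables leave r unchanged, multiplying by a negative constant only flips its sign, and adding a single outlier changes r dramatically.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

print(round(pearson_r(xs, ys), 3))                             # 0.8
# Rescaling x (positive multiplier plus a constant): r unchanged
print(round(pearson_r([2.54 * x + 10 for x in xs], ys), 3))    # 0.8
# A negative multiplier only flips the sign of r
print(round(pearson_r([-x for x in xs], ys), 3))               # -0.8
# Listing the pairs in a different order: r unchanged
print(round(pearson_r(xs[::-1], ys[::-1]), 3))                 # 0.8
# Swapping which variable is x and which is y: r unchanged
print(round(pearson_r(ys, xs), 3))                             # 0.8
# Adding one outlier, the point (6, -20): r changes dramatically
print(round(pearson_r(xs + [6], ys + [-20]), 3))               # -0.557
```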
Examples:
No linear relationship, but there is a relationship!
Reasonable linear relationship, but there is a better non-linear relationship!
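Both example captions can be reproduced with small made-up data sets (hypothetical, for illustration only). Points lying exactly on the curve y = x² for x from -3 to 3 have a perfect relationship, yet r = 0. For x from 1 to 10 the same curve gives r of about 0.97, a "strong" linear correlation even though a curve fits the points exactly.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# No linear relationship, but there is a relationship: y = x^2 with symmetric x values
xs = list(range(-3, 4))
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))              # 0.0

# Reasonable linear relationship, but a curve (y = x^2) fits the points exactly
xs2 = list(range(1, 11))
ys2 = [x ** 2 for x in xs2]
print(round(pearson_r(xs2, ys2), 2))  # 0.97
```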