Use this great simulation to better understand the concept of correlation and how data affect the regression line.
The Pearson correlation coefficient (r) is a statistic that measures the linear correlation between two variables, X and Y. Its value lies between −1 and +1, where +1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is widely used in the sciences.
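As a minimal sketch of how this coefficient can be computed from its definition (the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations), the following uses only the Python standard library. The function name pearson_r and the data are hypothetical and chosen for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # Y rises in step with X
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # Y falls in step with X
r_none = pearson_r(x, [4, 1, 0, 1, 4])       # symmetric curve, no linear trend
print(r_positive, r_negative, r_none)
```

The third data set follows a clear curve, yet its coefficient is zero: r only measures how well a straight line describes the relationship.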
Computing correlation can be broken down into two sub-problems:
Testing whether there is a statistically significant correlation between two variables.
Quantifying the association or ‘goodness of fit’ between the two variables.
Such "goodness of fit between variable pairs" needs to be compared to some universal scale.
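The two sub-problems can be illustrated separately. The sketch below quantifies the association with r and tests its significance with an exact permutation test. The permutation test is a technique swapped in here because it needs no statistical tables; it is not necessarily the test used in any particular course or spreadsheet. The function names, the data, and the commonly used 0.05 threshold are assumptions for illustration.

```python
import itertools
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def exact_permutation_p_value(xs, ys):
    """Two-sided p-value: the share of all re-pairings of ys with xs whose
    |r| is at least as large as the observed |r|. Suitable for small
    samples only, because the number of permutations grows as n!."""
    observed = abs(pearson_r(xs, ys))
    at_least_as_extreme = 0
    total = 0
    for shuffled in itertools.permutations(ys):
        total += 1
        # Small tolerance so that rounding error does not exclude ties.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical measurements (6 points, so 720 permutations).
iv = [1, 2, 3, 4, 5, 6]
dv = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = pearson_r(iv, dv)                  # sub-problem 2: strength of association
p = exact_permutation_p_value(iv, dv)  # sub-problem 1: significance
print(f"r = {r:.3f}, p = {p:.4f}")
```

A small p-value (for example, below 0.05) means that a correlation this strong would rarely arise from pairing the same values at random.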
How well does the trend line represent the data? The coefficient of determination (r-squared) is an indicator of how well the data points in the graph fit the calculated trend line. This value may be used to assess the accuracy of the trend line.
Imagine you obtain data with an r-squared value of 0.65. What does this value actually mean? It means that 65% of the variation in Y, the dependent variable (DV), can be explained by the variation in X, the independent variable (IV). An r-squared of 1 (100%) indicates a perfect linear correlation, because all the variation in Y (DV) can be explained by the variation in X (IV), assuming that the controlled variables are properly controlled.
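To make "variation explained" concrete, the sketch below fits a least-squares trend line and compares the variation left over around the line with the total variation in Y. The function name r_squared and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares straight-line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation of Y around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left unexplained: squared distances from the trend line.
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total


# Hypothetical data, for illustration only.
iv = [0, 1, 2, 3, 4]
dv = [1, 3, 2, 5, 4]

r2 = r_squared(iv, dv)
print(f"r-squared = {r2:.2f}")
```

For these points the total variation in Y is 10 and the variation remaining around the trend line is 3.6, so r-squared is 1 − 3.6/10 = 0.64: 64% of the variation in Y is explained by X.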
The equation of the regression line should be used to state the relationship found between the DV and the IV, according to the data collected in the experiment. A straight line has a mathematical equation of the form:
Y = AX + B
A is a coefficient that indicates the slope of the trend line. B is the value of Y when X = 0. When continuous variables are used in scientific experiments, a scatter plot should be constructed. A trend line can then be calculated for the data points shown in the scatter plot, and its equation can be displayed (in Google Sheets go to Chart editor > Customize > Series, click Trendline, and then select Label: Use Equation). The value of A indicates how much Y changes when X increases by 1. This is very useful when interpreting experimental data, for example by asking:
How much does the DV change when the IV increases by 1?
What is the value of the DV when the IV=0? Is that consistent with the results/theory in your experiment?
What is the value of the IV when the DV=0? Is that consistent with the results/theory in your experiment?
If the DV is a percent change, at what value of the IV will there be a 100% change?
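The questions above can be answered directly from the slope A and the intercept B of the trend line. The following is a minimal sketch, assuming a least-squares fit; the helper names fit_line and x_at and the data (a DV measured as a percent change) are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope A and intercept B for the line Y = A*X + B."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def x_at(y_target, slope, intercept):
    """Solve y_target = slope * x + intercept for x."""
    if slope == 0:
        raise ValueError("a horizontal line never reaches a different Y value")
    return (y_target - intercept) / slope


# Hypothetical data: IV in arbitrary units, DV as a percent change.
iv = [0, 1, 2, 3, 4]
dv = [4, 16, 24, 36, 45]

a, b = fit_line(iv, dv)
print(f"DV changes by {a:.1f} when the IV increases by 1")
print(f"DV is {b:.1f} when the IV is 0")
print(f"DV is 0 when the IV is {x_at(0, a, b):.2f}")
print(f"DV reaches 100% when the IV is {x_at(100, a, b):.2f}")
```

Answers that fall outside the measured range of the IV, such as the last two here, are extrapolations and should be checked against the theory behind the experiment.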
The ideas of correlation and cause are very important in science. A correlation is a statistical link or association between one variable and another. A correlation can be positive or negative, and a correlation coefficient can be calculated that will have a value between −1 and +1, with 0 indicating no linear correlation. A strong correlation (positive or negative) between one factor and another suggests some sort of causal relationship between the two factors, but more evidence is usually required before scientists accept the idea of a causal relationship. To establish a causal relationship (one factor causing another), scientists need to have a plausible (credible/logical) scientific mechanism linking the factors. This strengthens the idea that one causes the other, for example smoking and lung cancer. This mechanism can be tested in experiments (adapted from the IB Biology Guide, 2016).