Correlation

Understanding correlation

Use this great simulation to better understand the concept of correlation and how data affect the regression line.

Pearson Correlation Coefficient

The Pearson correlation coefficient (r) is a statistic that measures the linear correlation between two variables, X and Y. Its value lies between +1 and −1, where +1 is a total positive linear correlation, 0 is no linear correlation, and −1 is a total negative linear correlation. It is widely used in the sciences.

en.wikipedia.org/wiki/Pearson_correlation_coefficient
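To make the definition concrete, here is a minimal Python sketch that computes r from the standard formula: the sum of the products of the deviations from the means, divided by the square root of the product of the sums of squared deviations. The function name pearson_r and the data are hypothetical, for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 (total positive)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (total negative)
```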

Calculating correlation between two continuous variables

Computing correlation can be broken down into two sub-problems:

Testing if there is a statistically significant correlation between two variables
Quantifying the association or ‘goodness of fit’ between the two variables.

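To compare the goodness of fit of different variable pairs, it must be expressed on a common, unit-free scale. The correlation coefficient r (from −1 to +1) and the coefficient of determination r² (from 0 to 1) provide such a scale.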
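Both sub-problems can be illustrated with a short Python sketch. It quantifies the association with Pearson's r and tests significance with a permutation test, a swapped-in technique chosen because it needs only the standard library; the t-test on r found in most textbooks and statistics packages is the more common choice. The function names and the data are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r (strength of the linear association)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided permutation test of 'no association between X and Y'.

    Shuffling ys destroys any real pairing with xs, so the shuffled |r|
    values show what chance alone produces. The p-value is the share of
    shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Adding 1 to both counts includes the observed pairing itself
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements
iv = [1, 2, 3, 4, 5, 6, 7, 8]
dv = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r = pearson_r(iv, dv)            # sub-problem 2: strength of the association
p = permutation_p_value(iv, dv)  # sub-problem 1: is it statistically significant?
print(f"r = {r:.3f}, p = {p:.4f}")
```

A small p-value (for example below 0.05) suggests the correlation is unlikely to be due to chance alone; r then describes how strong the association is.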

Interpretation of correlation: the meaning of the coefficient of determination (r-squared)

How well does the trend line represent the data? The coefficient of determination (r-squared) indicates how well the data points in the graph fit the calculated trend line, so it can be used to assess the accuracy of the trend line.

Imagine you obtain data with an r-squared value of 0.65. What does this value actually mean? It means that 65% of the variation in Y (the DV) can be explained by the variation in X (the IV). An r² of 1 (100%) indicates a perfect fit, because all the variation in Y (the DV) can be explained by the variation in X (the IV), assuming that the controlled variables are properly controlled.

An r-squared value of 0.10 indicates that 10% of the variance in Y is predictable from X. In other words, 10% of the changes in Y (the DV) can be predicted (or explained) by changes in X (the IV).
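The meaning of r-squared can be made concrete with a minimal Python sketch, assuming an ordinary least-squares straight line. It computes r² as 1 − SS_res/SS_tot: one minus the share of the total variation in Y that the line leaves unexplained. The function names and data are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope A and intercept B for Y = AX + B."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(xs, ys):
    """Coefficient of determination: share of the variation in Y explained by the line."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                    # total variation in Y
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained
    return 1 - ss_res / ss_tot


# Hypothetical data
print(r_squared([1, 2, 3], [1, 3, 2]))  # 0.25, i.e. 25% of the variation in Y is explained by X
```

For a least-squares straight line this value equals the square of Pearson's r; for the data above r = 0.5, so r² = 0.25.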

Interpretation of the regression line equation

The equation of the regression line should be used to state the relationship found between the DV and the IV, according to the data collected in the experiment. A straight line has an equation of the form:

Y = AX + B

A is a coefficient that indicates the slope of the trend line. B is the value of Y when X = 0. When continuous variables are used in scientific experiments, a scatter plot should be constructed. A trend line can then be calculated for the data points in the scatter plot and its equation displayed (in Google Sheets, go to Chart editor > Customize > Series, click Trendline, and then select Label: Use equation). The value of A indicates how much Y changes when X increases by 1, which is very useful when interpreting experimental data.
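A linear trendline in a spreadsheet is normally an ordinary least-squares fit. The sketch below assumes that method and computes A and B directly: A is the sum of the products of the X and Y deviations divided by the sum of the squared X deviations, and B = mean(Y) − A × mean(X). The function name and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope A and intercept B for Y = AX + B."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much Y changes, on average, when X increases by 1
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the least-squares line always passes through (mean_x, mean_y)
    b = mean_y - a * mean_x
    return a, b


# Hypothetical experiment: IV plotted on X, DV plotted on Y
iv = [0, 1, 2, 3, 4]
dv = [3.2, 7.9, 13.1, 18.0, 22.8]
a, b = fit_line(iv, dv)
print(f"DV = {a:.2f}*IV + {b:.2f}")  # DV = 4.93*IV + 3.14
```

Here the DV rises by about 4.93 for every unit increase in the IV, and the line predicts a DV of about 3.14 when the IV is 0.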

In your graph, X will usually show the IV and Y the DV, so the trend-line equation can be written in terms of your variables and used to analyze the relationship observed between the DV and the IV according to the data collected in the experiment:

DV = A*IV + B (where A and B are fixed numbers)

For example, if the trend-line equation of an experiment is DV = 5*IV + 3, use the equation to answer the following:

How much does the DV change when the IV increases by 1?

What is the value of the DV when the IV=0? Is that consistent with the results/theory in your experiment?

What is the value of the IV when the DV=0? Is that consistent with the results/theory in your experiment?

If the DV is a percent change, at what value of the IV will there be a 100% change?
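The numeric parts of these questions can be checked by evaluating or rearranging the equation. This short Python sketch does so for the example DV = 5*IV + 3; the function names dv_at and iv_at are hypothetical. Whether each value is consistent with your own experiment still has to be judged against your results and theory.

```python
A, B = 5, 3  # slope and intercept from the example equation DV = 5*IV + 3


def dv_at(iv):
    """DV predicted by the trend line for a given IV."""
    return A * iv + B


def iv_at(dv):
    """IV at which the trend line reaches a given DV (rearranged: IV = (DV - B) / A)."""
    return (dv - B) / A


print(dv_at(1) - dv_at(0))  # change in DV per unit increase in IV: 5
print(dv_at(0))             # DV when IV = 0 (the intercept): 3
print(iv_at(0))             # IV when DV = 0: -0.6
print(iv_at(100))           # IV at which a percent-change DV reaches 100%: 19.4
```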

Causality or correlation?

The ideas of correlation and causation are very important in science. A correlation is a statistical link or association between one variable and another. A correlation can be positive or negative, and a correlation coefficient can be calculated with a value between −1 and +1 (0 meaning no correlation). A strong correlation (positive or negative) between one factor and another suggests some sort of causal relationship between the two factors, but more evidence is usually required before scientists accept that one causes the other. To establish a causal relationship (one factor causing another), scientists need a plausible (credible/logical) scientific mechanism linking the factors, as with smoking and lung cancer. Such a mechanism strengthens the idea that one factor causes the other, and it can be tested in experiments (adapted from the IB Biology Guide, 2016).
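The point that a strong correlation alone does not establish causation can be illustrated with a small simulation. In this hypothetical Python sketch, a hidden factor Z drives both X and Y while X has no effect on Y at all, yet X and Y still end up strongly correlated. The variable names, distributions and sample size are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed for a reproducible run

# Hypothetical hidden factor Z that influences both X and Y
z = [rng.uniform(0, 10) for _ in range(500)]
x = [zi + rng.gauss(0, 1) for zi in z]  # X depends only on Z
y = [zi + rng.gauss(0, 1) for zi in z]  # Y depends only on Z, never on X

print(round(pearson_r(x, y), 2))  # strong positive r, yet X does not cause Y

# Removing Z's contribution (possible here only because we generated the data)
# leaves just the independent noise, and the correlation largely disappears.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
print(round(pearson_r(x_rest, y_rest), 2))  # close to 0
```

This is why a correlation coefficient is treated as a starting point: a plausible mechanism, tested in experiments that control other factors, is needed before concluding that one variable causes the other.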
