Linear regression aims to find a linear relationship to describe the correlation between two continuous variables.
One of the simplest models is a linear model where the dependent (y) and independent (x) variables are related by a slope (m) and intercept (c): y=mx+c.
The independent (x) and dependent (y) datapoints are plotted on a scatter graph. The regression is carried out by determining the 'line of best fit', a straight line drawn through the plotted points.
Most frequently, a linear regression is based on the method of least squares. The aim is to minimise the sum of the squared vertical distances between the observed data points and the line of best fit. In this approach, the values of m and c are chosen to minimise S = Σᵢ (yᵢ − (m·xᵢ + c))².
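This least-squares minimisation has a well-known closed-form solution for m and c. The sketch below illustrates it in plain Python; the data values are invented for illustration only:

```python
# Illustrative sketch: ordinary least squares for y = m*x + c,
# using the closed-form solutions for the slope and intercept.
def fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    m = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
        sum((x - x_mean) ** 2 for x in xs)
    # Intercept: the fitted line passes through the point of means
    c = y_mean - m * x_mean
    return m, c

# Example data (invented for illustration)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
m, c = fit_line(xs, ys)
print(m, c)  # roughly m = 1.97, c = 0.11
```

In practice a library routine such as `numpy.polyfit` or `scipy.stats.linregress` would normally be used, but the closed form makes the minimisation explicit.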
A simple linear regression line, y=mx+c, can be interpreted as follows:
y is the predicted value of the dependent variable
c is the intercept and predicts where the regression line will cross the y-axis
m is the slope and gives the predicted change in y for each unit change in x.
The equation of the regression line can be used for finding approximate values for missing data.
When carrying out linear regression, it is important to consider the results in the context of the recorded data. For experimental data, the linear regression is usually considered valid only within the bounds of the data, i.e. for values falling between the minimum and maximum observed values (interpolation). Calculating values beyond these limits is termed 'extrapolation' and can be problematic. For instance, many instruments only have a certain 'linear range', and values calculated beyond these bounds may not behave in a linear manner.
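One practical way to respect these bounds is to check the requested x against the range of the observed data before predicting. A minimal sketch (the function name, coefficients, and bounds are illustrative assumptions, not from the text):

```python
# Illustrative sketch: predict from a fitted line y = m*x + c, but flag
# requests that fall outside the range of the observed data (extrapolation).
def predict(x, m, c, x_min, x_max):
    y = m * x + c
    if not (x_min <= x <= x_max):
        # Extrapolation: the linear relationship may not hold out here,
        # e.g. beyond an instrument's linear range.
        print(f"warning: x={x} is outside the observed range [{x_min}, {x_max}]")
    return y

m, c = 1.97, 0.11                  # coefficients from a previous fit (assumed)
print(predict(3.5, m, c, 1, 5))    # interpolation: within the data range
print(predict(10, m, c, 1, 5))     # extrapolation: prints a warning first
```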