Least-Squares Regression

Statistical Analyses of Experimental Results

Points that represent experimental data usually form a recognizable pattern on the Cartesian plane. That relationship is quantified with the best-fit line. The line rarely passes through every point; instead, it is chosen so that the points lie as close to it as possible overall, which in least-squares regression means minimizing the sum of the squared vertical distances between the points and the line. Some points therefore fall above the best-fit line and some below it. Computer programs analyze the distances of those points from the best-fit line and report the results in the language of statistics. In the PhET simulation, the reduced chi-squared X² and r² are used to describe the quality of the regression.

Reduced chi-squared X²

X² tells us how close the data points (experimental values) are to the line (expected values), that is, how good the best-fit line is. The formula used to calculate X² is presented in Figure 1.
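For reference, the reduced chi-squared statistic is commonly written as shown below. This is the standard textbook form, not a transcription of Figure 1. The symbols N (number of data points), p (number of fitted parameters), f (the model function), and σ_i (uncertainty of the i-th measurement) are assumptions here and may differ from the notation in the figure.

```latex
\chi^2_{\mathrm{red}} = \frac{1}{N - p} \sum_{i=1}^{N} \frac{\bigl(y_i - f(x_i)\bigr)^2}{\sigma_i^2}
```

A value near 1 suggests that the model matches the data to within the measurement uncertainties, while a value much larger than 1 suggests a poor fit.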

In the PhET simulation, X² is shown as the first bar in the left-hand window. When linear regression is chosen, the X² bar is green (Figure 2 below). When cubic regression is chosen, the bar is red (Figure 3 below). Based on that comparison, the linear regression is the better fit.
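The comparison between a linear and a cubic fit can also be sketched in Python. This is a minimal sketch, not the simulation's own code: the data values, the uncertainties in sigma, and the helper name reduced_chi_squared are hypothetical, and it assumes NumPy is installed.

```python
import numpy as np


def reduced_chi_squared(x, y, sigma, degree):
    """Fit a polynomial of the given degree and return its reduced chi-squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Degrees of freedom: number of points minus number of fitted coefficients.
    dof = len(x) - (degree + 1)
    if dof <= 0:
        raise ValueError("need more data points than fitted coefficients")
    # Weighted least-squares fit; np.polyfit expects weights of 1/sigma.
    coeffs = np.polyfit(x, y, degree, w=1.0 / sigma)
    predicted = np.polyval(coeffs, x)
    return float(np.sum(((y - predicted) / sigma) ** 2) / dof)


# Hypothetical data, not taken from the simulation.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0.3, 1.8, 4.4, 5.9, 8.2, 9.7, 12.3, 13.8]
sigma = [0.5] * len(x)

for name, degree in [("linear", 1), ("cubic", 3)]:
    print(name, round(reduced_chi_squared(x, y, sigma, degree), 2))
```

Dividing by the degrees of freedom is what makes the statistic "reduced": a model with more coefficients has fewer degrees of freedom, so extra terms are not rewarded automatically.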

Figure 1. Formula for the reduced chi-squared statistic used in the PhET simulation. (We should be truly grateful that we do not need to calculate that by hand!)

Figure 2. Linear regression

Figure 3. Cubic regression

R-squared r²

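R-squared tells us about the strength of the relationship: it is the fraction of the variation in the data that the chosen model accounts for, so values close to 1 indicate a strong model and values close to 0 a weak one. Is the chosen best-fit line a strong model?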

In the PhET simulation, r² is shown as the second bar in the left-hand window. For both the linear and cubic models the blue bar is quite high, 0.93 and 0.95, respectively (Figures 2 and 3 above). Based on that information, both regressions seem to return good models. Perhaps more data points would help to determine which model is better.
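To make the meaning of r² concrete, here is a minimal Python sketch of the common definition, one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared, the data, and the line y = 2x are hypothetical, and the simulation may compute its value differently (for example, with weights).

```python
def r_squared(y_observed, y_predicted):
    """Return 1 - SS_res / SS_tot for observed and model-predicted values."""
    if len(y_observed) != len(y_predicted):
        raise ValueError("observed and predicted values must have the same length")
    mean_y = sum(y_observed) / len(y_observed)
    # Variation left unexplained by the model.
    ss_res = sum((y - f) ** 2 for y, f in zip(y_observed, y_predicted))
    # Total variation of the data around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_observed)
    if ss_tot == 0:
        raise ValueError("all observed values are equal; r-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical observations compared with a hypothetical line y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y_observed = [2.1, 3.9, 6.2, 7.8]
y_predicted = [2.0 * xi for xi in x]
print(round(r_squared(y_observed, y_predicted), 3))
```

Because adding terms to a polynomial can only lower the residual sum of squares, r² alone tends to favor the more complex model, which is one reason to read it together with the reduced chi-squared.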

Residuals

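Residuals are the vertical differences between a point's second coordinate and the value read on the best-fit line at the same x. By the usual convention (observed minus predicted), the residuals are calculated as follows: y - f(x). A positive residual means the point lies above the line, and a negative residual means it lies below.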

In the PhET simulation, the residuals are shown as vertical lines connecting the best-fit line and the data points. The farther the points are from the best-fit line, the longer the residuals. The shorter the residual lines are, the better the chosen model. Ideally, the residuals are close to zero.
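The residual calculation can be sketched in a few lines of Python. The helper names residuals and line, the points, and the line's slope and intercept are hypothetical. The sketch uses the common convention of observed minus predicted, y - f(x); with the opposite sign convention every residual simply changes sign and its length stays the same.

```python
def residuals(points, model):
    """Return y - model(x) for each (x, y) point (observed minus predicted)."""
    return [y - model(x) for x, y in points]


def line(x):
    # Hypothetical best-fit line, chosen only for illustration.
    return 2.0 * x + 1.0


# Hypothetical data points as (x, y) pairs.
points = [(0.0, 1.5), (1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]

for (x, y), r in zip(points, residuals(points, line)):
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x={x}: residual={r:+.2f} (point is {position} the line)")
```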

Drawing Conclusions Based on the Statistical Analyses

The reduced chi-squared X², r², and the residuals are used to draw conclusions about the best-fit line. The following questions are posed to help you draw some conclusions about your models.

In your judgment, which one is the best-fit line?
- In other words, is the linear fit the best one? Or perhaps the quadratic or cubic fit?
How do the reduced chi-squared and the residuals inform your choice?
- Are the points close to or far away from the model line? Is the model good or not? Which is the best model: linear, quadratic, or cubic?
What can you say about the relationship presented on the graph?
- Is r² high or low? Is it therefore a strong relationship? Is it increasing, decreasing, or impossible to tell?
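One way to work through these questions outside the simulation is to tabulate all three measures for each candidate model. The following Python sketch does that under stated assumptions: the data are hypothetical, every point is assumed to have the same uncertainty sigma, the helper name summarize_fit is made up for illustration, and NumPy is assumed to be installed.

```python
import numpy as np


def summarize_fit(x, y, sigma, degree):
    """Return reduced chi-squared, r-squared, and the largest |residual| for one fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dof = len(x) - (degree + 1)
    if dof <= 0:
        raise ValueError("need more data points than fitted coefficients")
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)  # observed minus predicted
    return {
        "chi2_red": float(np.sum((resid / sigma) ** 2) / dof),
        "r2": float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)),
        "max_abs_residual": float(np.max(np.abs(resid))),
    }


# Hypothetical data with an assumed common uncertainty of 0.5 per point.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0.3, 1.8, 4.4, 5.9, 8.2, 9.7, 12.3, 13.8]

for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
    stats = summarize_fit(x, y, 0.5, degree)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Reading the rows side by side mirrors the questions above: short residuals and a reduced chi-squared near 1 point to a good model, and r² indicates how strong the relationship is.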
