10: Model Fitting (Regression)

"You've got to draw the line somewhere." - Unknown.

"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." - Von Neumann.

"Correlation is not causation but it sure is a hint." - Edward Tufte.

Lecture outline: how can we develop a statistical model representative of measured data?

1. Correlation

Direction of association

Strength of association

Pearson correlation coefficient

Sample correlation example

Correlation and outliers; Correlation is not causation.

2. Regression

Correlation vs. Regression

Least Squares error criterion (Gauss)

Standard Deviation (SD) line; graph of averages; smoothed graph of averages

Equation of simple linear regression line.

SD line and the regression line.

Example of regression of y on x and of x on y (generally not the same).

Quality of fit: coefficient of determination (r^2); RMS error
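As a rough illustration of the quantities named in the outline above, the sketch below computes the Pearson correlation coefficient, the least-squares regression line of y on x and of x on y, the coefficient of determination, and the RMS error. The data values and all function names are made up for illustration and do not come from the lecture or its references. The RMS error here divides by n, which is an assumption; some texts divide by n - 2 when estimating the standard error.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations from the means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def pearson_r(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    sxx, syy, sxy = sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept minimizing the sum of squared errors in ys."""
    sxx, _, sxy = sums_of_squares(xs, ys)
    slope = sxy / sxx
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

def rms_error(xs, ys, slope, intercept):
    """Root-mean-square of the residuals (assumption: divide by n)."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Hypothetical measurements, chosen only to illustrate the calculations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
slope_yx, intercept_yx = least_squares_line(xs, ys)  # regression of y on x
slope_xy, intercept_xy = least_squares_line(ys, xs)  # regression of x on y
rms = rms_error(xs, ys, slope_yx, intercept_yx)

# SD line slope is SD_y / SD_x (positive r here); regression slope is r times that.
sxx, syy, _ = sums_of_squares(xs, ys)
sd_line_slope = math.sqrt(syy / sxx)

print("r =", r)
print("r^2 =", r * r)
print("y on x: slope =", slope_yx, "intercept =", intercept_yx)
print("x on y: slope =", slope_xy, "intercept =", intercept_xy)
print("SD line slope =", sd_line_slope)
print("RMS error =", rms)
```

Two relationships worth checking in the output: the regression slope of y on x equals r times the SD line slope, and the product of the two regression slopes (y on x, x on y) equals r^2, so the two regression lines coincide only when the correlation is perfect.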

Primary references for this lecture:

1. “The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling” by Raj Jain; Chapters 14 and 15: “Simple Linear Regression Models” and “Other Regression Models”

2. “Probability and Statistics with Reliability, Queueing and Computer Science Applications” by Kishor Trivedi; Chapter 11: “Regression, Correlation, and Analysis of Variance”

Secondary references for this lecture:

1. “Measuring Computer Performance: A Practitioner's Guide” by David Lilja; Chapter 8: “Linear Regression Models”

2. “Performance Evaluation of Computer and Communication Systems” by Le Boudec, Chapter 3: “Model Fitting”

3. “Cartoon Guide to Statistics” by Larry Gonick; Chapter 11: “Regression”