10: Model Fitting (Regression)
"You've got to draw the line somewhere." - Unknown.
"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." - John von Neumann.
"Correlation is not causation but it sure is a hint." - Edward Tufte.
Lecture outline: how can we develop a statistical model representative of measured data?
1. Correlation
Direction of association
Strength of association
Pearson correlation coefficient
Sample correlation example
Correlation and outliers; Correlation is not causation.
2. Regression
Correlation vs. Regression
Least Squares error criterion (Gauss)
Standard Deviation (SD) line; graph of averages; smoothed graph of averages
Equation of simple linear regression line.
SD line and the regression line.
Example of regression of y on x and of x on y (generally not the same).
Quality of fit: coefficient of determination (r^2); RMS error
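The quantities named in this outline can be tied together in a few lines of code. The following is a minimal sketch, not the lecture's own implementation. It assumes population (divide-by-n) standard deviations, and all function names and the sample data are hypothetical, chosen only for illustration. It computes the Pearson correlation coefficient r, the least-squares regression line of y on x (slope r * SD_y / SD_x, passing through the point of averages), the SD line (slope sign(r) * SD_y / SD_x), and the RMS error of a fitted line.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    # Population standard deviation (divide by n); an assumption of this sketch.
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def pearson_r(xs, ys):
    # Average product of deviations, divided by the product of the SDs.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))

def regression_line(xs, ys):
    # Least-squares line for predicting ys from xs: slope = r * SD_y / SD_x.
    slope = pearson_r(xs, ys) * sd(ys) / sd(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

def sd_line(xs, ys):
    # SD line: through the point of averages with slope sign(r) * SD_y / SD_x.
    sign = 1.0 if pearson_r(xs, ys) >= 0 else -1.0
    slope = sign * sd(ys) / sd(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

def rms_error(xs, ys, slope, intercept):
    # Root-mean-square of the vertical residuals about the given line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Hypothetical sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
slope_yx, intercept_yx = regression_line(xs, ys)   # regression of y on x
slope_xy, intercept_xy = regression_line(ys, xs)   # regression of x on y
r_squared = r ** 2                                 # coefficient of determination
error = rms_error(xs, ys, slope_yx, intercept_yx)
```

Two checks connect the sketch back to the outline. The product of the two regression slopes (y on x, and x on y) equals r^2, so the two lines coincide only when r is +1 or -1. The RMS error of the regression line equals SD_y * sqrt(1 - r^2), which is why r^2 serves as a measure of quality of fit.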
Primary references for this lecture:
1. “The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling” by Raj Jain; Chapters 14 and 15: “Simple Linear Regression Models” and “Other Regression Models”.
2. “Probability and Statistics with Reliability, Queueing and Computer Science Applications” by Kishor Trivedi; Chapter 11: “Regression, Correlation, and Analysis of Variance”.
Secondary references for this lecture:
1. “Measuring Computer Performance: A Practitioner's Guide” by David Lilja; Chapter 8: “Linear Regression Models”
2. “Performance Evaluation of Computer and Communication Systems” by Jean-Yves Le Boudec; Chapter 3: “Model Fitting”
3. “Cartoon Guide to Statistics” by Larry Gonick; Chapter 11: “Regression”