06: Summarizing data
"The object of statistics is to discover methods of condensing information concerning large groups of allied facts into brief and compendious expressions suitable for discussions." - Galton.
“Variation is the hard reality, not a set of imperfect measures for a central tendency. Means and medians are the abstractions.” - Stephen Jay Gould.
"If a man stands with his left foot on a hot stove and his right foot in a refrigerator, the statistician would say that, on the average, he’s comfortable." - Walter Heller.
Lecture outline: how can we statistically summarize an entire data set and represent it with a single number?
1. Types of data
Categorical or qualitative variables: nominal scale and ordinal scale of measurement.
Quantitative variables: interval and ratio scales of measurement.
Types of statistical techniques: parametric or non-parametric.
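As a rough illustration of why the scale of measurement matters, the sketch below pairs each scale with a summary that is usually considered meaningful for it: counts and mode for nominal data, median for ordinal data, and mean for ratio data. The variable names and all data values are hypothetical and are not taken from the lecture.

```python
import statistics
from collections import Counter

# Hypothetical observations, made up purely for illustration.
cpu_vendor = ["A", "B", "A", "C", "A"]                    # nominal: labels, no order
load_level = ["low", "high", "medium", "low", "medium"]   # ordinal: ordered labels
latency_ms = [12.0, 15.0, 11.0, 14.0, 13.0]               # ratio: true zero, real units

# Nominal scale: only counting makes sense, so report frequencies and the mode.
vendor_counts = Counter(cpu_vendor)
vendor_mode = statistics.mode(cpu_vendor)

# Ordinal scale: values can be ranked, so the median is meaningful.
# The ordering below is an assumption supplied by the analyst.
order = ["low", "medium", "high"]
ranks = [order.index(level) for level in load_level]
median_level = order[statistics.median_low(ranks)]

# Ratio scale: arithmetic on the values is meaningful, so the mean can be used.
mean_latency = statistics.mean(latency_ms)

print(vendor_counts, vendor_mode, median_level, mean_latency)
```

Here `median_low` is used so that the result is always one of the observed ranks and maps back to a label.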
2. Summarizing the pattern of data:
Distribution: showing the entirety of data (histograms; bar-graphs; density curves);
Shape (symmetric or skewed; light-tailed or heavy-tailed; single or multiple peaks);
Center (mean; arithmetic mean; when the mean is not appropriate; mode and median);
Spread (variance, standard deviation, z-score and coefficient of variation).
Outliers (deviations from the usual pattern of data).
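The center and spread measures listed above can be sketched with Python's standard `statistics` module. This is a minimal example under stated assumptions: the sample values are hypothetical, the variance is the sample variance (divisor n - 1), and the cutoff of |z| > 2 for flagging outliers is an illustrative convention, not a rule from the lecture.

```python
import statistics

# Hypothetical response times in milliseconds (illustrative data only).
samples = [12.0, 15.0, 11.0, 14.0, 15.0, 13.0, 90.0]

# Center
mean = statistics.mean(samples)
median = statistics.median(samples)
mode = statistics.mode(samples)

# Spread
variance = statistics.variance(samples)   # sample variance, divisor n - 1
std_dev = statistics.stdev(samples)       # square root of the sample variance
cov = std_dev / mean                      # coefficient of variation (unitless)

# z-score: distance of each value from the mean, in standard deviations.
z_scores = [(x - mean) / std_dev for x in samples]

# Assumed cutoff: flag values more than 2 standard deviations from the mean.
outliers = [x for x, z in zip(samples, z_scores) if abs(z) > 2]

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f} cov={cov:.2f}")
print("outliers:", outliers)
```

With this data the single large value pulls the mean (about 24.3) well above the median (14), which is one of the situations in which the mean is not an appropriate summary of the center.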
Primary reference for this lecture:
“The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling” by Raj Jain; Chapter 12: “Summarizing Measured Data”
Secondary references for this lecture:
1. “Measuring Computer Performance: A Practitioner's Guide” by David Lilja; Chapter 3: “Average Performance and Variability”
2. “Performance Evaluation of Computer and Communication Systems” by Jean-Yves Le Boudec; Chapter 2: “Summarizing Performance Data, Confidence Intervals”
3. “The Cartoon Guide to Statistics” by Larry Gonick and Woollcott Smith; Chapter 2: “Data Description”