06: Summarizing data

“The object of statistics is to discover methods of condensing information concerning large groups of allied facts into brief and compendious expressions suitable for discussions.” - Galton.

“Variation is the hard reality, not a set of imperfect measures for a central tendency. Means and medians are the abstractions.” - Stephen Jay Gould.

“If a man stands with his left foot on a hot stove and his right foot in a refrigerator, the statistician would say that, on the average, he’s comfortable.” - Walter Heller.

Lecture outline: how can we statistically summarize an entire data set and represent it by a single number?

1. Types of data

Categorical or qualitative variables: nominal scale and ordinal scale of measurement.

Quantitative variables: interval and ratio scales of measurement.

Types of statistical techniques: parametric or non-parametric.

2. Summarizing the pattern of data:

Distribution: showing the entirety of data (histograms; bar-graphs; density curves);

Shape (symmetric or skewed; light-tailed or heavy-tailed; single or multiple peaks);

Center (arithmetic mean; when the mean is not appropriate; mode and median);

Spread (variance; standard deviation; z-score; coefficient of variation);

Outliers (deviations from the usual pattern of data).
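
As a rough illustration of the center and spread measures listed above, the following Python sketch computes them with the standard library. The function name `summarize` and the sample values are hypothetical, chosen only for this example; the variance and standard deviation use the sample (n - 1) form, which is one of several possible conventions.

```python
import statistics


def summarize(samples):
    """Return common center and spread summaries for a list of numbers."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean,
        "median": statistics.median(samples),
        "mode": statistics.mode(samples),
        "variance": statistics.variance(samples),  # sample variance
        "stdev": stdev,
        # Coefficient of variation: spread relative to the mean.
        # Assumes ratio-scale data with a nonzero mean.
        "cov": stdev / mean,
        # z-score: distance of each value from the mean, in standard deviations.
        "z_scores": [(x - mean) / stdev for x in samples],
    }


# Hypothetical measurements; the last value is a deliberate outlier.
times = [10, 12, 12, 13, 14, 15, 100]
summary = summarize(times)

print("mean:  ", summary["mean"])
print("median:", summary["median"])
print("mode:  ", summary["mode"])
print("z-score of the largest value:", summary["z_scores"][-1])
```

With these made-up values the single outlier pulls the mean well above the median, which is one common reason to prefer the median as the measure of center for skewed data.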

Primary reference for this lecture:

“The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling” by Raj Jain; Chapter 12: “Summarizing Measured Data”

Secondary references for this lecture:

1. “Measuring Computer Performance: A Practitioner's Guide” by David Lilja; Chapter 3: “Average Performance and Variability”

2. “Performance Evaluation of Computer and Communication Systems” by Jean-Yves Le Boudec; Chapter 2: “Summarizing Performance Data, Confidence Intervals”

3. “The Cartoon Guide to Statistics” by Larry Gonick and Woollcott Smith; Chapter 2: “Data Description”