12: Lies, Damned Lies and Statistics
"There are three kinds of lies: lies, damned lies, and statistics." - Mark Twain
"He uses statistics as a drunken man uses lamp-posts—for support rather than illumination." - Andrew Lang
"If you torture the data long enough, it will confess." - Unknown
Lecture outline: Common errors in interpreting and presenting statistics, and how to avoid being fooled by poor statistics.
1. Statistics that lie, mislead and distort
Innumeracy:
Percentages and ratios; incorrect extrapolation from models; correlation, coincidences and causes.
Biases:
Proofiness; prejudice; biases (cognitive, sampling, etc.).
Usage of the wrong statistics:
Inappropriate average used.
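A minimal illustration of how the choice of average can mislead, assuming a small made-up salary list with one outlier (the figures and variable names are hypothetical and not taken from the lecture):

```python
from statistics import mean, median

# Hypothetical salaries (in thousands) for a ten-person firm.
# The last entry is a single very large value (an outlier).
salaries = [30, 32, 35, 35, 38, 40, 42, 45, 48, 500]

avg = mean(salaries)    # pulled upward by the one large value
mid = median(salaries)  # middle of the sorted values, robust to the outlier

print(f"mean   = {avg}")
print(f"median = {mid}")
```

With skewed data like this, the mean sits far above what a typical person earns, while the median stays close to the bulk of the values. Reporting only one of them, without saying which, is one way an "average" can distort.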
Improper graphics:
Missing zero; double y-axis; missing scales/labels on axes; messing with bin sizes for histograms; broken scales; presenting 1-d data as a 2-d picture; lying with pie charts; chart-junk (unnecessary diverting embellishments).
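The "missing zero" problem can be sketched with simple arithmetic: when the y-axis of a bar chart starts above zero, the drawn bar heights no longer stand in the same ratio as the values. The helper `apparent_ratio` and the numbers below are hypothetical, chosen only for illustration:

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights for values a and b when the y-axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)

# Hypothetical measurements: b is 4% larger than a.
a, b = 100.0, 104.0

honest = apparent_ratio(a, b)                     # axis starts at zero
truncated = apparent_ratio(a, b, baseline=98.0)   # axis starts at 98

print(f"ratio with zero baseline:      {honest:.2f}")
print(f"ratio with truncated baseline: {truncated:.2f}")
```

Here a 4% difference in the data is drawn as one bar three times as tall as the other once the axis is cut off at 98, which is why a truncated axis should at least be clearly marked.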
2. How to talk back to a statistic:
Ascertaining the validity of a statistic.
Primary reference for this lecture:
“The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling” by Raj Jain; Chapter 10: “The Art of Data Presentation”, and Chapter 11: “Ratio Games”.
Secondary references for this lecture:
1. “The Visual Display of Quantitative Information” by Edward Tufte
2. “Best and Worst Statistical Graphs”, Gallery of Data Visualization by Michael Friendly [link]