12: Lies, Damned Lies and Statistics

"There are three kinds of lies: lies, damned lies, and statistics." - popularized by Mark Twain, who attributed it to Benjamin Disraeli

"He uses statistics as a drunken man uses lamp-posts—for support rather than illumination." - Andrew Lang

"If you torture the data long enough, it will confess." - Unknown

Lecture outline: Common errors in interpreting and presenting statistics, and how to avoid being fooled by poor statistics.

1. Statistics that lie, mislead and distort

Innumeracy:

percentages and ratios; incorrect extrapolation from models; correlation, coincidences and causes

Biases:

proofiness prejudice; biases (cognitive, sampling, etc.)

Usage of the wrong statistics:

Inappropriate average used.
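A minimal sketch of why the choice of average matters, assuming a small made-up salary list with one very large value. The data are hypothetical and serve only to show how far the mean and the median can diverge on skewed data.

```python
from statistics import mean, median

# Hypothetical annual salaries in thousands; the last value is an outlier.
salaries = [30, 32, 35, 38, 40, 500]

avg = mean(salaries)    # pulled upward by the single large salary
mid = median(salaries)  # middle of the sorted data, robust to the outlier

print("mean:  ", avg)
print("median:", mid)
```

Reporting the mean as "the average salary" here would describe almost nobody in the group; the median is closer to what a typical member earns.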

Improper graphics:

Missing zero; double y-axis; missing scales/labels on axes; messing with bin sizes for histograms; broken scales; presenting 1-d data as a 2-d picture; lying with pie charts; chartjunk (unnecessary, distracting embellishments).
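The "missing zero" effect can be quantified without drawing anything. The sketch below, using made-up values and a hypothetical helper `visual_ratio`, compares the ratio of the drawn bar heights with the true ratio of the values when the y-axis starts above zero.

```python
def visual_ratio(a, b, axis_min=0.0):
    """Ratio of drawn bar heights (b to a) when the y-axis starts at axis_min."""
    if axis_min >= min(a, b):
        raise ValueError("axis_min must lie below both values")
    return (b - axis_min) / (a - axis_min)

# Hypothetical values: b is about 2% larger than a.
a, b = 98.0, 100.0

honest = visual_ratio(a, b, axis_min=0.0)      # axis starts at zero
truncated = visual_ratio(a, b, axis_min=95.0)  # axis starts at 95

print("bar-height ratio with zero baseline:", honest)
print("bar-height ratio with axis cut at 95:", truncated)
```

With the axis cut at 95, the second bar is drawn about two-thirds taller than the first even though the underlying difference is roughly 2%.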

2. How to talk back to a statistic:

Ascertaining the validity of a statistic.

Primary reference for this lecture:

“The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling” by Raj Jain; Chapter 10: “The Art of Data Presentation”, and Chapter 11: “Ratio Games”.

Secondary references for this lecture:

1. “The Visual Display of Quantitative Information” by Edward Tufte

2. “Best and Worst Statistical Graphs”, Gallery of Data Visualization by Michael Friendly [link]