Notes

"It ran 20 percent faster when I cut out the I/O buffering"

"My simulation is accurate to within 10 percent every time"

"This system was chosen because it did 14 percent more work than the next best system"

Statements like these nearly always assume that the value of a performance metric is a single fixed number: if a test is run twice, or ten times, the result will be the same in every case. Stated more succinctly, this assumption is that within-sample variability is zero. Under this assumption, the variation within samples (e.g., several runs on the same computer) can be disregarded in comparison with the variation between samples (e.g., runs on computer A compared to runs on computer B). If the standard deviation (a measure of variability) of run times on computers A and B were always far smaller than the difference in mean run times on the two machines, the within-sample variability would clearly not be significant. Unfortunately, this convenient situation does not always obtain, and the analyst is faced with difficulties in many analyses.
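
As a minimal sketch of this comparison, suppose we have repeated run-time measurements on two machines; the Python below computes each machine's within-sample standard deviation and contrasts it with the between-sample difference in means. The data values and the factor-of-three rule of thumb are invented for illustration, not taken from the text.

```python
from statistics import mean, stdev

# Hypothetical run times in seconds; values are invented for illustration.
runs_a = [10.2, 10.5, 9.9, 10.4, 10.1]   # repeated runs on computer A
runs_b = [11.8, 12.1, 11.9, 12.3, 11.7]  # repeated runs on computer B

mean_a, mean_b = mean(runs_a), mean(runs_b)
sd_a, sd_b = stdev(runs_a), stdev(runs_b)

# Within-sample variability: spread of repeated runs on one machine.
print(f"A: mean = {mean_a:.2f} s, stdev = {sd_a:.2f} s")
print(f"B: mean = {mean_b:.2f} s, stdev = {sd_b:.2f} s")

# Between-sample variation: difference between the two machines' means.
diff = abs(mean_a - mean_b)
print(f"Difference in means: {diff:.2f} s")

# If the within-sample standard deviations are far smaller than the
# difference in means (here, an assumed factor-of-three threshold),
# the within-sample variability can plausibly be disregarded when
# comparing the machines; otherwise it cannot.
if diff > 3 * max(sd_a, sd_b):
    print("Within-sample variability looks negligible for this comparison.")
else:
    print("Within-sample variability matters; a statistical test is needed.")
```

With the sample data above, the standard deviations are a few hundredths of a second while the means differ by well over a second, so the comparison is clear-cut; when the two quantities are of similar size, no such shortcut is available.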