# Contents

**1. Statistics and Probability Are Not Intuitive**

We Tend to Jump to Conclusions

We Tend to Be Overconfident

We See Patterns in Random Data

We Don’t Expect Variability to Depend on Sample Size

We Are Fooled by Multiple Comparisons

We Tend to Ignore Alternative Explanations

We Crave Crisp Conclusions, But Statistics Offers Probabilities

Chapter Summary

**2. The Complexities of Probability**

Basics of Probability

Probability as Prediction of Long-Term Frequency

Probability as Strength of Belief (Bayes)

The Distinction Between Probability and Statistics

Lingo

Common Mistakes

Chapter Summary

**3. From Sample to Population**

Sampling from a Population

How Far to Generalize?

Lingo

Common Mistakes

Chapter Summary

**4. Confidence Intervals**

Example: Survival of Premature Infants

Example: Polling Voters

Assumptions: Confidence Interval of a Proportion

What Does 95% Confidence Really Mean?

Are You Quantifying the Event You Really Care About?

Interpreting Confidence Intervals in Context

Confidence Intervals for Other Kinds of Data

Lingo

Common Mistakes

Q&A

Chapter Summary

**5. Types of Variables**

Continuous Variables

Ordinal Variables

Nominal Variables

Q&A

Chapter Summary

**6. Graphing Variability**

Graphing Data to Show Scatter or Distribution

Watch Out for Preprocessed Data

Lingo

Common Mistakes

Q&A

Chapter Summary

**7. Quantifying Variation**

Range

Percentiles

Interquartile Range

Five-Number Summary

Standard Deviation

Coefficient of Variation

Lingo

Common Mistakes

Q&A

Chapter Summary

**8. The Gaussian Distribution**

How the Gaussian Distribution Arises

The Meaning of Standard Deviation in a Gaussian Distribution

What a Sample Drawn from a Gaussian Distribution Really Looks Like

Why the Gaussian Distribution Is So Central to Statistical Theory

Lingo

Common Mistakes

Q&A

Chapter Summary

**9. The Lognormal Distribution and Geometric Mean**

Overview

Example: Relaxing Bladders

A Review of Logarithms

The Origin of a Lognormal Distribution

How to Analyze Lognormal Data

Geometric Mean

Lingo

Common Mistakes

Q&A

Chapter Summary

**10. Confidence Interval for a Mean**

Interpreting a Confidence Interval for a Mean

What Values Determine the Confidence Interval for a Mean?

The Standard Error of the Mean

Assumptions: Confidence Interval for a Mean

Lingo

Common Mistakes

Q&A

Chapter Summary

**11. Error Bars**

The Appearance of Error Bars

How to Interpret Error Bars

Which Kind of Error Bar Should You Plot?

How Are Standard Deviation and Standard Error of the Mean Related to Sample Size?

Lingo

Common Mistakes

Q&A

Chapter Summary

**12. Comparing Groups with Confidence Intervals**

Using Confidence Intervals to Compare Groups

Examples of Confidence Intervals Used to Compare Groups

Assumptions of Confidence Intervals

Common Mistakes

Q&A

Chapter Summary

**13. Comparing Groups with P Values**

Introducing P Values via Coin Flipping

A Rule That Links Confidence Intervals and P Values

Revisiting the Examples from Chapter 12

Four Things You Need to Know about P Values

Lingo

Common Mistakes

Q&A

Chapter Summary

**14. Statistical Significance and Hypothesis Testing**

Statistical Hypothesis Testing

Revisiting the Examples from Chapters 12 and 13

Analogy: Innocent Until Proven Guilty

Extremely Significant? Borderline Significant?

Lingo

Choosing a Significance Level

Common Mistakes

Q&A

Chapter Summary

**15. Interpreting a Result That Is (Or Is Not) Statistically Significant**

Interpreting Results That Are “Statistically Significant”

Interpreting Results That Are “Not Statistically Significant”

Five Explanations for “Not Statistically Significant” Results

Lingo

Common Mistakes

Q&A

Chapter Summary

**16. How Common Are Type I Errors?**

What Is a Type I Error?

How Frequently Do Type I Errors Occur?

The Prior Probability Influences the False Discovery Rate (A Bit of Bayes)

Analogy to Clinical Testing

Lingo

Common Mistakes

Q&A

Chapter Summary

**17. Multiple Comparisons**

Why Multiple Comparisons Are a Problem

A Dramatic Demonstration of the Problem with Multiple Comparisons

Multiple Comparisons in Many Contexts

How to Correct for Multiple Comparisons

Lingo

Common Mistakes

Q&A

Chapter Summary

**18. Statistical Power and Sample Size**

Ad Hoc Sequential Sample Size Determination Leads to Misleading Results

The Four Questions

Interpreting a Sample Size Statement

A Calculation or a Negotiation?

An Analogy to Understand Statistical Power

Sample Size and the Margin of Error of the Confidence Interval

Lingo

Common Mistakes

Q&A

Chapter Summary

**19. Commonly Used Statistical Tests**

Assumptions Shared by All Standard Statistical Tests

Comparing a Continuous Variable Measured in Two Groups

Comparing a Continuous Variable Measured in Three or More Groups

Comparing a Binary Variable Assessed in Two Groups

Comparing Survival Curves

Correlation and Regression

Lingo

Chapter Summary

**20. Normality Tests**

Testing for Normality

The Problems with Normality Tests

Alternatives to Assuming a Gaussian Distribution

Lingo

Common Mistakes

Q&A

Chapter Summary

**21. Outliers**

How Do Outliers Arise?

The Need for Outlier Tests

Five Questions to Ask Before Testing for Outliers

The Question That an Outlier Test Answers

Is It Legitimate to Remove Outliers?

Lingo

Common Mistakes

Q&A

Chapter Summary

**22. Correlation**

Introducing the Correlation Coefficient

Assumptions: Correlation

Lingo

Common Mistakes

Q&A

Chapter Summary

**23. Simple Linear Regression**

The Goals of Linear Regression

Linear Regression Results

Assumptions: Linear Regression

Comparison of Linear Regression and Correlation

Lingo

Common Mistakes

Q&A

Chapter Summary

**24. Nonlinear, Multiple, and Logistic Regression**

Nonlinear Regression

Multiple and Logistic Regression

Lingo

Common Mistakes

Q&A

Chapter Summary

**25. Common Mistakes to Avoid When Interpreting Published Statistics**

Mistake: Not Recognizing Publication Bias

Mistake: Testing Hypotheses Suggested by the Data

Mistake: Making a Conclusion about Causation When the Data Only Show Correlation

Mistake: Overinterpreting Studies That Measure a Proxy or Surrogate Outcome

Mistake: Overinterpreting Data from an Observational Study

Mistake: Being Fooled by Regression to the Mean

**26. Review**

The Fundamental Ideas of Statistics

Statistical Vocabulary by Chapter