Essential Biostatistics

Preface

1. Statistics and Probability Are Not Intuitive 1

We Tend to Jump to Conclusions

We Tend to be Overconfident

We See Patterns in Random Data

We Don’t Expect Variability to Depend on Sample Size

We are Fooled by Multiple Comparisons

We Tend to Ignore Alternative Explanations

We Crave Crisp Conclusions, But Statistics Offers Probabilities

Chapter Summary

2. The Complexities of Probability 6

Basics of Probability

Probability as Prediction of Long-Term Frequency

Probability as Strength of Belief (Bayes)

The Distinction Between Probability and Statistics

Lingo

Common Mistakes

Chapter Summary

3. From Sample to Population 11

Sampling from a Population

How Far to Generalize?

Lingo

Common Mistakes

Chapter Summary

4. Confidence Intervals 15

Example: Survival of Premature Infants

Example: Polling Voters

Assumptions: Confidence Interval of a Proportion

What Does 95% Confidence Really Mean?

Are You Quantifying the Event You Really Care About?

Interpreting Confidence Intervals in Context

Confidence Intervals for Other Kinds of Data

Lingo

Common Mistakes

Q&A

Chapter Summary

5. Types of Variables 28

Continuous Variables

Ordinal Variables

Nominal Variables

Q&A

Chapter Summary

6. Graphing Variability 32

Graphing Data to Show Scatter or Distribution

Watch Out for Preprocessed Data

Lingo

Common Mistakes

Q&A

Chapter Summary

7. Quantifying Variation 40

Range

Percentiles

Interquartile Range

Five-Number Summary

Standard Deviation

Coefficient of Variation

Lingo

Common Mistakes

Q&A

Chapter Summary

8. The Gaussian Distribution 46

How the Gaussian Distribution Arises

The Meaning of Standard Deviation in a Gaussian Distribution

What a Sample Drawn from a Gaussian Distribution Really Looks Like

Why the Gaussian Distribution is so Central to Statistical Theory

Lingo

Common Mistakes

Q&A

Chapter Summary

9. The Lognormal Distribution and Geometric Mean 52

Overview

Example: Relaxing Bladders

A Review of Logarithms

The Origin of a Lognormal Distribution

How to Analyze Lognormal Data

Geometric Mean

Lingo

Common Mistakes

Q&A

Chapter Summary

10. Confidence Interval for a Mean 57

Interpreting a Confidence Interval for a Mean

What Values Determine the Confidence Interval for a Mean?

The Standard Error of the Mean

Assumptions: Confidence Interval for a Mean

Lingo

Common Mistakes

Q&A

Chapter Summary

11. Error Bars 63

The Appearance of Error Bars

How to Interpret Error Bars

Which Kind of Error Bar Should You Plot?

How are Standard Deviation and Standard Error of the Mean Related to Sample Size?

Lingo

Common Mistakes

Q&A

Chapter Summary

12. Comparing Groups with Confidence Intervals 70

Using Confidence Intervals to Compare Groups

Examples of Confidence Intervals Used to Compare Groups

Assumptions of Confidence Intervals

Common Mistakes

Q&A

Chapter Summary

13. Comparing Groups with P Values 78

Introducing P Values via Coin Flipping

A Rule That Links Confidence Intervals and P Values

Revisiting the Examples from Chapter 12

Four Things You Need to Know about P Values

Lingo

Common Mistakes

Q&A

Chapter Summary

14. Statistical Significance and Hypothesis Testing 87

Statistical Hypothesis Testing

Revisiting the Examples from Chapters 12 and 13

Analogy: Innocent Until Proven Guilty

Extremely Significant? Borderline Significant?

Lingo

Choosing a Significance Level

Common Mistakes

Q&A

Chapter Summary

15. Interpreting a Result That Is (Or Is Not) Statistically Significant 96

Interpreting Results That are “Statistically Significant”

Interpreting Results That are “Not Statistically Significant”

Five Explanations for “Not Statistically Significant” Results

Lingo

Common Mistakes

Q&A

Chapter Summary

16. How Common Are Type I Errors? 103

What Is a Type I Error?

How Frequently Do Type I Errors Occur?

The Prior Probability Influences the False Discovery Rate (A Bit of Bayes)

Analogy to Clinical Testing

Lingo

Common Mistakes

Q&A

Chapter Summary

17. Multiple Comparisons 111

Why Multiple Comparisons are a Problem

A Dramatic Demonstration of the Problem with Multiple Comparisons

Multiple Comparisons in Many Contexts

How to Correct for Multiple Comparisons

Lingo

Common Mistakes

Q&A

Chapter Summary

18. Statistical Power and Sample Size 119

Ad Hoc Sequential Sample Size Determination Leads to Misleading Results

The Four Questions

Interpreting a Sample Size Statement

A Calculation or a Negotiation?

An Analogy to Understand Statistical Power

Sample Size and the Margin of Error of the Confidence Interval

Lingo

Common Mistakes

Q&A

Chapter Summary

19. Commonly Used Statistical Tests 127

Assumptions Shared by All Standard Statistical Tests

Comparing a Continuous Variable Measured in Two Groups

Comparing a Continuous Variable Measured in Three or More Groups

Comparing a Binary Variable Assessed in Two Groups

Comparing Survival Curves

Correlation and Regression

Lingo

Chapter Summary

20. Normality Tests 134

Testing for Normality

The Problems with Normality Tests

Alternatives to Assuming a Gaussian Distribution

Lingo

Common Mistakes

Q&A

Chapter Summary

21. Outliers 138

How Do Outliers Arise?

The Need for Outlier Tests

Five Questions to Ask Before Testing for Outliers

The Question That an Outlier Test Answers

Is It Legitimate to Remove Outliers?

Lingo

Common Mistakes

Q&A

Chapter Summary

22. Correlation 144

Introducing the Correlation Coefficient

Assumptions: Correlation

Lingo

Common Mistakes

Q&A

Chapter Summary

23. Simple Linear Regression 152

The Goals of Linear Regression

Linear Regression Results

Assumptions: Linear Regression

Comparison of Linear Regression and Correlation

Lingo

Common Mistakes

Q&A

Chapter Summary

24. Nonlinear, Multiple, and Logistic Regression 163

Nonlinear Regression

Multiple and Logistic Regression

Lingo

Common Mistakes

Q&A

Chapter Summary

25. Common Mistakes to Avoid When Interpreting Published Statistics 167

Mistake: Not Recognizing Publication Bias

Mistake: Testing Hypotheses Suggested by the Data

Mistake: Making a Conclusion about Causation When the Data Only Show Correlation

Mistake: Overinterpreting Studies That Measure a Proxy or Surrogate Outcome

Mistake: Overinterpreting Data from an Observational Study

Mistake: Being Fooled by Regression to the Mean

26. Review 173

The Fundamental Ideas of Statistics

Statistical Vocabulary by Chapter
