By the end of Hour 1, students will be able to:
Explain why conducting multiple hypothesis tests inflates the family-wise Type I error rate above the nominal α (e.g., 0.05), and calculate the probability of at least one false positive across k independent tests, 1 − (1 − α)^k.
Recognize multiple comparison scenarios in published research (subgroup analyses, multiple outcomes, post-hoc pairwise comparisons) and identify whether authors acknowledge or adjust for this issue.
Interpret standardized effect sizes (Cohen's d) and raw effect sizes (mean differences) from published research comparing groups.
Evaluate whether reported effect sizes are clinically/practically meaningful using discipline-specific benchmarks rather than generic Cohen's labels.
Define power as P(reject H₀ | H₀ is false) and explain its relationship to the Type II error rate (power = 1 − β).
Predict how changes in sample size, effect size, alpha, and variability affect power and required sample size for detecting group differences.
Critique retrospective (post hoc) power analyses as circular and uninformative, because observed power is a direct function of the observed p-value.
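
The Hour 1 calculations on family-wise error and power can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed course tool: the function names are hypothetical, the family-wise formula assumes the k tests are independent and all null hypotheses are true, and the power function uses a normal approximation to a two-sided, two-sample test with equal group sizes rather than the exact noncentral t calculation.

```python
from statistics import NormalDist


def familywise_error_rate(alpha: float, k: int) -> float:
    """P(at least one false positive) across k independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


def approx_power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    d is the standardized mean difference (Cohen's d); the normal
    approximation slightly overstates power for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)      # critical value for the chosen alpha
    shift = d * (n_per_group / 2.0) ** 0.5     # expected value of the test statistic
    # Probability the statistic lands in either rejection region.
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Family-wise error grows quickly with the number of tests.
    for k in (1, 5, 10, 20):
        print(f"k = {k:2d}: FWER = {familywise_error_rate(0.05, k):.3f}")

    # Power rises with sample size and effect size, and falls with stricter alpha.
    for n in (20, 50, 100):
        print(f"d = 0.5, n = {n:3d}/group: power = {approx_power_two_sample(0.5, n):.3f}")
```

With α = 0.05 and k = 10, the family-wise error rate is roughly 0.40. Variability enters the power function through d, since a larger standard deviation shrinks the standardized effect for the same raw mean difference.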
By the end of Hour 2, students will be able to:
Analyze examples where identical effect sizes produce different p-values due to sample size (Kim table), demonstrating that significance ≠ importance.
Apply discipline-specific effect size benchmarks to evaluate whether findings represent meaningful magnitudes (e.g., audiology d = 0.25 vs. psychology d = 0.25).
Evaluate published power analyses for unrealistic assumptions (publication-biased effect estimates, optimistic dropout rates) and recalculate the realistic probability of detecting an effect once these inflated assumptions are corrected.
Construct a sample size consultation plan specifying: clinically meaningful effect size (with literature justification), variance estimates, anticipated dropout, and design constraints.
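
The Hour 2 objectives can be illustrated with a short, hedged Python sketch. All numbers below are hypothetical and are not taken from the Kim table or any published study. The function names are illustrative, and the calculations again use a normal approximation for a two-sided, two-sample comparison with equal group sizes, so required sample sizes come out slightly smaller than an exact t-based calculation would give.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def two_sided_p_value(d: float, n_per_group: int) -> float:
    """Approximate p-value when the observed standardized difference is d."""
    z_stat = d * math.sqrt(n_per_group / 2.0)
    return 2.0 * (1.0 - Z.cdf(abs(z_stat)))


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a true standardized effect d."""
    z_crit = Z.inv_cdf(1.0 - alpha / 2.0)
    shift = d * math.sqrt(n_per_group / 2.0)
    return (1.0 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)


def n_per_group_required(d: float, power: float = 0.80, alpha: float = 0.05,
                         dropout: float = 0.0) -> int:
    """Enrollment per group needed to keep the target power after dropout."""
    z_alpha = Z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = Z.inv_cdf(power)
    n_completers = 2.0 * ((z_alpha + z_beta) / d) ** 2
    return math.ceil(n_completers / (1.0 - dropout))


if __name__ == "__main__":
    # Same effect size, different sample sizes, very different p-values.
    for n in (20, 100, 500):
        print(f"d = 0.25, n = {n:3d}/group: p = {two_sided_p_value(0.25, n):.4f}")

    # A study planned around an optimistic d = 0.5 (hypothetical value)...
    n_planned = n_per_group_required(0.5)
    # ...has far less power if the true effect is a smaller d = 0.3.
    print(f"planned n/group = {n_planned}")
    print(f"power if true d = 0.5: {approx_power(0.5, n_planned):.2f}")
    print(f"power if true d = 0.3: {approx_power(0.3, n_planned):.2f}")

    # Enrollment target once 20% dropout (hypothetical) is anticipated.
    print(f"n/group with 20% dropout: {n_per_group_required(0.5, dropout=0.20)}")
```

In a consultation plan, the effect size passed to `n_per_group_required` would be the clinically meaningful difference justified from the literature, not an estimate lifted from a single published study.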