Featured Module (Archived)
(Week of June 2, 2025)
A new educational offering from the Statistics section of the curriculum wheel has been posted (1-1.5 hours of primary open access content).
This website will be updated every Monday (by 12:00 PM Eastern) or Tuesday (if Monday is a holiday). Given that the design, implementation, and management of pragmatic trials is a non-linear process, featured modules will relate to various sections of the curriculum wheel over time.
Partner Introduction (KRESCENT): 2-min video.
Summary: Dr. Mathieu Lemaire (Program Director) discusses KRESCENT, the Kidney Research Scientist Core Education and National Training program, which is a unique training initiative aimed at enhancing kidney research capacity in Canada. Launched in January 2005, the KRESCENT program’s primary goal is to train an increasing number of highly skilled scientists focused on the prevention of end-stage renal disease and the development of new treatments that improve the health of Canadians affected by kidney disease.
Statistics Section
P-values, confidence intervals, and multiple testing
To P or not to P in Randomized Controlled Trials – Understanding the Uncertainties (April 22, 2024 via “ACT AEC”): 50-min presentation & 10-min Q&A
Summary: Dr. Shrikant Bangdiwala sets the stage by reviewing the uncertainties present in health research and how randomized trials attempt to deal with them. A historical overview of the statistical approach to inference is provided, including the origin of “significance testing” and the p-value, together with the definition of a p-value and its interpretation. The applicability of p-values in the context of randomized trials is discussed by answering three questions: (1) When is it entirely appropriate to use p-values? (2) When is it “OK with caveats” to use p-values? and (3) When is it entirely inappropriate to use p-values? Suggestions for alternative interpretations of trial results that emphasize the clinical importance of a finding are provided, along with an introduction to confidence intervals. Two related papers published by the presenter can be found at these links: paper 1 (2013) (requires an institutional login) and paper 2 (2023).
Wasserstein RL, Lazar NA. The ASA Statement on p-Values: Context, Process, and Purpose. Am Stat. 2016;70(2):129-133. (5-page paper)
Summary: Provides an informal definition of a p-value: “the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.” The statement then outlines six principles: (1) P-values can indicate how incompatible the data are with a specified statistical model, (2) P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone, (3) Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold, (4) Proper inference requires full reporting and transparency, (5) A p-value, or statistical significance, does not measure the size of an effect or the importance of a result, and (6) By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. It is noted that because of the misconceptions related to p-values, some statisticians prefer to supplement or even replace p-values with other approaches (e.g., confidence, credibility, or prediction intervals).
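The informal definition quoted above can be made concrete with a small simulation (an illustrative sketch, not drawn from any of the listed resources; the observed difference and group sizes are hypothetical): under a null model in which both groups are drawn from the same distribution, the two-sided p-value is approximated by the proportion of simulated mean differences at least as extreme as the observed one.

```python
import random
import statistics

random.seed(0)

# Hypothetical observed mean difference between two groups of size n
observed_diff = 0.5
n, trials = 30, 10_000

# Null model: both groups drawn from the same standard normal distribution
count = 0
for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    if abs(statistics.mean(a) - statistics.mean(b)) >= observed_diff:
        count += 1

# Proportion of simulated differences "equal to or more extreme than"
# the observed value: a Monte Carlo two-sided p-value
print(f"Simulated two-sided p-value: {count / trials:.3f}")
```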
Palesch YY. Some common misperceptions about P values. Stroke. 2014 Dec;45(12):e244-6. (3-page paper)
Summary: Defines what a p-value is, connects the concept to statistical significance, and then addresses two of the many issues regarding p-values in clinical trials: first, the conventional “need” to show p<0.05 (as a specific binary cut-off) to conclude that a treatment effect is statistically significant; and second, the misuse of p-values when testing group differences in baseline characteristics (between the intervention vs. control groups) in randomized trials.
Confidence intervals (Shedden K, Introduction to Data Science - University of Michigan, 2020-2021): 1-page website.
Summary: Estimates of an intervention effect are just that, estimates; they do not yield an “exact value” representing the impact of an intervention. A researcher will therefore often want to quantify the uncertainty in an estimate, that is, how large the error in the effect estimate might be; one way to do this is with confidence intervals. While the specific (technical) details of confidence intervals vary by the estimating procedure (and many other factors), the general concept of these intervals is briefly described here.
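As a rough sketch of the general concept (the outcome data below are hypothetical, and the normal-approximation formula is an illustrative assumption, not taken from the resource), a 95% confidence interval for a difference in means can be computed as the estimate plus or minus 1.96 standard errors:

```python
import math
import statistics

def mean_diff_ci(group_a, group_b, z=1.96):
    """Approximate 95% CI for a difference in means (normal approximation)."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff - z * se, diff + z * se

# Hypothetical outcome data for intervention vs. control arms
intervention = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]
control      = [4.4, 4.6, 4.2, 4.5, 4.8, 4.3, 4.1, 4.7]

low, high = mean_diff_ci(intervention, control)
print(f"Estimated effect: 0.62; 95% CI ({low:.2f}, {high:.2f})")
```

The interval quantifies how large the error in the point estimate might plausibly be: a wider interval reflects greater uncertainty about the true effect.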
The Problem of Multiple Comparisons | NEJM Evidence (July 8, 2022 via “NEJM Group”): 3-min video
Summary: Briefly introduces the problem of multiple comparisons, or multiple testing, and the increasing likelihood of type I error as more comparisons are made within a single study. The following paper is referenced in the video: Austin PC, et al. Testing multiple statistical hypotheses resulted in spurious associations: a study of astrological signs and health. J Clin Epidemiol. 2006 Sep;59(9):964-9. *The paper requires an institutional login.
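The inflating type I error described in the video follows directly from the probability of at least one false positive across m independent tests, 1 - (1 - alpha)^m; a few lines of Python (an illustrative sketch) make the effect concrete:

```python
# Probability of at least one false-positive ("type I error") finding
# when performing m independent tests, each at significance level alpha:
#   P(at least one) = 1 - (1 - alpha)**m
alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:2d} comparisons -> P(>=1 false positive) = {fwer:.2f}")
# 1 comparison:  0.05
# 20 comparisons: 0.64
```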
Li G, et al. An introduction to multiplicity issues in clinical trials: the what, why, when and how. Int J Epidemiol. 2017 Apr 1;46(2):746-755. (10-page paper)
Summary: Multiplicity refers to the potential inflation of the type I error because of multiple testing; type I error refers to erroneously rejecting the null hypothesis (where the probability of a type I error is commonly referred to as the statistical significance level). For example, multiplicity issues can be introduced when trials involve multiple subgroup comparisons, comparisons across multiple treatment arms, analysis of multiple outcomes, and multiple analyses of the same outcome at different times. This article introduces multiple testing adjustments to mitigate such issues and clarifies when adjustment for multiplicity is needed.
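One widely used adjustment is the Bonferroni correction, which compares each p-value against the significance level divided by the number of tests. The sketch below (with hypothetical p-values) illustrates the idea:

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which hypotheses remain significant after a Bonferroni correction."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# Hypothetical p-values from four outcome comparisons in one trial;
# the adjusted threshold is 0.05 / 4 = 0.0125
p_vals = [0.003, 0.020, 0.045, 0.300]
print(bonferroni(p_vals))  # -> [True, False, False, False]
```

Note that 0.045 would be declared significant at the unadjusted 0.05 level but not after correction, which is precisely the protection against spurious findings that adjustment provides.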
Phillips MR, et al. The clinician's guide to p values, confidence intervals, and magnitude of effects. Eye (Lond). 2022 Feb;36(2):341-342. (2-page editorial)
Summary: Briefly summarizes three important topics that clinicians should consider when interpreting evidence: (1) P-values: what they tell us and what they don’t, (2) Overcoming the limitations of interpreting p-values: magnitude of effect, and (3) The role of confidence intervals. It is emphasized that p-values are only one of several factors to consider when interpreting study results, and that results are best appreciated when the magnitude of the estimated intervention effects, along with their confidence intervals, is also taken into account.
Greenland S, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016 Apr;31(4):337-50. (14-page paper)
Summary: Misinterpretation of statistical tests, p-values, confidence intervals, and statistical power is common. With the goal of providing a resource for instructors, researchers, and consumers of statistics (whose knowledge of statistical theory may be limited), this article provides definitions and a discussion of basic statistics.
NIH Pragmatic Trials Collaboratory - Grand Rounds: A New Look at P Values for Randomized Clinical Trials (April 5, 2024): 30-min presentation & 30-min Q&A (26-slide presentation)
Summary: Dr. Erik van Zwet provides a brief (relatively technical) overview of the statistical principles underpinning p-values and 95% confidence intervals. Statistical power is also discussed. The presentation largely follows an article of the same name in NEJM Evid. (2024), which examined the primary efficacy results of more than 23,000 randomized clinical trials from the Cochrane Database of Systematic Reviews.
Dmitrienko A, D'Agostino RB Sr. Multiplicity Considerations in Clinical Trials. N Engl J Med. 2018 May 31;378(22):2115-2122. (8-page paper) * As this is an optional resource, an institutional login (e.g., university or research institute e-mail address) is required to access this material.
Summary: Multiplicity, or the use of many comparisons in a trial (multiple testing), increases the likelihood that a chance association could be deemed statistically significant. This problem arises in trials that have several objectives based on the evaluation of multiple endpoints or multiple dose-comparisons, evaluation of several patient populations, and/or other factors. This paper overviews statistical methods commonly used in trials to correct for multiplicity.