Subgroup Analyses
(Week of October 14, 2025)
Module 4-5 – Subgroup Analyses in Pragmatic Trials (17-Minute Video)
This module breaks down what subgroups are, why they matter for real-world decision making, and how changing the subgroup changes the trial's estimand. You’ll compare interaction models with separate subgroup models, see why interaction tests are often underpowered, and learn how multiplicity fuels false positives. The module closes with a checklist for assessing the credibility of subgroup results.
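The claim that interaction tests are underpowered can be illustrated with a quick back-of-the-envelope power calculation: an interaction estimate is a difference of differences across two subgroups, so its standard error is roughly double that of the overall effect at the same total sample size. A minimal sketch (the effect size, sample size, and unit variance below are illustrative assumptions, not figures from the module):

```python
from math import sqrt
from statistics import NormalDist

def power_z(delta, se, alpha=0.05):
    """Power of a two-sided z-test for a true effect delta with standard error se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    nd = NormalDist()
    return (1 - nd.cdf(z - delta / se)) + nd.cdf(-z - delta / se)

n = 100  # illustrative patients per arm, outcome SD assumed to be 1
se_overall = sqrt(2 / n)            # SE of the overall treatment effect
se_interaction = 2 * se_overall     # difference of differences over two half-size subgroups

print(power_z(0.5, se_overall))     # high power for the overall effect
print(power_z(0.5, se_interaction)) # much lower power for a same-size interaction
```

The same trial that is well powered for its primary comparison can have less than a coin-flip's chance of detecting an interaction of equal magnitude.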
**The video's content and narration were generated with the assistance of artificial intelligence, with human guidance and oversight throughout the process.**
Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses (Source)
This article explains how to judge whether differences seen between subgroups of a clinical trial are real or statistical noise. The authors move away from a yes/no verdict and instead place subgroup credibility on a continuum. They update earlier guidance with a practical checklist organized into design, analysis, and context. Key ideas include defining subgroups using baseline characteristics, checking whether the effect is independent of other factors, interpreting the interaction p-value on a continuum (rather than against a hard cutoff), looking for consistency across related outcomes and other studies, and asking whether a biological rationale supports the results.
Three simple rules to ensure reasonably credible subgroup analyses (Source)
The limitations of subgroup analyses are well established—false positives due to multiple comparisons, false negatives due to inadequate power, and limited ability to inform individual treatment decisions because patients have multiple characteristics that vary simultaneously. In this article, the authors apply Bayes’ rule to determine the probability that a positive subgroup analysis is a true positive. From this framework, they derive simple rules for determining when subgroup analyses can serve as hypothesis-testing analyses and thus when they should influence how we practice medicine.
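The Bayes’-rule framing can be sketched in one line: the probability that a "significant" subgroup finding is a true positive depends on the prior probability of a real subgroup effect, the power of the test, and the significance level. A minimal sketch (the prior, power, and alpha values are illustrative, not the article's):

```python
def prob_true_positive(prior, power, alpha):
    """P(true effect | significant result) via Bayes' rule.

    prior: prior probability that a real subgroup effect exists
    power: P(significant | real effect)
    alpha: P(significant | no effect), i.e. the false-positive rate
    """
    return (power * prior) / (power * prior + alpha * (1 - prior))

# A plausibly low prior and an underpowered interaction test (values are
# hypothetical) leave a significant subgroup finding more likely false than not:
print(prob_true_positive(prior=0.1, power=0.3, alpha=0.05))  # 0.4
```

With a strong prior and adequate power the same calculation gives a high probability of a true positive, which is the intuition behind restricting "hypothesis-testing" status to a few pre-specified, well-motivated subgroups.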
The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index (Source)
Objectives: A P-value <0.05 is one metric used to evaluate the results of a randomized controlled trial (RCT). We wondered how often statistically significant results in RCTs may be lost with small changes in the numbers of outcomes.
Study design and setting: A review of RCTs in high-impact medical journals that reported a statistically significant result for at least one dichotomous or time-to-event outcome in the abstract. In the group with the smallest number of events, we changed the status of patients without an event to an event until the P-value exceeded 0.05. We labeled this number the Fragility Index; smaller numbers indicated a more fragile result.
Results: The 399 eligible trials had a median sample size of 682 patients (range: 15-112,604) and a median of 112 events (range: 8-5,142); 53% reported a P-value <0.01. The median Fragility Index was 8 (range: 0-109); 25% had a Fragility Index of 3 or less. In 53% of trials, the Fragility Index was less than the number of patients lost to follow-up.
Conclusion: The statistically significant results of many RCTs hinge on small numbers of events. The Fragility Index complements the P-value and helps identify less robust results.
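The Fragility Index procedure described in the abstract is mechanical enough to sketch in a few lines: in the arm with fewer events, convert non-events to events one at a time, recomputing Fisher's exact test until the p-value is no longer below 0.05. A minimal pure-Python sketch (function names and the toy 2×2 counts are illustrative; real analyses use the trial's actual event tables):

```python
from math import comb

def fisher_two_sided_p(a, b, c, d):
    """Two-sided Fisher's exact p for the 2x2 table [[a, b], [c, d]],
    where a, b are events/non-events in arm 1 and c, d in arm 2."""
    n, row1, col1 = a + b + c + d, a + b, a + c

    def hyper_p(x):  # hypergeometric probability of x events in arm 1
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = hyper_p(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables at least as extreme (probability <= observed).
    return sum(p for x in range(lo, hi + 1)
               if (p := hyper_p(x)) <= p_obs * (1 + 1e-9))

def fragility_index(e1, n1, e2, n2):
    """Flip non-events to events in the arm with fewer events until p >= 0.05."""
    if e1 > e2:  # work on the arm with the smaller number of events
        (e1, n1), (e2, n2) = (e2, n2), (e1, n1)
    fi = 0
    while e1 < n1 and fisher_two_sided_p(e1, n1 - e1, e2, n2 - e2) < 0.05:
        e1 += 1  # one patient's status changed from non-event to event
        fi += 1
    return fi

# Toy example: 0/10 vs 5/10 events is significant, but one flipped
# patient erases significance, so the Fragility Index is 1.
print(fragility_index(0, 10, 5, 10))  # 1
```

A small index means the verdict of "statistically significant" rests on very few patients, which is why the authors suggest reporting it alongside the p-value.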