A new educational offering from the Data section of the curriculum wheel has been posted (1-1.5 hours of primary open access content).
This website will be updated every Monday (by 12:00 PM Eastern) or Tuesday (if Monday is a holiday). Because designing, implementing, and managing a pragmatic trial is a non-linear process, featured modules will relate to various sections of the curriculum wheel over time.
Data Section
Missing data in clinical trials: an introduction
The Case of the Missing Data | NEJM Evidence (March 25, 2022 via “NEJM Group”): 4-min video.
Summary: Using a hypothetical trial concept, this animated video briefly describes the complex reality of missing data. It emphasizes that missing data can still cause bias even when the amount of missing data is (relatively) small and the sample size is large; whether it is appropriate to omit participants with missing data in this setting depends on the reason(s) for the missingness. Mechanisms of missing data are introduced: (1) Missing completely at random (MCAR), (2) Missing at random (MAR), and (3) Missing not at random (MNAR), with simplified scenarios describing each mechanism. Approaches to handling missing data are alluded to, with the emphasis that the best method depends on many factors, including the underlying missing data mechanisms.
Missing Data and Multiple Imputation. Columbia University Mailman School of Public Health (2022): 6-section website.
Summary: This website provides a succinct overview of missing data and defines (1) Missing completely at random (MCAR), (2) Missing at random (MAR), and (3) Missing not at random (MNAR). Data are described as MCAR if the probability of a variable being missing for a given participant is independent of both the observed and unobserved variables for that participant. If data are MCAR, then the subsample of participants with complete (non-missing) data is representative of the overall sample. Data are MAR if, after accounting for all the observed variables, the probability of a variable being missing is independent of the unobserved data. Lastly, data are MNAR if they are neither MCAR nor MAR; that is, the probability of a variable being missing depends on the value of the missing variable itself, even after accounting for all the observed variables. Options for analysis are also reviewed: complete case analysis, treating missing values as a separate category (for categorical variables) or replacing missing continuous variables with the mean, censoring in analyses of longitudinal data, single imputation, and multiple imputation. Tips for implementing multiple imputation are provided, as well as a list of relevant textbooks, methodological/application articles, and software options. * As recommended in various modules, involving a (bio)statistician from the design stage of a trial onward is key to mitigating and addressing missing data. While multiple imputation is a powerful statistical tool, it often requires making untestable (non-verifiable) assumptions, which should be thoughtfully considered with a multidisciplinary study team.
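To make the three mechanisms concrete, the following is a minimal simulation sketch in Python. It is not taken from the Columbia resource. It assumes a hypothetical outcome y (true mean 0), a fully observed covariate x, and illustrative missingness probabilities; the function name simulate and every parameter value are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def simulate(mechanism, n=20000, seed=0):
    """Return (full-sample mean, complete-case mean) of y under one mechanism.

    Hypothetical setup: x is always observed, y = x + noise may go missing.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)          # fully observed covariate
        y = x + rng.gauss(0, 1)      # outcome that may be missing
        if mechanism == "MCAR":
            p_missing = 0.3                         # unrelated to x and y
        elif mechanism == "MAR":
            p_missing = 1 / (1 + math.exp(-x))      # depends only on observed x
        elif mechanism == "MNAR":
            p_missing = 1 / (1 + math.exp(-y))      # depends on the unobserved y
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        full.append(y)
        if rng.random() >= p_missing:               # y is observed
            observed.append(y)
    return statistics.fmean(full), statistics.fmean(observed)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        full_mean, cc_mean = simulate(mechanism)
        print(f"{mechanism}: full={full_mean:.3f} complete-case={cc_mean:.3f}")
```

Under these assumptions, the complete-case mean stays close to the full-sample mean only under MCAR. Under MAR and MNAR it is shifted downward, because participants with larger x (or larger y) are more likely to be missing. In the MAR scenario the shift could in principle be corrected using the observed x; in the MNAR scenario it could not be corrected from the observed data alone.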
Statistical Review - Missing Data with Dr. David Harrington (April 3, 2024 via “NEJM Group”): 17-min video.
Summary: Dr. David Harrington discusses missing data in the context of randomized trials. In conversation with another researcher, the following questions are addressed: what are missing data, why do missing data occur, and how are missing data addressed in a trial? There are several types of missing data (e.g., missing outcome data, missing covariate data, missing baseline data), and such data can be missing for various reasons, such as participants discontinuing or withdrawing from the intervention (e.g., due to adverse events) or missing study visits (if applicable). It is emphasized that missing data are best mitigated during the design and execution of the trial, so that the trialist needs to rely on statistical methods (in the analysis stage) as little as possible. However, when statistical solutions are deployed, the mechanisms of missingness are important to understand and consider: (1) Missing completely at random (MCAR), (2) Missing at random (MAR), and (3) Missing not at random (MNAR). Approaches to handling missing data are introduced: e.g., complete case analysis, single imputation, and multiple imputation (the preferred method). It is recommended that, when missing data are present, they be addressed using several robust approaches to assess how sensitive the findings are to the choice of statistical approach.
Prevention of Missing Data in Pragmatic Clinical Trials of Nonpharmacologic Interventions for Pain Management. Pain Management Collaboratory Biostatistics/Design Work Group (V. 1.0, March 10, 2020): 5-page working document.
Summary: This working paper discusses missing data in the context of (pragmatic) clinical trials. It is reiterated and emphasized that statistical methods often cannot fully compensate for missing data, and that the design and execution of the trial should limit the likelihood of missingness. Several methods for preventing missing data (during the design and execution of trials) are listed: (a) Distinguish discontinuation of the intervention from study withdrawal, (b) Reduction of participant burden, (c) Design/select outcomes with less missingness, (d) Flexible data collection, (e) Acceptance of concomitant medications/interventions or rescue interventions, (f) Integrated prompts for research data collection in clinical assessments, (g) Enhancement of study engagement, (h) Monitor missing data. It is recommended that all trials should have a missing data plan as part of their protocol, and that, in addition to discussing analytic strategies for addressing this issue, the plan should identify the specific strategies the trial will employ to prevent missingness.
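Item (h), monitoring missing data, can be as simple as tabulating the proportion of missing values by arm at regular intervals. The sketch below is an illustration only and is not part of the working document; the record layout, the field names (arm, pain_score), and the convention that None marks a missing value are all hypothetical.

```python
from collections import defaultdict


def missing_rate_by_arm(records, variable, arm_key="arm"):
    """Return {arm: proportion of records whose `variable` is None or absent}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for record in records:
        arm = record[arm_key]
        totals[arm] += 1
        if record.get(variable) is None:   # absent key or explicit None
            missing[arm] += 1
    return {arm: missing[arm] / totals[arm] for arm in totals}


# Hypothetical interim data for illustration.
records = [
    {"arm": "intervention", "pain_score": 4},
    {"arm": "intervention", "pain_score": None},
    {"arm": "control", "pain_score": 6},
    {"arm": "control", "pain_score": 5},
]
rates = missing_rate_by_arm(records, "pain_score")
```

A marked difference in missingness between arms is the kind of signal a study team might want to review while the trial is still running.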
Missing data in trials: what you can do (infographic) - MRC Clinical Trials Unit at UCL (V. 10): 1-page infographic.
Summary: An infographic describing missing data in clinical trials, from the perspectives of participants and carers, patients and other partners, research staff (e.g., research nurses), statisticians, trialists, and funders. The following questions are answered for each group: (1) Why does missing data matter? and (2) How can I help?
Missing data in clinical research, Dr. Peter Austin - April 18, 2024 (April 23, 2024 via “Sunnybrook Hospital”): 1-hr 12-min video (1-hr presentation & 12-min Q&A)
Summary: Dr. Peter Austin introduces statistical methods for the analysis of missing data and describes the performance of multiple imputation in the presence of a high prevalence of missing data. A brief overview of some of his other research on methods for missing data is also provided.
Austin PC, et al. Missing Data in Clinical Research: A Tutorial on Multiple Imputation. Can J Cardiol. 2021 Sep;37(9):1322-1331. (10-page paper)
Summary: Referenced in the aforementioned presentation, this paper provides an introduction to multiple imputation - a preferred method for handling missing data in epidemiologic research - and discusses challenges in its implementation (e.g., development of the imputation model, how many imputed data sets to create, and addressing derived variables). The application of multiple imputation is illustrated through an analysis of patients hospitalized with heart failure. Code for conducting multiple imputation in statistical software (i.e., R, SAS, and Stata) is provided.
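Once each imputed data set has been analyzed separately, the results are typically combined using Rubin's rules. The sketch below illustrates that pooling step only; it is not the code provided in the paper, and the function name pool_rubin and the example numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    pooled = statistics.fmean(estimates)          # average of the m estimates
    within = statistics.fmean(variances)          # average within-imputation variance
    between = statistics.variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between        # total variance
    return pooled, total, math.sqrt(total)


# Hypothetical estimates (e.g., a regression coefficient) from m = 5 imputed data sets.
estimates = [0.42, 0.38, 0.45, 0.40, 0.41]
variances = [0.010, 0.011, 0.009, 0.010, 0.012]
pooled, total_variance, standard_error = pool_rubin(estimates, variances)
```

The between-imputation term is what separates this from simply averaging results: it carries the extra uncertainty due to the missing values into the pooled standard error.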
Little RJ, et al. The prevention and treatment of missing data in clinical trials. N Engl J Med. 2012 Oct 4;367(14):1355-60. (6-page paper)
Summary: This paper summarizes the main findings and recommendations of a previous report on missing data in clinical trials. In particular, Table 1 describes eight ideas for limiting missing data in the design of clinical trials, and Table 2 describes eight ideas for limiting missing data in the conduct of trials.
Sterne JA, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009 Jun 29;338:b2393. (5-page paper)
Summary: This article reviews the reasons why missing data may lead to bias in clinical research. It discusses the circumstances in which multiple imputation may help to reduce bias and/or increase precision, and describes potential pitfalls in applying this approach. The use (and reporting) of analyses using multiple imputation in general medical journals is described, and guidelines for the conduct and reporting of such analyses are provided.
Jakobsen JC, et al. When and how should multiple imputation be used for handling missing data in randomised clinical trials - a practical guide with flowcharts. BMC Med Res Methodol. 2017 Dec 6;17(1):162. (10-page paper)
Summary: Missing data can bias results from clinical trials, especially if such data are not understood and are handled inappropriately. The bias due to missing data depends on the mechanism causing the data to be missing, and on whether appropriate analytical methods are applied to address the issue. Therefore, the analysis of trial data with missing values (which are often difficult to avoid) requires planning and attention. This paper presents a practical guide and flow charts describing when and how multiple imputation (a preferred approach to handling missing data) should be used to address missing data in a trial.
Haneuse S, et al. Assessing Missing Data Assumptions in EHR-Based Studies: A Complex and Underappreciated Task. JAMA Netw Open. 2021 Feb 1;4(2):e210184. (4-page paper)
Summary: The use of routinely collected data (in pragmatic trials or elsewhere) introduces additional conceptual and technical challenges when exploring and addressing missing data. When one relies on already-collected data for a secondary purpose, the “idea” of missing data is not as clear as, say, a participant skipping a scheduled study visit or not completing a section of a survey/questionnaire. The key points relate to the idea that understanding missing data is particularly challenging in studies that use routinely collected health data (e.g., administrative health data, disease registries, electronic health records) because data from these systems are generally not collected with a particular research agenda in mind.
McGrath L, Wong J, Chapter 16 - Special topics in electronic health data: missing data and unstructured data, Editor(s): Girman CJ, Ritchey ME, Pragmatic Randomized Clinical Trials, Academic Press, 2021: 18-page book chapter (pages 219-236) * This is an optional resource; an institutional login (e.g., university or research institute e-mail address) is required to access this material.
Summary: While pragmatic trials can involve the use of electronic health data (e.g., administrative claims data), there are aspects of such data that must be considered while designing and conducting a trial, including issues related to missing data. Missing data is a common problem in electronic health data and can occur because of how (and why) the data are collected within these systems. This book chapter discusses missing follow-up data on outcomes, loss to follow-up, censoring, and competing risks.
NIH Pragmatic Trials Collaboratory - Grand Rounds (Biostatistics Series): Methods for Handling Missing Data in Cluster Randomized Trials (Rui Wang, PhD; Moderator: Fan Li, PhD) (January 5, 2024): 55-min webinar (33-slide presentation)
Summary: Dr. Rui Wang provides a technical overview of statistical methods for handling missing data in cluster randomized trials, which are experiments in which intact units (clusters), rather than independent individuals, are randomly allocated to intervention groups.