Pragmatic Trials Training Program

Missing Data

(Week of October 20, 2025)

Primary Content

Module 4-6 – Missing Data Is a DANGER to Every Clinical Trial (22-Minute Video)

In this module, we review the challenges that missing data pose in pragmatic trials: why missing data are so common in these trials, how they threaten validity and power, and what trialists can do to prevent or address them.

** The video's content and narration were generated with the assistance of artificial intelligence, with human guidance and oversight throughout the process. **

Curriculum Wheel

Additional Material

Little et al. 2013.pdf

The Prevention and Treatment of Missing Data in Clinical Trials (Source)

This article highlights how missing data can seriously weaken the reliability of results in clinical trials. It summarizes recommendations from a U.S. National Research Council panel on how to prevent and address missing data. The authors emphasize that the best solution is prevention—through careful trial design, participant follow-up after treatment discontinuation, and minimizing participant burden.

Heymans et al. 2022.pdf

Handling missing data in clinical research (Source)

Abstract: Because missing data are present in almost every study, it is important to handle missing data properly. First of all, the missing data mechanism should be considered. Missing data can be either completely at random (MCAR), at random (MAR), or not at random (MNAR). When missing data are MCAR, a complete case analysis can be valid. Also when missing data are MAR, in some situations a complete case analysis leads to valid results. However, in most situations, missing data imputation should be used. Regarding imputation methods, it is highly advised to use multiple imputations because multiple imputations lead to valid estimates including the uncertainty about the imputed values. When missing data are MNAR, also multiple imputations do not lead to valid results. A complication hereby is that it is not possible to distinguish whether missing data are MAR or MNAR. Finally, it should be realized that preventing to have missing data is always better than the treatment of missing data.
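
To make the difference between missingness mechanisms concrete, here is a minimal simulation sketch. It is not taken from the article: the function name complete_case_mean, the missingness probabilities, and the standard normal outcome are all assumptions chosen for illustration. It compares the mean among complete cases with the mean of the full data when outcomes go missing completely at random (MCAR) and when the chance of being missing depends on the unobserved outcome itself (MNAR).

```python
import random
import statistics


def complete_case_mean(mechanism, n=20000, seed=0):
    """Return (full-data mean, complete-case mean) for a simulated outcome.

    Hypothetical setup: the outcome y is standard normal, and the
    probability that y is missing depends on the chosen mechanism.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        full.append(y)
        if mechanism == "MCAR":
            # Missingness is unrelated to any data.
            p_missing = 0.3
        elif mechanism == "MNAR":
            # Missingness depends on the (unobserved) outcome value itself.
            p_missing = 0.6 if y > 0 else 0.1
        else:
            raise ValueError("mechanism must be 'MCAR' or 'MNAR'")
        if rng.random() >= p_missing:
            observed.append(y)
    return statistics.mean(full), statistics.mean(observed)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MNAR"):
        full_mean, cc_mean = complete_case_mean(mechanism)
        print(f"{mechanism}: full mean = {full_mean:.3f}, "
              f"complete-case mean = {cc_mean:.3f}")
```

Under these assumptions, the complete-case mean should stay close to the full-data mean in the MCAR run and drift downward in the MNAR run, because high outcomes are preferentially lost. In a real trial the full data are never available for comparison, which is one way to see why MAR and MNAR cannot be told apart from the observed data alone.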

Austin et al. 2021.pdf

Missing Data in Clinical Research: A Tutorial on Multiple Imputation (Source)

Abstract: Missing data is a common occurrence in clinical research. Missing data occurs when the values of the variables of interest are not measured or recorded for all subjects in the sample. Common approaches to addressing the presence of missing data include complete-case analyses, where subjects with missing data are excluded, and mean-value imputation, where missing values are replaced with the mean value of that variable in those subjects for whom it is not missing. However, in many settings, these approaches can lead to biased estimates of statistics (eg, of regression coefficients) and/or confidence intervals that are artificially narrow. Multiple imputation (MI) is a popular approach for addressing the presence of missing data. With MI, multiple plausible values of a given variable are imputed or filled in for each subject who has missing data for that variable. This results in the creation of multiple completed data sets. Identical statistical analyses are conducted in each of these complete data sets and the results are pooled across complete data sets. We provide an introduction to MI and discuss issues in its implementation, including developing the imputation model, how many imputed data sets to create, and addressing derived variables. We illustrate the application of MI through an analysis of data on patients hospitalised with heart failure. We focus on developing a model to estimate the probability of 1-year mortality in the presence of missing data. Statistical software code for conducting MI in R, SAS, and Stata are provided.
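
The impute, analyse, and pool cycle described in the abstract can be sketched in a few lines. The following is a toy illustration only, not the code provided with the article (which is in R, SAS, and Stata): the function names pool_rubin, impute_once, and multiple_imputation_mean are hypothetical, the imputation model is a deliberately simple univariate normal fitted to a bootstrap resample of the observed values, and the data are made up. The pooling step uses Rubin's rules, under which the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import random
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)          # pooled point estimate
    within = statistics.mean(variances)         # average within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def impute_once(values, rng):
    """Fill each None with a draw from a normal model (illustrative assumption).

    The model is fitted to a bootstrap resample of the observed values so
    that each imputation also reflects uncertainty about the model itself.
    """
    observed = [v for v in values if v is not None]
    resample = [rng.choice(observed) for _ in observed]
    mu = statistics.mean(resample)
    sd = statistics.stdev(resample)
    return [v if v is not None else rng.gauss(mu, sd) for v in values]


def multiple_imputation_mean(values, m=20, seed=0):
    """Estimate a mean from data with missing values using m imputations."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.mean(completed))
        # Variance of the sample mean within this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


if __name__ == "__main__":
    # Hypothetical measurements; None marks a missing value.
    data = [5.1, None, 4.8, 6.0, None, 5.5, 4.9, 5.7]
    estimate, total_variance = multiple_imputation_mean(data)
    print(f"pooled mean = {estimate:.2f}, standard error = {total_variance ** 0.5:.2f}")
```

A real analysis would use an imputation model that includes covariates, the outcome, and any auxiliary variables, typically through established software rather than hand-written code. The sketch is only meant to show where the extra between-imputation uncertainty enters the pooled variance.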

Jakobsen et al. 2017.pdf

When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts (Source)

Background: Missing data may seriously compromise inferences from randomised clinical trials, especially if missing data are not handled appropriately. The potential bias due to missing data depends on the mechanism causing the data to be missing, and the analytical methods applied to amend the missingness. Therefore, the analysis of trial data with missing values requires careful planning and attention.

Methods: The authors had several meetings and discussions considering optimal ways of handling missing data to minimise the bias potential. We also searched PubMed (key words: missing data; randomi*; statistical analysis) and reference lists of known studies for papers (theoretical papers; empirical studies; simulation studies; etc.) on how to deal with missing data when analysing randomised clinical trials.

Results: Handling missing data is an important, yet difficult and complex task when analysing results of randomised clinical trials. We consider how to optimise the handling of missing data during the planning stage of a randomised clinical trial and recommend analytical approaches which may prevent bias caused by unavoidable missing data. We consider the strengths and limitations of using best-worst and worst-best sensitivity analyses, multiple imputation, and full information maximum likelihood. We also present practical flowcharts on how to deal with missing data and an overview of the steps that always need to be considered during the analysis stage of a trial.

Conclusions: We present a practical guide and flowcharts describing when and how multiple imputation should be used to handle missing data in randomised clinical trials.
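
The best-worst and worst-best sensitivity analyses mentioned in the Results can be illustrated for a binary harmful outcome such as death. The sketch below is an assumption-laden example rather than the authors' procedure: the function names and all counts are hypothetical, and the effect measure is a simple risk difference (experimental minus control). In the best-worst scenario, every participant with a missing outcome in the experimental group is assumed to have had no event and every such participant in the control group is assumed to have had the event; the worst-best scenario reverses these assumptions.

```python
def risk_difference(events_exp, n_exp, events_ctl, n_ctl):
    """Risk of the harmful event in the experimental group minus the control group."""
    return events_exp / n_exp - events_ctl / n_ctl


def best_worst_bounds(events_exp, observed_exp, missing_exp,
                      events_ctl, observed_ctl, missing_ctl):
    """Return (best-worst, worst-best) risk differences for a harmful binary outcome."""
    n_exp = observed_exp + missing_exp
    n_ctl = observed_ctl + missing_ctl
    # Best-worst: missing experimental participants had no event,
    # missing control participants all had the event.
    best_worst = risk_difference(events_exp, n_exp,
                                 events_ctl + missing_ctl, n_ctl)
    # Worst-best: missing experimental participants all had the event,
    # missing control participants had no event.
    worst_best = risk_difference(events_exp + missing_exp, n_exp,
                                 events_ctl, n_ctl)
    return best_worst, worst_best


if __name__ == "__main__":
    # Hypothetical trial: 100 randomised per group, some outcomes missing.
    best_worst, worst_best = best_worst_bounds(
        events_exp=20, observed_exp=90, missing_exp=10,
        events_ctl=30, observed_ctl=85, missing_ctl=15,
    )
    print(f"best-worst risk difference: {best_worst:+.2f}")
    print(f"worst-best risk difference: {worst_best:+.2f}")
```

The two results bracket the range of effects that the missing outcomes could produce under these extreme assumptions. If the conclusion is the same at both ends, the missing data are unlikely to change the interpretation of the trial.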
