Multiple Imputation for Missing Data

Multiple imputation (MI) (Rubin, 1987) is one of the most commonly used techniques for handling missing data. It is popular partly because it separates the job of handling the missing data, done by imputers, from the job of performing inference, done by analysts; see Tu et al. (1993). Despite its many successes, multiple shortcomings have been uncovered over the years.
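To fix ideas, here is a minimal sketch of that division of labor on a hypothetical toy problem (estimating a mean when some values are missing completely at random): an imputer fills in the missing values M times, the analyst applies the usual complete-data procedure to each completed dataset, and the M results are pooled with Rubin's combining rules. The imputation model below is deliberately crude and is only meant to illustrate the workflow.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy data: estimate the mean of x when roughly 30% of values are missing.
    x = rng.normal(loc=5.0, scale=2.0, size=200)
    missing = rng.random(200) < 0.3
    x_obs = np.where(missing, np.nan, x)

    M = 20                                            # number of imputations
    mu_obs = np.nanmean(x_obs)
    sd_obs = np.nanstd(x_obs, ddof=1)

    point_estimates, within_vars = [], []
    for _ in range(M):
        # Imputer's job: draw plausible values for the missing entries.
        # (A crude normal model; a proper imputer would also propagate parameter uncertainty.)
        x_imp = x_obs.copy()
        x_imp[missing] = rng.normal(mu_obs, sd_obs, size=missing.sum())

        # Analyst's job: apply the complete-data procedure to each completed dataset.
        point_estimates.append(x_imp.mean())
        within_vars.append(x_imp.var(ddof=1) / len(x_imp))

    # Rubin's combining rules (Rubin, 1987).
    theta_bar = np.mean(point_estimates)              # pooled point estimate
    U_bar = np.mean(within_vars)                      # average within-imputation variance
    B = np.var(point_estimates, ddof=1)               # between-imputation variance
    T = U_bar + (1 + 1 / M) * B                       # total variance of the pooled estimate

    print(f"pooled estimate = {theta_bar:.3f}, total variance = {T:.5f}")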

Multiple Improvements of Multiple Imputation Tests

      • Our first contribution is to point out a striking fact: the existing MI likelihood ratio test is not guaranteed to have power; see Chan and Meng (2022). Interestingly, the problem again hinges on variance estimation: the estimator of Var(θobs), the variance of the observed-data estimator, is inconsistent under the alternative hypothesis. Estimating Var(θobs) is difficult because the fraction of missing information (FMI), i.e., the relative increase in variance due to missingness, is usually unknown (a sketch of the classical moment-based FMI estimate is given after this list). Moreover, we found that the existing MI likelihood ratio test (Meng and Rubin, 1992) is (i) not invariant to parametrization, (ii) not always non-negative, and (iii) dependent on analysts having non-trivial complete-data procedures.

      • We re-developed the MI testing procedure so that all of the aforementioned problems are eliminated; see Chan (2021+). Theoretically, a particularly intriguing finding is that the FMI can be estimated consistently by a likelihood ratio statistic that tests whether the multiply imputed datasets can be regarded as samples from a common model; moreover, this estimator is robust to the falsity of the null hypothesis. Computationally, we showed that performing the MI test is straightforward if analysts are willing to perform one additional test on the stack of all imputed datasets, treated as a single big completed dataset (the stacking mechanics are sketched below). This stacking principle is novel and differs from all existing MI methods.
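For reference, the classical moment-based FMI estimate implied by Rubin's combining rules (in its large-sample form) can be computed directly from the complete-data analyses, as sketched below with hypothetical numbers; it is not the likelihood-ratio-based estimator developed in the work cited above, but it makes concrete what "relative increase in variance due to missingness" means.

    import numpy as np

    def estimate_fmi(point_estimates, within_vars):
        # Classical moment-based FMI estimate from Rubin's combining rules
        # (large-sample form); not the likelihood-ratio estimator of Chan (2021+).
        M = len(point_estimates)
        U_bar = np.mean(within_vars)            # average within-imputation variance
        B = np.var(point_estimates, ddof=1)     # between-imputation variance
        r = (1 + 1 / M) * B / U_bar             # relative increase in variance due to missingness
        return r / (1 + r)                      # estimated fraction of missing information

    # Hypothetical complete-data estimates and variances from M = 5 imputed datasets.
    print(estimate_fmi([5.1, 4.9, 5.3, 5.0, 5.2], [0.020, 0.021, 0.019, 0.020, 0.022]))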
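As for the stacking step in the second bullet, the following is a minimal sketch, on hypothetical data, of the stacking mechanics only: the M completed datasets are concatenated into one big dataset of M x n rows, on which a single complete-data test can then be run. How the resulting statistic is calibrated is the subject of Chan (2021+) and Chan and Meng (2022) and is not reproduced here.

    import numpy as np
    import pandas as pd

    def stack_imputations(imputed_datasets):
        # Concatenate M completed datasets into one "stacked" dataset of M x n rows,
        # keeping track of which imputation each row came from.
        return pd.concat(
            [df.assign(imputation=m) for m, df in enumerate(imputed_datasets, start=1)],
            ignore_index=True,
        )

    # Hypothetical example: three completed versions of the same two-row dataset.
    rng = np.random.default_rng(1)
    base = pd.DataFrame({"y": [1.0, np.nan], "x": [0.3, 0.7]})
    imputed = [base.fillna({"y": rng.normal(1.0, 0.5)}) for _ in range(3)]
    stacked = stack_imputations(imputed)
    print(stacked)     # 6 rows; a complete-data test statistic would be computed on this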