Multiple Improvements of Multiple Imputation Likelihood Ratio Tests

  1. Chan, K. W. & Meng, X.-L. (2022). Multiple Improvements of Multiple Imputation Likelihood Ratio Tests. Statistica Sinica, 32, 1489–1514.

Abstract

Multiple imputation (MI) inference handles missing data by first properly imputing the missing values m times, and then combining the m analysis results from applying a complete-data procedure to each of the completed datasets. However, the existing method for combining likelihood ratio tests has multiple defects: (i) the combined test statistic can be negative in practice when the reference null distribution is a standard F distribution; (ii) it is not invariant to re-parametrization; (iii) it fails to ensure monotonic power due to its use of an inconsistent estimator of the fraction of missing information (FMI) under the alternative hypothesis; and (iv) it requires non-trivial access to the likelihood ratio test statistic as a function of estimated parameters instead of datasets. This paper shows, via both theoretical derivations and empirical investigations, that essentially all of these problems can be straightforwardly addressed if we are willing to perform an additional likelihood ratio test by stacking the m completed datasets as one big completed dataset. A particularly intriguing finding is that the FMI itself can be estimated consistently by a likelihood ratio statistic for testing whether the m completed datasets produced by MI can be regarded effectively as samples coming from a common model. Practical guidelines are provided based on an extensive comparison of existing MI tests.
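
The stacking idea described above can be illustrated with a toy sketch. This is not the paper's exact estimator. It assumes a N(mu, 1) model with known variance, tests H0: mu = 0 (so k = 1), and plugs the stacked statistic into a combining rule of the same form as Meng and Rubin (1992). The function name `stacked_mi_lrt_sketch`, the truncation at zero, and the returned keys are all hypothetical choices made for illustration.

```python
import numpy as np


def stacked_mi_lrt_sketch(completed, k=1):
    """Toy MI likelihood ratio test for H0: mu = 0 in a N(mu, 1) model.

    `completed` is a list of m completed datasets of equal size n.
    Illustrative sketch only; the model and names are assumptions.
    """
    completed = [np.asarray(x, dtype=float) for x in completed]
    m = len(completed)
    n = completed[0].size

    # Per-dataset LRT statistics: 2 * (loglik at MLE - loglik at mu = 0),
    # which equals n * xbar^2 in this model; then average over the m datasets.
    xbars = np.array([x.mean() for x in completed])
    d_bar = float(np.mean(n * xbars ** 2))

    # One extra LRT on the stacked data, with the log-likelihood rescaled
    # by 1/m so that it is on the scale of a single dataset.
    stacked = np.concatenate(completed)
    d_stack = float(n * stacked.mean() ** 2)

    # Assumed estimator of the odds of missing information, truncated at 0.
    r = (m + 1) / (k * (m - 1)) * max(0.0, d_bar - d_stack)

    # Combined statistic, to be compared with an F-type reference distribution.
    D = d_stack / (k * (1.0 + r))
    return {"d_bar": d_bar, "d_stack": d_stack, "r": r, "D": D}


result = stacked_mi_lrt_sketch([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(result)
```

In this toy model the gap between the averaged statistic and the stacked statistic is n times the between-imputation variance of the sample means, so the sketch reduces to the familiar between-to-within variance ratio of Rubin's rules.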

Figure: Idea of the proposed test.

https://drive.google.com/open?id=1cnsIJDHYkF5PEWFGQFoPhafyfuT0XZ5L

Figure: Power curves of different multiple imputation (MI) tests. Null hypothesis: the means of a bivariate Normal random vector are equal. Level of significance is 0.5%. The tests were performed under non-standard parametrizations; see Figure 5 of the main paper for a detailed description.

      • W-1,2,3,4: different MI Wald tests; see Rubin (2004, Wiley);

      • L-1: the best existing MI likelihood ratio test (LRT) by Meng and Rubin (1992, Biometrika);

      • L-2: a trivial modification of L-1 (to ensure positivity of the test statistic and of the estimator of the fraction of missing information (FMI));

      • L-3: first proposed MI LRT, which guarantees invariance and positivity of test statistics;

      • L-4: a trivial modification of L-3 (to ensure positivity of the estimator of FMI);

      • L-5: second proposed MI LRT, which guarantees (i) positivity of both the test statistic and the estimator of FMI, (ii) invariance to parametrization, and (iii) consistency of the estimator of FMI; and

      • L-0: benchmark, i.e., the LRT without using MI.