Missing Data

A major methodological research interest is the analysis of data with missing values. See, for example, Little & Rubin (2002) or the review articles (Little 1992, 1997, 2005, 2008; Little & Schenker 1994; Andridge & Little 2010).

For data that are missing at random (MAR), I have developed likelihood-based methods (maximum likelihood or Bayes) for a variety of settings, including normal data (Beale & Little 1975), regression (Little 1992), long-tailed distributions (Little 1988; Lange, Little & Taylor 1989), discriminant analysis (Little 1978), mixed categorical and normal data (Little & Schluchter 1985), edit/imputation (Little & Smith 1987), and survival data with missing covariates (Chen & Little 1999, 2001).

More recently, Little & An (2004) propose penalized spline of propensity prediction, a robust MAR likelihood-based method for multivariate data with missing values based on regressions on splines of propensity scores. For extensions and additional work see An & Little (2008), Zhang & Little (2009, 2011), Little & Zhang (2009). Little et al. (2008) and Wang et al. (2011) develop hot deck multiple imputation methods for gaps in longitudinal data on recurrent events.

For data that are not MAR, Little (1985) points out the vulnerability of the Heckman method for correcting for selectivity bias. In Little (1993) I develop pattern-mixture models, a broad class of models that do not require precise specification of the missing-data mechanism. Little & Wang (1996) extend the simple pattern-mixture model developed in Little (1994) to repeated-measures data with covariates. Tang, Little & Raghunathan (2003, 2004) develop a pseudo-likelihood method for fitting nonrandomly missing data that avoids specifying the precise form of the mechanism. Little & Raghunathan (1999) compare maximum likelihood and summary-measures approaches to longitudinal data with drop-outs in a simulation study. Little (1995) develops a model-based framework for repeated-measures data with drop-outs, and places the existing literature within this framework.
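To make the pattern-mixture idea concrete, here is a minimal sketch. It is not the models of Little (1993, 1994); it only shows the basic factorization for a single variable. The overall mean is written as a mixture of the respondent mean and the nonrespondent mean, and the nonrespondent mean, which the data cannot identify, is fixed by an assumed offset `delta` from the respondent mean. The function name and the `delta` parameter are illustrative assumptions; `delta = 0` assumes nonrespondents have the same mean as respondents.

```python
from typing import Optional, Sequence


def pattern_mixture_mean(y: Sequence[Optional[float]], delta: float) -> float:
    """Overall mean under a two-pattern mixture with an assumed offset.

    y     : outcome values, with None marking a missing value
    delta : assumed difference (nonrespondent mean - respondent mean);
            a sensitivity parameter chosen by the analyst, not estimated
    """
    observed = [v for v in y if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    prop_observed = len(observed) / len(y)
    mean_observed = sum(observed) / len(observed)
    # Assumption: the missing-pattern mean is the observed mean shifted by delta.
    mean_missing = mean_observed + delta
    # Mixture over the two patterns (observed, missing).
    return prop_observed * mean_observed + (1 - prop_observed) * mean_missing


if __name__ == "__main__":
    y = [10.0, 12.0, None, 14.0, None]
    for delta in (-2.0, 0.0, 2.0):
        print(delta, pattern_mixture_mean(y, delta))
```

Varying `delta` over a plausible range gives a simple sensitivity analysis: the spread of the resulting estimates shows how much the conclusion depends on what is assumed about the nonrespondents.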

Missing not at random models based on alternative factorizations of the joint distribution of the data and missing-data indicators are considered in Yuan & Little (2009) and Zhou, Kalbfleisch & Little (2010).

In regression settings, missing not at random mechanisms can be accommodated without needing to specify the missing-data mechanism by selectively dropping cases, using subsample ignorable likelihood methods presented in Little & Zhang (2011). Another approach based on shrinkage priors is given in Zhang & Little (2011).

Concerning nonresponse in sample surveys, Little & Vartivarian (2003) show that the usual way of incorporating design weights into nonresponse weighting adjustments is flawed, and suggest better approaches. This work is extended to hot deck imputation in Andridge & Little (2009). Model-based methods for cluster samples with missing data are developed in Yuan & Little (2007a,b, 2008). Sensitivity analyses for assessing the impact of survey nonresponse, encompassing missing not at random scenarios, are developed in Andridge & Little (2011) and Giusti & Little (2011).
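For orientation, the sketch below computes the response rate within each adjustment cell in two ways: unweighted, and weighted by the design weights (the sum of respondents' weights divided by the sum of all sampled units' weights in the cell). The nonresponse weight is then the inverse of whichever rate is used. This only illustrates the quantities under discussion; it does not implement the alternatives proposed in Little & Vartivarian (2003). The record layout and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (adjustment cell label, design weight, responded?)
Record = Tuple[str, float, bool]


def cell_response_rates(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Return {cell: (unweighted response rate, design-weighted response rate)}."""
    n_all = defaultdict(int)
    n_resp = defaultdict(int)
    w_all = defaultdict(float)
    w_resp = defaultdict(float)
    for cell, weight, responded in records:
        n_all[cell] += 1
        w_all[cell] += weight
        if responded:
            n_resp[cell] += 1
            w_resp[cell] += weight
    return {c: (n_resp[c] / n_all[c], w_resp[c] / w_all[c]) for c in n_all}


if __name__ == "__main__":
    sample = [
        ("A", 1.0, True), ("A", 1.0, True), ("A", 4.0, False), ("A", 4.0, True),
        ("B", 2.0, True), ("B", 2.0, False),
    ]
    for cell, (unweighted, weighted) in cell_response_rates(sample).items():
        # Nonresponse weight = inverse of the estimated response rate
        # (undefined for a cell with no respondents; none occur in this sample).
        print(cell, 1 / unweighted, 1 / weighted)
```

The two rates differ only when design weights vary within a cell and are related to response, as in cell A above; in cell B they coincide.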

Missing data in clinical trials is an important concern. I chaired a National Research Council committee that developed recommendations (National Research Council, 2010). Issues of causal inference arise when individuals do not take their assigned treatment. Little & Yau (1996) develop a multiple imputation method for intent-to-treat analysis of repeated-measures data with drop-outs. Likelihood-based methods for estimating the complier-average causal effects of treatments are considered in Little & Yau (1998), Yau & Little (2001) and Peng, Little & Raghunathan (2004). Additional work on applying the principal stratification framework of causal inference to clinical trial problems includes Little et al. (2011), Long, Little & Lin (2008, 2010) and Little, Long & Lin (2009).
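As a point of reference for the complier-average causal effect (CACE), the sketch below computes the simple moment (instrumental-variable) estimator: the intent-to-treat difference in mean outcomes divided by the difference between arms in the proportion actually receiving treatment. This is not the likelihood-based approach of the papers cited above, and it relies on the usual assumptions (randomized assignment, the exclusion restriction, and no defiers). Function and variable names are illustrative.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def cace_iv(assigned: Sequence[int],
            received: Sequence[int],
            outcome: Sequence[float]) -> float:
    """Moment (IV) estimate of the complier-average causal effect.

    assigned : 1 if randomized to treatment, 0 if randomized to control
    received : 1 if treatment was actually taken, 0 otherwise
    outcome  : observed outcome for each individual
    """
    treat = [i for i, z in enumerate(assigned) if z == 1]
    ctrl = [i for i, z in enumerate(assigned) if z == 0]
    if not treat or not ctrl:
        raise ValueError("both arms must be non-empty")
    # Intent-to-treat effect on the outcome.
    itt_outcome = _mean([outcome[i] for i in treat]) - _mean([outcome[i] for i in ctrl])
    # Intent-to-treat effect on treatment receipt (estimated complier share).
    itt_receipt = _mean([received[i] for i in treat]) - _mean([received[i] for i in ctrl])
    if itt_receipt == 0:
        raise ValueError("assignment has no effect on treatment receipt")
    return itt_outcome / itt_receipt


if __name__ == "__main__":
    assigned = [1, 1, 1, 1, 0, 0, 0, 0]
    received = [1, 1, 0, 0, 0, 0, 0, 0]
    outcome = [8.0, 6.0, 4.0, 4.0, 4.0, 4.0, 3.0, 5.0]
    print(cace_iv(assigned, received, outcome))
```

In this toy data the intent-to-treat effect is 1.5 and half of those assigned to treatment take it, so the estimate attributes an effect of 3.0 to compliers.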

References on Missing Data

(a) Recent Articles

Partially missing at random

Conditions for Ignoring the Missing Data Mechanism in Likelihood Inferences for Parameter Subsets

Missing Data in Clinical Trials: NEJM

The Prevention and Treatment of Missing Data in Clinical Trials

West and Little Applied Statistics

Non-response adjustment of survey estimates based on auxiliary variables subject to error

N. Zhang and Little Biometrics

A Pseudo-Bayesian Shrinkage Approach to Regression with Missing Covariates

G. Zhang and Little J Statist Comp Sim

A comparative study of doubly robust estimators of the mean with missing data

Wang et al. 2011 Biometrics

A Hot-Deck Multiple Imputation Procedure for Gaps in Longitudinal Recurrent Event Histories

Andridge and Little JOS Proxy Pattern-Mixture Analysis

Proxy Pattern-Mixture Analysis for Survey Nonresponse

Giusti and Little 2011 JOS

An Analysis of Nonignorable Nonresponse to Income in a Survey with a Rotating Panel Design

Yuan and Little Biometrics 2009

Mixed-Effect Hybrid Models for Longitudinal Data with Nonignorable Dropout

(b) Books, Review Articles

Andridge, R.H. & Little, R. J. (2010). A Review of Hot Deck Imputation for Survey Nonresponse. International Statistical Review, 78, 1, 40-64.

Little, R.J.A. (1992). Regression with missing X's: a review. Journal of the American Statistical Association, 87, 1227-1237.

Little, R.J.A. (1997). Biostatistical Analysis with Missing Data. Article for Encyclopedia of Biostatistics, P. Armitage and T. Colton, eds., Wiley: London.

Little, R.J. (2005). Missing Data. In Encyclopedia of Statistics in Behavioral Science, Vol. 3, B. Everitt & D. Howell, eds., New York: Wiley.

Little, R.J. (2008). Selection and Pattern-Mixture Models. Chapter 18 in Advances in Longitudinal Data Analysis, G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs, eds., pp. 409-431, London: CRC Press.

Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd edition, New York: John Wiley.

Little, R.J.A., and Schenker, N. (1994). Missing data. In: Handbook for Statistical Modeling in the Social and Behavioral Sciences. G. Arminger, C.C. Clogg and M.E. Sobel, eds., pp. 39-75, Plenum, New York.

(c) Missing at Random methods

An, H. & Little, R.J. (2008). Robust Model-based Inference for Incomplete Data via Penalized Spline of Propensity Prediction. Communications in Statistics - Simulation and Computation, 37, 9, 1718-1731.

Beale, E.M.L. & Little, R.J.A. (1975). Missing Values in Multivariate Analysis. Journal of the Royal Statistical Society, Series B, 37, 129 - 145.

Chen, H. Y. and Little, R.J.A. (1999). A Test of Missing Completely at Random for Generalized Estimating Equations with Missing Data. Biometrika, 86, 1, 1-13.

Chen, H. Y. and Little, R.J.A. (1999). Proportional Hazards Regression with Missing Covariates. Journal of the American Statistical Association, 94, 896-908.

Chen, H.-Y. and Little, R.J. (2001). A Conditional Profile Likelihood Approach for the Semiparametric Transformation Regression Model with Missing Covariates. Lifetime Data Analysis, 7, 207-224.

Lange, K., Little, R.J.A. & Taylor, J.M.G. (1989). Robust Statistical Inference Using the T Distribution. Journal of the American Statistical Association, 84, 881‑896.

Little, R.J.A. (1978). Consistent Regression Methods for Discriminant Analysis with Incomplete Data. Journal of the American Statistical Association, 73, 319‑322.

Little, R.J.A. & Schluchter, M.D. (1985). Maximum Likelihood Estimation for Mixed Continuous and Categorical Data with Missing Values. Biometrika, 72, 497‑512.

Little, R.J.A. & Smith, P.J. (1987). Editing and Imputation for Quantitative Data. Journal of the American Statistical Association, 82, 58‑69.

Little, R.J.A. (1988). Robust Estimation of the Mean and Covariance Matrix from Data with Missing Values. Applied Statistics, 37, 23‑38.

Little, R.J.A. (1991). Inference with Survey Weights. Journal of Official Statistics, 7, 405‑424.

Little, R.J.A. & An, H. (2004). Robust Likelihood-Based Analysis of Multivariate Data with Missing Values. Statistica Sinica, 14, 949-968.

Little, R.J.A. and Raghunathan, T. E. (1999). On Summary-Measures Analysis of the Linear Mixed-Effects Model for Repeated Measures When Data are not Missing Completely at Random. Statistics in Medicine, 18, 2465-2478.

Little, R.J., Yosef, M., Cain, K., Nan, B. & Harlow, S. D. (2008). A Hot Deck Multiple Imputation Procedure for Gaps in Longitudinal Data on Recurrent Events. Statistics in Medicine, 27, 103-120.

Little, R.J. & Zhang, G. (2009). Robust likelihood-based analysis of longitudinal data with missing values. Chapter 18 in Methodology of Longitudinal Surveys, ed. Peter Lynn, 317-330, New York: Wiley.

Wang, C., Little, R.J., Nan, B. & Harlow, S. (2011). A Hot-Deck Multiple Imputation Procedure for Gaps in Longitudinal Recurrent Event Histories. Biometrics, DOI: 10.1111/j.1541-0420.2011.01558.x.

Zhang, G. & Little, R. J. (2009). Extensions of the Penalized Spline of Propensity Prediction Method of Imputation. Biometrics, 65, 911-918.

Zhang, G. & Little, R. J. (2011). A Comparative Study of Doubly-Robust Estimators of the Mean with Missing Data. Journal of Statistical Computation and Simulation, 81, 12, 2039-2058, DOI: 10.1080/00949655.2010.516750.

(d) Missing Not at Random Methods

Little, R.J.A. (1985). A Note about Models for Selectivity Bias. Econometrica, 53, 1469‑1474.

Little, R.J.A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-134.

Little, R.J.A. (1994). A class of pattern-mixture models for normal missing data. Biometrika, 81, 3, 471-483.

Little, R.J.A. (1995). Modeling the Drop-Out Mechanism in Longitudinal Studies. Journal of the American Statistical Association, 90, 1112-1121.

Little, R. J. and Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 60, 4, 591–605. doi: 10.1111/j.1467-9876.2011.00763.x

Little, R.J.A., and Wang, Y.-X. (1996). Pattern-mixture models for multivariate incomplete data with covariates. Biometrics, 52, 98-111.

Tang, G., Little, R.J. & Raghunathan, T. (2003). Analysis of Multivariate Missing Data with Nonignorable Nonresponse. Biometrika, 90, 747-764.

Tang, G., Little, R.J. & Raghunathan, T. (2004). Analysis of Multivariate Monotone Missing Data by a Pseudo-Likelihood Method. In Proceedings of the 2nd Seattle Symposium in Biostatistics: Analysis of Correlated Data. Lin, D.Y.; Heagerty, P.J. (Eds.). Lecture Notes in Statistics, 2004. New York: Springer Verlag.

Yuan, Y. & Little, R.J. (2009). Mixed-Effect Hybrid Models for Longitudinal Data with Nonignorable Dropout. Biometrics, 65, 2, 478-486.

Zhang, N. & Little, R.J. (2011). A Pseudo-Bayesian Shrinkage Approach to Regression with Missing Covariates. To appear in Biometrics.

Zhou, Y., Kalbfleisch, J.D. & Little, R.J. (2010). Block-Conditional MAR Models for Missing Data. Statistical Science, 25, 4, 517-532.

(e) Survey Nonresponse

Andridge, R.H. & Little, R.J. (2009). The Use of Sample Weights in Hot Deck Imputation. Journal of Official Statistics, 25, 1, 21-36.

Andridge, R.H. & Little, R.J. (2011). Proxy Pattern-Mixture Analysis for Survey Nonresponse. Journal of Official Statistics, 27, 2, 153-180.

Giusti, C. & Little, R.J. (2011). A Sensitivity Analysis of Nonignorable Nonresponse to Income in a Survey with a Rotating Panel Design. Journal of Official Statistics, 27, 2, 211-229.

Little, R.J.A. (1982). Models for Nonresponse in Sample Surveys. Journal of the American Statistical Association, 77, 237‑250.

Little, R.J. & Vartivarian, S. (2003). On Weighting the Rates in Nonresponse Weights. Statistics in Medicine, 22, 1589-1599.

Little, R.J.A. & Vartivarian, S. (2005). Does Weighting for Nonresponse Increase the Variance of Survey Means? Survey Methodology, 31, 161-168.

Yuan, Y. & Little, R.J. (2007). Model-Based Estimates of the Finite Population Mean for Two-Stage Cluster Samples with Unit Nonresponse. Journal of the Royal Statistical Society, Ser. C, 56, 79-97.

Yuan, Y. & Little, R.J. (2007). Parametric and Semiparametric Model-Based Estimates of the Finite Population Mean for Two-Stage Cluster Samples with Item Nonresponse. Biometrics, 63, 1172-1180.

Yuan, Y. & Little, R.J. (2008). Model-Based Estimates of the Finite Population Mean for Two-Stage Cluster Samples with Item Nonresponse. Journal of Official Statistics, 24, 193-211.

(f) Missing Data and Causal Inference in Clinical Trials

Little, R.J., Long, Q. & Lin, X. (2009). A Comparison of Methods for Estimating the Causal Effect of a Treatment in Randomized Clinical Trials Subject to Noncompliance. Biometrics, 65, 2, 640-649.

Little, R.J.A. and Yau, L. (1996). Intent-to-Treat Analysis in Longitudinal Studies with Drop-Outs. Biometrics, 52, 1324-1333.

Little, R.J.A. and Yau, L. (1998). Statistical Techniques for Analyzing Data from Prevention Trials: Treatment of No-Shows Using Rubin's Causal Model. Psychological Methods, 3, 2, 147-159.

Little, R.J., Yosef, M., Nan, B., & Harlow, S. (2011). A Method for the Longitudinal Prospective Evaluation of Markers of a Subsequent Event (with Discussion and Rejoinder). American Journal of Epidemiology, 173, 12, 1380-1387. doi:10.1093/aje/kwr010

Long, Q., Little, R.J., & Lin, X. (2008). Causal Inference in Hybrid Intervention Studies Involving Treatment Choice. Journal of the American Statistical Association, 103, 474-484.

Long, Q., Little, Roderick J., & Lin, X. (2010). Estimating Causal Effects in Trials Involving Multitreatment Arms Subject to Non-Compliance: a Bayesian Framework. Journal of the Royal Statistical Society, Ser. C: Applied Statistics, 59(3), 513-531.

National Research Council (2010). The Prevention and Treatment of Missing Data in Clinical Trials. Panel on Handling Missing Data in Clinical Trials (R. Little, Chair). National Academy Press: Washington DC.

Peng, Y., Little, R.J. & Raghunathan, T. (2004). An Extended General Location Model for Causal Inferences from Data Subject to Non-compliance and Missing Values. Biometrics, 60, 598-608.

Yau, L. and Little, R.J.A. (2001). Inference for the Complier-Average Causal Effect from Longitudinal Data Subject to Noncompliance and Missing Data, with Application to a Job Training Assessment for the Unemployed. Journal of the American Statistical Association, 96, 1232-1244.