Both the MCAR (missing completely at random) and MAR (missing at random) assumptions treat the reasons for the missing data as "ignorable" (Little, 1995) – in short, the analysis can proceed without explicitly modeling the joint distribution of the missing data and the outcomes. However, data can also be missing not at random (aka MNAR, NMAR, or non-ignorable non-response). For instance, individuals with multiple DUI arrests may be less inclined to answer questions about their prior DUI arrests. Selection bias arises when the data are MNAR because the sample consists only of individuals with certain characteristics/attributes, while in actuality there is a systematic difference between individuals with and without those characteristics/attributes. Selection bias can also arise when individuals are not randomly assigned to one group versus the other. In that case, estimates are likely to be biased if the statistical technique does not adequately account for non-ignorable non-response. This is because the outcomes of interest (Yi) and the missing-data indicator that indexes the patterns of missing data (Ri) are not jointly modeled. In other words, when data are MNAR, the missing-data mechanism should be modeled as part of the statistical inference.
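The bias induced by non-ignorable non-response can be made concrete with a small simulation (all numbers are illustrative): when the probability of responding depends on the very value being measured, the mean of the observed responses is systematically off, no matter how large the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an outcome on a latent scale (e.g. a propensity score for
# prior DUI arrests).
y = rng.normal(loc=0.0, scale=1.0, size=100_000)

# MNAR response mechanism: the probability of responding falls as y rises,
# so individuals with higher values are less likely to answer.
p_respond = 1.0 / (1.0 + np.exp(y))
observed = rng.random(y.size) < p_respond

print(f"True mean:     {y.mean():+.3f}")   # close to 0
print(f"Observed mean: {y[observed].mean():+.3f}")  # pulled well below 0
```

The gap between the two means does not shrink with sample size; it can only be addressed by modeling the response mechanism itself.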
If individuals are randomly selected into the sample and we are confident that there are no unobserved variables predicting the likelihood of being included in the sample, we can use a single-equation model. Simply adding more predictors of the outcome to a single equation can actually worsen the selection bias problem: it may change the unexplained variance in the propensity to be included in the sample (when that propensity is regressed on a set of predictors) without changing the covariance between the error terms of the selection and outcome equations. Both the Heckman Selection Model and the Pattern Mixture Model can be used to address data that are potentially missing not at random.
The Heckman Selection Model corrects for selection bias using a two-stage procedure (the two equations can also be estimated simultaneously by maximum likelihood). In the first stage (i.e. the selection model), a dummy variable indicating whether a characteristic is present (=1) is regressed on a vector of predictor variables using a probit regression model. This vector represents characteristics pertaining to an individual, a region (e.g. county, state, country), and so forth. The predicted values from the probit regression model are then used to form the inverse Mills ratio. In the second stage (i.e. the outcome model), the inverse Mills ratio is included as a predictor variable in the OLS regression model.
The Pattern Mixture Model can be used to determine whether non-ignorable non-response is present in longitudinal data (Little, 1993, 1995). As such, when non-ignorable non-response in longitudinal studies is a concern (Fitzmaurice et al., 2001; Hedeker and Gibbons, 1997; Pauler et al., 2003), this approach can help clarify the gaps between missing-at-random based analyses and the range of plausible biases induced by possibly non-ignorable missing data (Paddock et al., 2006).
Both techniques derive their inference from the joint density [f(Y, R)] of the outcomes of interest (Yi) and the missing-data indicator that indexes the patterns of missing data (Ri). However, they apply the conditional probability rule in opposite directions, so the joint density is factored differently across the two models.
In the Heckman Selection Model, the joint density is factored as f(Y, R) = Prob(R|Y)f(Y), where f(Y) is the distribution of the full dataset and Prob(R|Y) is the selection function (response mechanism). Under this approach, the missing-data mechanism has to be modeled explicitly because both the observed and unobserved values of a variable determine whether the subject is missing.
In the Pattern Mixture Model, the joint density is factored as f(Y, R) = f(Y|R)Prob(R), where f(Y|R) gives the distributions of the observed and missing data within each pattern and Prob(R) is the marginal response probability. Under this approach, the data are stratified by missing-data pattern based on the variables under consideration, and f(Yi|Ri) does not equal f(Yi). This is because the Pattern Mixture Model accounts for non-ignorable non-response by "re-expressing" the joint density of Yi and Ri: it models Yi conditional on Ri and allows f(Y|R) to differ across the patterns of missing data.
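A toy version of this factorization for a two-wave study can make the mixture explicit. The data, the dropout mechanism, and the delta offset below are all invented for illustration; the key point is that the overall mean is a mixture of pattern-specific means weighted by Prob(R), and the mean for the dropout pattern rests on an explicit, untestable assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

y1 = rng.normal(5.0, 1.0, size=n)
y2 = y1 + rng.normal(0.5, 1.0, size=n)   # true follow-up outcome

# Dropout depends on the (unseen) follow-up value: non-ignorable.
dropout = rng.random(n) < 1.0 / (1.0 + np.exp(-(y2 - 6.0)))

pr_complete = 1.0 - dropout.mean()       # Prob(R = complete)
mean_completers = y2[~dropout].mean()    # mean of f(Y2 | R = complete), observed

# Identifying assumption: dropouts sit `delta` above completers on average.
# This cannot be checked from the data; it is varied in sensitivity analyses.
delta = 1.0
mean_dropouts = mean_completers + delta  # mean of f(Y2 | R = dropout), assumed

pmm_mean = pr_complete * mean_completers + (1 - pr_complete) * mean_dropouts
print(f"Completers-only mean:            {mean_completers:.2f}")
print(f"Pattern-mixture mean (delta=1):  {pmm_mean:.2f}")
print(f"True mean of Y2:                 {y2.mean():.2f}")
```

The completers-only estimate (the implicit MAR answer here) understates the true mean because high-Y2 subjects drop out; the mixture moves the estimate back toward the truth by exactly as much as the assumed delta dictates.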
Both Y and R are random variables.
The Heckman Selection Model is conceptually appealing, but the standard errors produced under this approach are often incorrect and have to be adjusted (e.g. via the heckman and heckprob commands in Stata). In addition, the estimates can be unreliable if there is considerable overlap between the predictor variables in the selection and outcome models, since the inverse Mills ratio then becomes nearly collinear with the outcome predictors.
There is no way to know for sure whether our data/sample is missing not at random (MNAR). We need a very good understanding of the missing-data process before deciding whether to use one of the models (selection / pattern mixture) for data missing not at random. Likewise, both multiple imputation and maximum likelihood approaches can produce valid inferences even if the data are MNAR, but we can never be 100% sure we have the right model. Thus, it is imperative to conduct a sensitivity analysis to assess how sensitive our results are to the model choices (MAR vs. MNAR).
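One common form such a sensitivity analysis takes is a delta adjustment: fill in the missing values under a MAR-style model, then shift the fills by a range of offsets and watch how far the estimate moves as the departure from MAR grows. A minimal sketch on simulated data (mechanism and numbers invented for illustration; the fill here is a simple mean substitution rather than a full imputation model):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

y = rng.normal(10.0, 2.0, size=n)
# MNAR mechanism: higher values are more likely to be missing.
missing = rng.random(n) < 1.0 / (1.0 + np.exp(-(y - 11.0)))

observed_mean = y[~missing].mean()
estimates = {}
for delta in (0.0, 0.5, 1.0, 1.5):
    # delta = 0 reproduces the MAR-style answer; larger deltas encode an
    # increasingly strong "missing values are higher" assumption.
    filled = np.where(missing, observed_mean + delta, y)
    estimates[delta] = filled.mean()
    print(f"delta={delta:.1f}  estimated mean={estimates[delta]:.2f}")
```

If the substantive conclusion survives across the plausible range of deltas, the MAR-based analysis is defensible; if it flips, the conclusion hinges on an untestable assumption and should be reported as such.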
Once we have a good understanding of the missing-data process, the choice between the Pattern Mixture Model (PMM) and the Selection Model (SM) depends on:
1. The study objective(s)
2. How we want to incorporate assumptions about the missing-data mechanism (bearing in mind that these assumptions are untestable and cannot be verified from our data)
3. How flexible the sensitivity analysis needs to be