Missing data occur when a respondent either purposely or inadvertently fails to answer one or more questions.
The increased use of online data collection has reduced missing data, because online surveys can prevent respondents from moving to the next question until they have answered the current one. This forced-answer approach does lead some individuals to abandon the survey. More often, however, respondents simply answer the question and move on, because they had skipped it inadvertently.
When more than 15% of the answers on a questionnaire are missing, the observation is typically removed from the data file. An observation may also be removed even if its overall share of missing data does not exceed 15%. For example, if a high proportion of the responses to a single construct are missing, the entire observation may have to be removed. This is more likely to happen when the construct measures a sensitive topic, such as racism, sexual orientation, or even firm performance.
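The two screening rules described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: answers are stored as plain lists with None marking a missing answer, the 15% overall limit comes from the text, and the per-construct cut-off of 50% is a hypothetical value, since the text only speaks of a "high proportion". The construct names SAT and PERF and the 20-item layout are likewise made up for the example.

```python
from typing import Dict, List, Optional

# One respondent's answers; None marks a missing answer.
Response = List[Optional[float]]


def missing_share(values: Response) -> float:
    """Proportion of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def screen_respondents(
    data: List[Response],
    constructs: Dict[str, List[int]],
    overall_limit: float = 0.15,
    construct_limit: float = 0.5,  # hypothetical cut-off, not from the text
) -> List[int]:
    """Return the indices of respondents flagged for removal."""
    flagged = []
    for i, row in enumerate(data):
        # Rule 1: more than 15% of all answers are missing.
        if missing_share(row) > overall_limit:
            flagged.append(i)
            continue
        # Rule 2: a high share of a single construct's items is missing.
        for items in constructs.values():
            if missing_share([row[j] for j in items]) > construct_limit:
                flagged.append(i)
                break
    return flagged


# Hypothetical 20-item questionnaire; items 17-19 measure a sensitive construct.
constructs = {"SAT": list(range(17)), "PERF": [17, 18, 19]}
data = [
    [4.0] * 20,                      # complete
    [None] * 4 + [4.0] * 16,         # 20% missing overall
    [4.0] * 17 + [None, None, 3.0],  # 10% overall, but 2 of 3 PERF items missing
    [None] + [4.0] * 19,             # 5% missing, spread thinly
]
print(screen_respondents(data, constructs))  # [1, 2]
```

Respondent 2 stays below the 15% overall limit but is still flagged, because two of the three items of the sensitive construct are missing.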
Common options for treating missing values include:
No replacement
Midpoint of the scale
Random number
Mean value of the other respondents (on the same indicator)
Mean value of the other responses (of the same respondent)
Nearest neighbor
FIML (Full Information Maximum Likelihood)
EM (Expectation-Maximization)
MI (Multiple Imputation)
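The simpler single-value options in the list above can be illustrated with a short sketch. It assumes the data are a list of rows (respondents) by columns (indicators) with None for missing values, and a 1 to 7 response scale, which is a hypothetical default. It reads "mean value of the other respondents" as the indicator's column mean and "mean value of the other responses" as the respondent's own row mean. FIML, EM, and MI are model-based procedures and are not sketched here.

```python
import random
from statistics import mean
from typing import List, Optional

# Rows are respondents, columns are indicators; None marks a missing value.
Matrix = List[List[Optional[float]]]


def midpoint_replace(data: Matrix, low: int = 1, high: int = 7) -> Matrix:
    """Midpoint of the scale (a 1-7 scale is assumed by default)."""
    mid = (low + high) / 2
    return [[mid if v is None else v for v in row] for row in data]


def random_replace(data: Matrix, low: int = 1, high: int = 7, seed: int = 0) -> Matrix:
    """Random number: draw a random scale point for each gap."""
    rng = random.Random(seed)
    return [[rng.randint(low, high) if v is None else v for v in row] for row in data]


def respondent_mean_replace(data: Matrix) -> Matrix:
    """Mean value of the other respondents: the indicator's (column) mean."""
    # Raises StatisticsError if an indicator has no observed values at all.
    col_means = [mean(v for v in col if v is not None) for col in zip(*data)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def response_mean_replace(data: Matrix) -> Matrix:
    """Mean value of the other responses: the respondent's (row) mean."""
    filled = []
    for row in data:
        row_mean = mean(v for v in row if v is not None)
        filled.append([row_mean if v is None else v for v in row])
    return filled


data = [[1, 2, None], [3, None, 6], [5, 4, 3]]
print(midpoint_replace(data))         # [[1, 2, 4.0], [3, 4.0, 6], [5, 4, 3]]
print(respondent_mean_replace(data))  # [[1, 2, 4.5], [3, 3, 6], [5, 4, 3]]
print(response_mean_replace(data))    # [[1, 2, 1.5], [3, 4.5, 6], [5, 4, 3]]
```

The same gap can receive quite different values depending on the option chosen: the missing third answer of the first respondent becomes 4.0, 4.5, or 1.5.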
As with other statistical analyses, missing values must be addressed when using PLS-SEM. Within reasonable limits (i.e., less than 5% missing values per indicator), treatment options such as mean replacement, EM (the expectation-maximization algorithm), and nearest neighbor (e.g., Hair et al., 2010) generally yield only slightly different PLS-SEM estimates.
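A minimal sketch of the 5% per-indicator check and of one simple nearest-neighbor variant follows. The distance measure (root mean squared difference over the items both respondents answered) and the single-donor rule are assumptions chosen for illustration; they are not necessarily the procedure implemented in any particular PLS-SEM software. In this tiny toy data set, a single gap already exceeds the 5% guideline.

```python
import math
from typing import List, Optional

# Rows are respondents, columns are indicators; None marks a missing value.
Matrix = List[List[Optional[float]]]


def indicator_missing_shares(data: Matrix) -> List[float]:
    """Share of missing values for each indicator (column)."""
    return [sum(v is None for v in col) / len(data) for col in zip(*data)]


def distance(a: List[Optional[float]], b: List[Optional[float]]) -> float:
    """Root mean squared difference over the items both respondents answered."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no overlap: treat as maximally dissimilar
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def nearest_neighbor_replace(data: Matrix) -> Matrix:
    """Fill each gap with the value given by the most similar other respondent."""
    filled = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate donors must have answered indicator j themselves.
            donors = [k for k, other in enumerate(data) if k != i and other[j] is not None]
            # min() raises ValueError if nobody answered indicator j.
            best = min(donors, key=lambda k: distance(row, data[k]))
            filled[i][j] = data[best][j]
    return filled


data = [[1, 2, 3], [1, 2, None], [7, 6, 5]]
shares = indicator_missing_shares(data)
print([j for j, s in enumerate(shares) if s >= 0.05])  # [2]
print(nearest_neighbor_replace(data))                  # [[1, 2, 3], [1, 2, 3], [7, 6, 5]]
```

The second respondent answered the first two items exactly like the first respondent, so the first respondent is chosen as the donor and the gap is filled with 3.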
Alternatively, researchers can delete all observations with missing values (casewise deletion). This decreases the variation in the data and may introduce bias when certain groups of observations are deleted systematically.
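The risk of deleting all observations with missing values can be shown with a small, entirely hypothetical data set in which respondents who score low on the first indicator tend to skip the second one. This is an illustrative sketch, not real survey data.

```python
from statistics import mean
from typing import List, Optional

# Rows are respondents, columns are indicators; None marks a missing value.
Matrix = List[List[Optional[float]]]


def casewise_delete(data: Matrix) -> Matrix:
    """Keep only the respondents who answered every indicator."""
    return [row for row in data if all(v is not None for v in row)]


# Hypothetical data in which low scorers on indicator 0 skipped indicator 1.
data = [[2, None], [1, None], [5, 4], [6, 5], [4, 3]]
complete = casewise_delete(data)

print(len(data), len(complete))          # 5 3
print(mean(row[0] for row in data))      # 3.6
print(mean(row[0] for row in complete))  # 5
```

Because the deleted respondents are systematically the low scorers, the sample shrinks from five to three and the mean of the first indicator rises from 3.6 to 5, even though that indicator had no missing values.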