SAMPLE SIZE REQUIREMENT

Data characteristics such as minimum sample size, nonnormal data, and scale of measurement (i.e., the use of different scale types) are among the most often stated reasons for applying PLS-SEM (Hair, Sarstedt, Ringle, et al., 2012; Henseler et al., 2009). While some of the arguments are consistent with the method’s capabilities, others are not.

For example, small sample size is probably the most often abused argument, with some researchers using PLS-SEM with unacceptably low sample sizes (Goodhue, Lewis, & Thompson, 2012; Marcoulides & Saunders, 2006). These researchers often believe that there is some “magic” in the PLS-SEM approach that allows them to use a very small sample (e.g., fewer than 100 observations) to obtain results representing the effects that exist in a population of several million elements or individuals. No multivariate analysis technique, including PLS-SEM, has this kind of “magic” capability. These misrepresentations have led to general skepticism about the use of PLS-SEM.

A good sample should:

Reflect the similarities and differences found in the population, so that it is possible to make inferences from the (small) sample about the (large) population.
Ensure that the results of the analysis have adequate statistical power.
Ensure that the results of the analysis are robust and that the model is generalizable.

SOME METHODS TO DETERMINE MINIMUM SAMPLE SIZE

10 times rule (Barclay, Higgins, & Thompson, 1995)

This rule states that the minimum sample size should be equal to the larger of

10 times the largest number of formative indicators used to measure a single construct, or
10 times the largest number of structural paths directed at a particular construct in the structural model.

This rule of thumb is equivalent to saying that the minimum sample size should be 10 times the maximum number of arrowheads pointing at a latent variable anywhere in the PLS path model.
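The rule can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name, the dictionary layout, and the example constructs are all hypothetical.

```python
def ten_times_rule(formative_indicators, inbound_paths):
    """Minimum sample size under the 10 times rule.

    formative_indicators: dict mapping construct name -> number of
        formative indicators measuring that construct.
    inbound_paths: dict mapping construct name -> number of structural
        paths directed at that construct.
    """
    # Largest number of formative indicators on a single construct.
    max_indicators = max(formative_indicators.values(), default=0)
    # Largest number of structural paths pointing at a single construct.
    max_paths = max(inbound_paths.values(), default=0)
    # The rule takes 10 times the larger of the two counts.
    return 10 * max(max_indicators, max_paths)


# Hypothetical model, for illustration only.
formative = {"Quality": 4, "Image": 3}
paths = {"Satisfaction": 3, "Loyalty": 2}
min_n = ten_times_rule(formative, paths)
print(min_n)  # 40
```

In this made-up model, the construct with four formative indicators drives the result, so the rule suggests at least 40 observations.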

Conditions

Researchers have suggested that the “10 times” rule of thumb for determining sample size adequacy in PLS analyses only applies when certain conditions, such as strong effect sizes and high reliability of measurement items, are met.

Jackson (2003)

Sample size should be determined from the number of parameters to be estimated (the N:q rule, where N is the number of cases and q is the number of parameter estimates). A ratio of 5:1 is acceptable.
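As a quick arithmetic sketch of the N:q rule, the snippet below multiplies the number of estimated parameters by the chosen ratio. The function name and the parameter count of 24 are assumptions made for illustration.

```python
import math


def nq_min_sample(num_parameters, ratio=5):
    """Minimum N for a given N:q ratio (default 5:1)."""
    # N must be at least `ratio` cases per estimated parameter.
    return math.ceil(ratio * num_parameters)


# Hypothetical model with 24 parameter estimates.
print(nq_min_sample(24))  # 120
```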

Kline (2011)

The more complex the model, the larger the required sample size.

A “typical” sample size is about 200 cases.

Hair et al. (2017) – PLS Primer

PLS-SEM is advantageous when used with small sample sizes (e.g., in terms of the robustness of estimations and statistical power; Reinartz et al., 2009). However, some researchers abuse this advantage by relying on extremely small samples relative to the underlying population.

All else being equal, the more heterogeneous the population, the more observations are needed to reach an acceptable sampling error level.

Researchers should consider the sample size against the background of the model and data characteristics (Hair et al., 2011; Marcoulides & Chin, 2013). Specifically, the required sample size should be determined by means of power analyses based on the part of the model with the largest number of predictors.

Hair et al. (2014) – Multivariate Data Analysis

Cohen (1992)

According to Hair et al. (2017), researchers can rely on rules of thumb such as those provided by Cohen (1992) in his statistical power analyses for multiple regression models, provided that the measurement models have an acceptable quality in terms of outer loadings (i.e., loadings should be above the common threshold of 0.70).

Sample Size Recommendation in PLS-SEM for a Statistical Power of 80%

Green (1991)

Kock & Hadaya (2018)
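One method proposed by Kock and Hadaya (2018) is the inverse square root method, which estimates the minimum sample size from the smallest path coefficient expected to be significant. The sketch below is a minimal Python illustration, assuming a 5% significance level (one-tailed z-quantile) and 80% power. The function name and the example coefficient are hypothetical.

```python
import math
from statistics import NormalDist


def inverse_sqrt_min_n(min_abs_path, alpha=0.05, power=0.80):
    """Smallest N with N > ((z_(1-alpha) + z_power) / |beta_min|) ** 2."""
    z = NormalDist()
    # Sum of the two standard normal quantiles; about 2.486 for
    # alpha = 0.05 and power = 0.80.
    critical = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    # Strict inequality: take the next integer above the bound.
    return math.floor((critical / abs(min_abs_path)) ** 2) + 1


# Hypothetical smallest path coefficient of 0.197.
print(inverse_sqrt_min_n(0.197))  # 160
```

Smaller expected coefficients raise the requirement quickly, since the bound grows with the inverse square of the coefficient.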

Faul et al. (2007, 2009)

As an alternative to the above techniques, researchers can use programs such as G*Power (which is available free of charge at http://www.gpower.hhu.de/) to carry out power analyses specific to their model setups (Hair et al., 2017).
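For orientation, the power analysis behind a multiple regression F test (R² deviating from zero) can be written as follows. The symbols are introduced here for illustration and are not taken from the source: m is the number of predictors pointing at the construct, N the sample size, f² the effect size, and α the significance level. The required sample size is the smallest N for which the power reaches the target (e.g., 0.80).

```latex
\begin{align}
  f^2 &= \frac{R^2}{1 - R^2}, \qquad \lambda = f^2 N, \\
  \text{Power}(N) &= P\!\left( F'_{m,\,N-m-1}(\lambda) > F_{1-\alpha;\; m,\,N-m-1} \right) \ge 0.80,
\end{align}
```

Here F' denotes the noncentral F distribution with noncentrality parameter λ, and F with subscript 1−α is the critical value of the central F distribution with the same degrees of freedom.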

Use G*Power software to determine sample size requirement.


How many predictors do we have in the models?