When reviewing the literature on a topic of interest, we often encounter mixed or contradictory findings, results, and conclusions (whether in the magnitude of the effect of the X's on Y or in the difference in proportions or prevalence between two subgroups). These apparent contradictions can be partly attributable to sampling variability and/or to underpowered studies, which often produce wide confidence intervals and large p-values.
In social science and other types of research, findings from the sample are often generalized to the population. The cost of committing a Type II error (failing to report an effect that actually exists, or failing to report a difference between groups when that difference actually exists) can be detrimental. Sample size determination is therefore an essential part of a study protocol: we should do our best to determine the optimal sample size in order to minimize the likelihood of a Type II error.
How the optimal sample size is calculated depends on the study objectives, the study design, the intended statistical techniques, and so forth. In addition, power, sample size, and effect size are related: if you know any two of them, you can solve for the third. Increasing the sample size reduces the sampling error (the standard error of the mean), thereby reducing the overlap between the null and alternative sampling distributions and increasing power.
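To make that two-of-three relationship concrete, here is a minimal sketch (my own illustration, not a formula from the sources cited here) that solves a two-sided two-sample comparison of means in both directions using the standard normal approximation; the effect size d = 0.5 and the 0.80 power target are arbitrary example values:

```python
import math
from statistics import NormalDist

nd = NormalDist()

def n_given_power(d, power, alpha=0.05):
    """Per-group n for a two-sided two-sample z-test:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2."""
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil(2 * (z / d) ** 2)

def power_given_n(n_per_group, d, alpha=0.05):
    """Invert the same relation: power = Phi(d * sqrt(n/2) - z_{1-alpha/2})."""
    return nd.cdf(d * (n_per_group / 2) ** 0.5 - nd.inv_cdf(1 - alpha / 2))

print(n_given_power(0.5, 0.80))          # 63 per group under the normal approximation
print(round(power_given_n(63, 0.5), 2))  # ~0.80, closing the loop
```

A t-based calculation would give a slightly larger n (about 64 per group), but the normal approximation is enough to show how fixing any two of power, effect size, and sample size pins down the third.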
Sample size determination is simplest under simple random sampling (SRS); we only have to determine the sample size (i.e., the number of sampling units) needed to obtain the smallest possible sampling variance/error. It is more complex under stratified random sampling and multistage cluster sampling. For stratified random sampling, we must first determine the overall sample size and then allocate sampling units to each individual stratum.
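The simplest allocation rule for the stratified case is proportional allocation, n_h = n * N_h / N. A small sketch of that step (the stratum sizes below are made-up numbers for illustration):

```python
import math

def proportional_allocation(stratum_sizes, total_n):
    """Allocate total_n sampling units across strata in proportion to stratum
    size, with largest-remainder rounding so allocations sum exactly to total_n."""
    N = sum(stratum_sizes.values())
    raw = {h: total_n * Nh / N for h, Nh in stratum_sizes.items()}
    alloc = {h: math.floor(x) for h, x in raw.items()}
    # hand any leftover units to the strata with the largest fractional parts
    leftover = total_n - sum(alloc.values())
    for h in sorted(raw, key=lambda h: raw[h] - alloc[h], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc

print(proportional_allocation({"A": 5000, "B": 3000, "C": 2000}, 500))
# {'A': 250, 'B': 150, 'C': 100}
```

Other rules (e.g., Neyman allocation, which also weights by within-stratum variability) follow the same two-step pattern: fix the overall n first, then divide it among strata.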
For bivariate and multivariate analyses, the statistical power (however defined) to detect the significance of an effect of X on the outcome Y depends on how the outcome is measured. Power may depend on the size of the sample, on the number of times a particular event occurs, or on the number of individuals who experience that event (e.g., in survival analysis, cohort analysis, etc.).
The sample size formula for the two-sample t-test, together with a modified variance inflation factor (VIF) used to compare means and proportions (as in multiple logistic regression), can be used to determine the optimal sample size in cases where the outcome variable is ordinal or categorical, as in logistic regression or other types of GLMs (see Hsieh et al. 1998).
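The VIF idea reduces to a one-line inflation of a univariate sample size, n_multi = n_uni / (1 - rho^2), where rho is the multiple correlation between the covariate of interest and the remaining covariates. A hedged sketch of that adjustment (the inputs here are invented for illustration, not values from Hsieh et al.):

```python
import math

def vif_adjusted_n(n_univariate, rho):
    """Inflate a single-covariate sample size for a multiple-regression setting:
    n_multi = n_uni / (1 - rho**2), the VIF-style correction."""
    return math.ceil(n_univariate / (1 - rho ** 2))

print(vif_adjusted_n(63, 0.3))  # 70: rho = 0.3 inflates a univariate n of 63 to 70
print(vif_adjusted_n(63, 0.0))  # 63: an uncorrelated covariate needs no inflation
```

The stronger the collinearity between the covariate of interest and the other covariates, the larger the required sample, which is exactly the intuition behind the VIF.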
In cases where variables are not normally distributed (e.g., contingency tables, categorical variables, etc.), an approach that simultaneously tests multiple parameters using a Wald statistic can be used to determine power and sample size (see Shieh 2005).
In cluster randomized trials, the number of clusters and the inclusion of a pre-treatment covariate in the model also bear on statistical efficiency (see Raudenbush 1997; Moerbeek 2006). Campbell et al. (2004) devised a sample size calculation tool to detect the minimally meaningful treatment effect. It is also possible to conduct a power analysis comparing mixed ANOVA/ANCOVA to multilevel models to find out which has more statistical power (see Murray et al. 2006).
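One standard way clustering enters a sample size calculation is through the design effect, DEFF = 1 + (m - 1) * ICC, which inflates an SRS sample size to account for within-cluster correlation; this is a generic textbook device rather than one of the methods cited above, and the numbers below are hypothetical:

```python
def design_effect(cluster_size, icc):
    """Design effect for cluster sampling: DEFF = 1 + (m - 1) * ICC,
    where m is the average cluster size and ICC the intraclass correlation."""
    return 1 + (cluster_size - 1) * icc

# hypothetical inputs: 300 individuals needed under SRS, clusters of 20, ICC 0.05
n_srs, m, icc = 300, 20, 0.05
deff = design_effect(m, icc)
print(round(deff, 2))         # 1.95
print(round(n_srs * deff))    # 585 individuals, i.e. about 30 clusters of 20
```

Even a modest ICC nearly doubles the required sample here, which is why the number of clusters, not just the number of individuals, drives power in cluster randomized designs.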
The number of participants per group, different degrees and patterns of attrition, group differences, nonlinearity, and the type of outcome variable (binary, Poisson, etc.) are also factored in when determining the optimal sample size and power for latent growth curve models and multilevel models with discrete and continuous outcomes, as well as for other types of longitudinal studies (see Muthen & Curran 1997; Hedeker et al. 1999; Raudenbush & Xiao-Feng 2001; Jung & Ahn 2003; Winkens et al. 2006; Yan & Su 2006).
In the case of mixture models, sample size determination can serve different purposes, including distinguishing between competing models and assigning each individual to the appropriate class (Maxwell et al. 2008). Statistical power can be determined by examining how different combinations of parameters and sample size affect the effectiveness of parameter recovery from mixture distributions (Munoz & Acuna 1999; Zheng & Frey 2004). In mixture modeling, it is imperative to ensure that large sample sizes do not result in an overestimation of the number of classes (Maxwell et al. 2008).
Procedures to determine the optimal sample size for classification and regression trees and for bootstrap approaches have not been well developed (Maxwell et al. 2008).
If the distributional form and population parameters of the data are known, Monte Carlo simulation can also be used to determine the optimal sample size.
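A minimal sketch of that idea, assuming normally distributed outcomes with known parameters and using a z-approximation to the two-sample test (every setting here is illustrative, and in practice one would loop over candidate sample sizes until the estimated power clears the target):

```python
import math
import random
from statistics import NormalDist, mean, stdev

def simulated_power(n_per_group, effect_size, alpha=0.05, reps=2000, seed=42):
    """Estimate power by Monte Carlo: repeatedly draw two normal samples and
    run a two-sided z-approximate two-sample test, counting rejections."""
    random.seed(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        a = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [random.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        se = math.sqrt(stdev(a) ** 2 / n_per_group + stdev(b) ** 2 / n_per_group)
        if abs((mean(b) - mean(a)) / se) > z_crit:
            rejections += 1
    return rejections / reps

print(simulated_power(64, 0.5))  # close to 0.80 for a medium effect
```

The appeal of the simulation route is that the same loop works for designs with no closed-form power formula: swap in whatever data-generating process and test you actually plan to use.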
Scott Maxwell and his colleagues also advised researchers to report confidence intervals alongside their effect size estimates, because confidence intervals illustrate the uncertainty in those effects, thereby keeping researchers from giving readers a false sense of certainty and leading them to overinterpret or misinterpret the presence of an effect (Maxwell et al. 2008). They further argue that researchers should take the width of the confidence interval into account when determining the optimal sample size and pay close attention to extremely wide intervals. I agree with Scott Maxwell, because the width of a confidence interval tells you the precision of an estimate (narrower intervals are more precise). In addition, a p-value tells you only how surprising the observed difference or effect would be if there were truly none, whereas a confidence interval gives you a plausible range for the unknown population parameter.
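The link between width and sample size is easy to see numerically: the half-width of a normal-theory confidence interval for a mean is z * s / sqrt(n), so it halves each time n quadruples. A small sketch with a made-up standard deviation of 15:

```python
from statistics import NormalDist

def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-theory CI for a mean: z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / n ** 0.5

for n in (25, 100, 400):
    print(n, round(ci_half_width(15, n), 2))  # half-width halves as n quadruples
```

Planning for a target interval width, rather than only for a target power, is exactly the "accuracy in parameter estimation" perspective Maxwell and colleagues advocate.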