On a complete sample size and power justification

A checklist covering these points is available to print and use in developing a research proposal.

In my view there are five essential issues that a power/sample size analysis must address in the context of a research proposal. Issues 1, 2, and 3 should be addressed for each analytic aim or sub-aim. If the study recruits only a single sample, issue 4 applies to that entire sample rather than to each aim separately.

1. How many/how much?

Either:

a) What is the minimum number of people needed to detect the desired effect? or

b) What is the minimum effect that can be detected given the number of people that can be studied?

This is what a statistician or methodologist can help with. This may involve calculations and/or simulations. In general, the more simply this can be explained and approached the better, but not at the neglect of important aspects of design (e.g., clustering of observations, longitudinal design, multiple comparisons, and plausible attrition or other causes of missing data).
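To illustrate the simplest version of such a calculation, here is a minimal sketch of the closed-form, normal-approximation sample size for a two-sided, two-sample comparison of means, using only the Python standard library. The effect size, alpha, and power values are hypothetical planning inputs, not recommendations.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g., 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # e.g., 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Hypothetical planning values: medium standardized effect (d = 0.5).
print(n_per_group(0.5))  # 63 per group, before allowing for attrition
```

A closed-form answer like this is easy for reviewers to check; the design complexities noted above (clustering, repeated measures, attrition) would inflate this number and typically require simulation.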

2. Effect size feasibility

Effect size feasibility means that the effect to be detected is within the realm of possibility, and ideally justified by rigorous prior research. For example, if the study is a randomized controlled trial, prior evidence must provide a basis to conjecture that the desired effect size could plausibly be observed. This evidence could come from prior human or animal studies, including published work or pilot studies for the proposed project, or it could be inferred from prior observational studies. Proposals that document that the prior research supporting the effect size justification was of high quality and was itself adequately powered satisfy, in part, the "premise" criterion of the NIH rigor and reproducibility mandate.

Another aspect of effect size feasibility is the quality of the outcome measurement device, which should have sufficient measurement resolution to capture the differences or changes that are hypothesized. The investigator must convince reviewers that the device is useful and appropriate for the design.

3. Clinical relevance

Clinical relevance means that the planned or desired effect is of sufficient magnitude to satisfy some judgment regarding practical or clinical significance. When all else fails, we can fall back on the half-standard-deviation rule of thumb of Norman et al. (2003; Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Medical Care, 41(5), 582-592), but a benchmark specific to the proposed field of study would make a more rigorous case. The evidence for this would also come from the literature or from prior or pilot research.
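As a concrete illustration of the half-standard-deviation benchmark, the arithmetic is trivial but worth making explicit. The outcome scale SD and planned effect below are hypothetical.

```python
# Hypothetical example: judge the clinical relevance of a planned effect
# against the half-SD benchmark of Norman et al. (2003).
sd_outcome = 10.0   # assumed SD of the outcome scale (hypothetical)
planned_d = 0.40    # planned standardized effect size (hypothetical)

raw_effect = planned_d * sd_outcome    # 4.0 points on the scale
half_sd_benchmark = 0.5 * sd_outcome   # 5.0 points

clinically_relevant = raw_effect >= half_sd_benchmark
print(raw_effect, half_sd_benchmark, clinically_relevant)  # 4.0 5.0 False
```

In this hypothetical case the planned effect falls short of the generic benchmark, which is exactly the situation where a field-specific minimally important difference makes a stronger argument.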

4. Sample size feasibility

Is the desired sample size feasible to collect? Are the resources and capabilities available to the team consistent with the sample size target? Plans should account for attrition, refusal, missing data, etc. Evidence to support sample size feasibility includes: information on the size of the sampling frame from which the research participants will be recruited (e.g., the number of potentially eligible patients visiting the hospital during the proposed field period), the size of the catchment area in which the study will be conducted, and a recounting of prior successes in achieving planned sample sizes in funded research. This information might be spread throughout the proposal, not only in the research plan but also in the environment and human subjects sections. If so, it is a good idea to refer the reader to those sections and to repeat the assertions regarding feasibility in the sample size justification section of the research plan.

5. Fail safe

Finally, a section on anticipated problems and strategies to overcome them should include anticipated challenges in meeting enrollment goals and details on how the study can be salvaged if challenges arise.

Example: sample size known or constrained

  • Most commonly, when an investigator approaches a methodologist for help with sample size and power, the investigator already has a good idea as to what the sample size will be in their study. Sample sizes are principally limited by the availability of resources, such as the project budget, the capacity of field staff, or limits of the sampling frame (e.g., number of patients from which to identify potentially eligible participants).
  • The next logical question is: what is the minimal detectable effect size given this sample size? To answer this, the methodologist must know many things, including the design of the study, the nature of the outcome, and the anticipated pattern of missing data (e.g., attrition).
  • With the sample size known, the minimally detectable effect size can be solved for: ideally using simplifying assumptions for easy explanation, and then confirmed using simulation methods.
  • Simulation studies are probably the only way to get an accurate sense of study power once even modest complexities of design are introduced. However, simulation studies are difficult to explain concisely and in a way that is interpretable to a wide readership. An opaque and cursorily explained sample size justification based on a simulation study can be about as useful as no sample size justification at all. Investigators and their statisticians/methodologists are absolutely encouraged to use simulations to estimate power, but also to find closed-form simplifications of the sample size calculations that can be easily explained and double-checked during review.
  • With a known sample size and a solved minimally detectable effect, the justifications begin. The investigator must justify (and document the rigor of) the prior evidence supporting the feasibility of observing the minimum detectable effect (or a larger effect). The investigator must provide a convincing case that the sample size is feasible. Investigators should solicit help from a statistician or methodologist in interpreting prior pilot and/or published work to determine rigor and effect sizes. The effects to be detected must be justified as clinically or practically relevant.
  • Assuming the minimum detectable effect size is feasible and clinically relevant, the sample size and power considerations section of the proposal can be drafted. Each aim/hypothesis must be explicitly addressed.
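The workflow described above (solve for the minimal detectable effect under simplifying assumptions, then confirm by simulation) can be sketched with the Python standard library. The per-group sample size is a hypothetical resource-limited value; the simulation assumes normal outcomes with known unit SD purely for illustration.

```python
from math import sqrt
from random import gauss, seed
from statistics import NormalDist

def mde(n_per_group, alpha=0.05, power=0.80):
    """Minimal detectable standardized effect for a fixed per-group n
    (two-sided, two-sample normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)

def simulated_power(d, n_per_group, reps=2000, alpha=0.05):
    """Monte Carlo check: fraction of simulated trials in which a
    two-sample z-test rejects at level alpha."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        a = [gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [gauss(d, 1.0) for _ in range(n_per_group)]
        diff = sum(b) / n_per_group - sum(a) / n_per_group
        z_stat = diff / sqrt(2 / n_per_group)  # SD = 1 by construction
        hits += abs(z_stat) > crit
    return hits / reps

seed(1)
n = 80            # hypothetical resource-limited per-group sample size
d = mde(n)        # closed-form minimal detectable effect (about 0.44)
print(round(d, 2), round(simulated_power(d, n), 2))  # simulated power near 0.80
```

When the simulation reproduces the closed-form answer under the simple design, the same simulation code can then be extended to the real design (clustering, attrition, longitudinal measurement) while the closed-form result remains the easily checked anchor for reviewers.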

February 12, 2018

Rich Jones (rich_jones@brown.edu)

Quantitative Science Program

Department of Psychiatry and Human Behavior, Department of Neurology

Warren Alpert Medical School, Brown University