SREL Reprint #2269


Assessing effect and no effect with equivalence tests

Philip M. Dixon

Introduction: Risk assessment can involve quantitative comparisons between groups, e.g., between an affected area and an appropriate background, or between an exposed group and an appropriate control. No Observed Effect Concentrations (NOEC), used in the quotient method of risk assessment, are derived from experimental comparisons of controls and groups exposed to a toxicant (Newman, 1995). Evaluation of effluents in the U.S. National Pollutant Discharge Elimination System (NPDES) Permits Program also includes an acute toxicity test which compares mortality in control waters to mortality in waters containing a specified concentration of effluent (Weber, 1993). The comparison might also be done as one of the final stages in a probabilistic risk assessment, in which case the quantity being compared is the estimated risk.
If formal statistical methods are used in these comparisons, usually they are tests of the null hypothesis of no difference. The details of the test (e.g., a nonparametric test, a t-test, or something more complicated) will depend on the form of the data and the set of assumptions that are made, but in all common cases the statistical test evaluates a null hypothesis (H0) of no difference. For example, an analysis of an acute toxicity experiment might test the null hypothesis that there is no difference in mean toxicity between control and effluent-treated water. Because ecological and environmental data include many sources of random variation that are not eliminated by good experimental design, exposure to effluent may appear to increase mortality even if there is no true effect. The statistical null hypothesis of no true difference is rejected only if there is strong evidence (summarized by the α-level or p-value) of an effect. These tests are familiar to anyone who has had an introductory course in statistics and are reasonable in many clinical trials and agricultural experiments, where there is concern that random variation might be mistaken for true improvements (Fisher, 1960).
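To make the classical test concrete, the sketch below runs a pooled-variance two-sample t-test of the null hypothesis of no difference in mean mortality between control and effluent-treated water. The data values are hypothetical, chosen only for illustration; the t-test itself is the standard textbook procedure, implemented directly so no statistical library is required.

```python
import math

def pooled_t(sample_a, sample_b):
    """Two-sample pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = sum(sample_a) / na, sum(sample_b) / nb
    # Sums of squared deviations about each sample mean
    ssa = sum((x - ma) ** 2 for x in sample_a)
    ssb = sum((x - mb) ** 2 for x in sample_b)
    df = na + nb - 2
    sp2 = (ssa + ssb) / df                     # pooled variance estimate
    se = math.sqrt(sp2 * (1 / na + 1 / nb))    # standard error of the difference
    return (ma - mb) / se, df

# Hypothetical percent-mortality data for 4 control and 4 effluent replicates
control = [5.0, 8.0, 6.0, 7.0]
effluent = [9.0, 12.0, 8.0, 11.0]

t, df = pooled_t(effluent, control)
# Two-sided critical value t_{0.975} for 6 df is about 2.447 (standard t table)
print(f"t = {t:.2f} on {df} df; reject H0 at alpha = 0.05: {abs(t) > 2.447}")
```

With these illustrative numbers the test rejects H0, but as the next paragraph argues, a non-rejection in such a design would not establish that the effluent is harmless.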
The classical null hypothesis (no difference between two treatments) is inappropriate if the intent is to prove that some treatment is "safe" or has no effect (Bross, 1985; Millard, 1987; Dixon and Garrett, 1994). If a treatment has no effect, then one expects a statistical test to accept the null hypothesis, but accepting the null hypothesis (of no difference) does not prove that the treatment has no effect. The causal "arrow" cannot be reversed. The null hypothesis may be accepted because the true difference is close to zero, because the number of replicates is too small, or because the random variation among experimental units is too large. Tests of the classical null hypothesis are biased in favor of concluding "no effect," even when there is an effect (Hoekstra and van Ewijk, 1993). Large-scale ecological and environmental experiments typically have few replicates because of the cost and practical difficulties of replicating at appropriate spatial scales (Weber, 1993). Finally, ecological quantities often include considerable measurement error, spatial variability, and temporal variability (Eberhardt, 1978; Osenberg et al., 1994). The result of large random variation and low replication is low statistical power (a low probability of rejecting the null hypothesis) even when the treatment has a moderate effect.
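The power problem described above is easy to demonstrate by simulation. The sketch below repeatedly generates experiments with only 4 replicates per group, large among-replicate variation, and a real (moderate) treatment effect, then counts how often the classical t-test rejects H0. All numerical settings (means, standard deviation, effect size) are hypothetical choices for illustration.

```python
import math
import random

def t_statistic(a, b):
    """Pooled two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

random.seed(1)
n_reps, sigma, effect = 4, 3.0, 2.0   # few replicates, large variation, real effect
crit = 2.447                          # two-sided t_{0.975} critical value for 6 df
trials = 2000
rejections = 0
for _ in range(trials):
    control = [random.gauss(10.0, sigma) for _ in range(n_reps)]
    effluent = [random.gauss(10.0 + effect, sigma) for _ in range(n_reps)]
    if abs(t_statistic(effluent, control)) > crit:
        rejections += 1

power = rejections / trials
print(f"Estimated power with {n_reps} replicates per group: {power:.2f}")
```

Under these settings the test detects the real effect in well under half of the simulated experiments, so most such experiments would "accept" the null hypothesis and be misread as evidence of no effect, which is exactly the bias the text describes.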


Dixon, P.M. 1998. Assessing effect and no effect with equivalence tests. pp. 275-301 In: Newman, M.C. and C.L. Strojan (Eds.). Risk Assessment: Logic and Measurement. Ann Arbor Press, Chelsea, MI.


This information was provided by the University of Georgia's Savannah River Ecology Laboratory (srel.uga.edu).