Inferential techniques based on frequentist (as opposed to Bayesian) methods have the following five basic elements:
See texts by Agresti, Mendenhall, Conover, and Montgomery, as well as NIST Hypothesis Tests.
Assumptions
Hypotheses
Test Statistic
P-value
Conclusion
Other components include:
Null Distribution (aka Reference Distribution - Montgomery)
Rejection Region (aka Critical Region) - based on the null distribution
Critical Value of the Test Statistic (see also this StatSoft blog entry - How to Find Critical Values for Statistical Tests)
The purpose of the Test Matrix is to serve as a guide to the discussion of each of these elements when designing an experiment or a study.
References:
"Statistical Methods for the Social Sciences, Third Edition" by Alan Agresti and Barbara Finlay, chapter 6.
"Practical Nonparametric Statistics, Third Edition" by W. J. Conover, section 2.3.
See also this StatSoft blog entry - How to Interpret Statistical Analysis Results.
See the map of Hypothesis Tests
Assumptions: the type(s) of data, the form of the population distribution, and the method of sampling (in particular, the randomness of the sample).
See also notes on Hypothesis Testing.
Null hypothesis (H0): The hypothesis that is directly tested. It is usually a statement that there is no effect, difference, or change. A significance test analyzes the strength of the evidence against the null hypothesis.
Alternative hypothesis (Ha): A hypothesis that contradicts the null hypothesis. Also known as the research hypothesis.
Hypotheses are formulated before collecting or analyzing the data. See notes on a-priori hypotheses.
simple vs. compound hypotheses
two-tailed vs. one-tailed (lower or upper) tests: the choice defines the location of the rejection region of the null distribution
one-sided vs. two-sided tests (not always the same as one- and two-tailed; see Conover, pages 98 and 431)
Hypotheses for a two-sided test:
H0: mu = mu0 vs. Ha: mu ≠ mu0
Examples of null and alternative hypotheses for one-sided tests:
H0: mu ≤ mu0 vs. Ha: mu > mu0
H0: mu ≥ mu0 vs. Ha: mu < mu0
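To make the link between the form of the alternative hypothesis and the tail(s) used more concrete, here is a minimal sketch of a one-sample z-test. It assumes a normal population with known standard deviation, which the notes do not specify; the function name `z_test_p_value`, its parameters, and the sample values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p) for a one-sample z-test with known sigma (an assumption).

    alternative: "two-sided" (Ha: mu != mu0), "greater" (Ha: mu > mu0),
    or "less" (Ha: mu < mu0).
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # null distribution of z under H0: N(0, 1)
    if alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    elif alternative == "greater":
        p = 1 - std_normal.cdf(z)             # upper tail only
    elif alternative == "less":
        p = std_normal.cdf(z)                 # lower tail only
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


# Hypothetical data: sample mean 10.4 from n = 25 observations, sigma = 1.0.
for alt in ("two-sided", "greater", "less"):
    z, p = z_test_p_value(10.4, mu0=10.0, sigma=1.0, n=25, alternative=alt)
    print(f"{alt:>9}: z = {z:.2f}, p = {p:.4f}")
```

The same observed statistic gives three different p-values, which is one reason the hypotheses should be fixed before the data are examined.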
Test statistic: The statistic calculated from the sample data to test the null hypothesis; its value is judged against the null distribution. It is typically built from a point estimate of a population parameter. The point estimate should be accompanied by an interval estimate of the parameter, which reflects the standard error (uncertainty) of the estimate.
A good test statistic is one that is a sensitive indicator of whether the data agree or disagree with the null hypothesis.
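As an illustration of a test statistic reported together with point and interval estimates, the following sketch computes all three for a mean. It assumes a normal population with known sigma so that only the Python standard library is needed; the helper name `summarize` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def summarize(sample, mu0, sigma, confidence=0.95):
    """Point estimate, standard error, z statistic, and confidence interval."""
    n = len(sample)
    x_bar = mean(sample)            # point estimate of mu
    se = sigma / sqrt(n)            # standard error of the sample mean
    z = (x_bar - mu0) / se          # test statistic for H0: mu = mu0
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)  # interval estimate of mu
    return {"point_estimate": x_bar, "std_error": se, "z": z, "ci": ci}


# Hypothetical measurements; sigma = 0.3 is assumed known.
sample = [10.2, 9.8, 10.5, 10.1, 10.4, 9.9, 10.3, 10.6]
result = summarize(sample, mu0=10.0, sigma=0.3)
print(result)
```

With these made-up numbers the 95% interval excludes mu0 = 10.0, which agrees with a z statistic beyond 1.96 in absolute value.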
Null distribution: The distribution of the test statistic under the assumption that the null hypothesis is true.
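A null distribution can also be approximated by simulation. The sketch below repeatedly draws samples from a population in which the null hypothesis is true and records the resulting z statistic each time. The normal population, the parameter values, the seed, and the function name `simulate_null_z` are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_null_z(mu0, sigma, n, reps=5000, seed=0):
    """Simulate z statistics when H0 (mu = mu0) is true."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)
    stats = []
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        stats.append((sum(sample) / n - mu0) / se)
    return stats


null_z = simulate_null_z(mu0=10.0, sigma=1.0, n=25)
print(f"mean = {mean(null_z):.3f}, sd = {stdev(null_z):.3f}")

# Share of simulated statistics beyond +/-1.96; close to 0.05 is expected.
share = sum(abs(z) > 1.96 for z in null_z) / len(null_z)
print(f"share beyond +/-1.96 = {share:.3f}")
```

The simulated statistics should be centered near 0 with a standard deviation near 1, matching the standard normal reference distribution for this test.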
Rejection region: Also known as the critical region. Used, along with the alpha level, as the basis for a decision rule.
It is the set of all test statistic values in the sample space that result in the decision to reject the null hypothesis at a specific alpha level.
The rejection region is defined by the critical value(s) of the test statistic and the null distribution.
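The relationship between alpha, the critical value, and the rejection region can be sketched for a statistic whose null distribution is standard normal. That choice of null distribution is an assumption for illustration, and the function names `critical_values` and `in_rejection_region` are hypothetical.

```python
from math import inf
from statistics import NormalDist


def critical_values(alpha, alternative="two-sided"):
    """Return (lower, upper) critical values for a standard normal null."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        c = std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
        return -c, c
    if alternative == "greater":
        return -inf, std_normal.inv_cdf(1 - alpha)  # upper tail only
    if alternative == "less":
        return std_normal.inv_cdf(alpha), inf       # lower tail only
    raise ValueError(f"unknown alternative: {alternative!r}")


def in_rejection_region(z, alpha, alternative="two-sided"):
    """Decision rule: reject H0 when z falls at or beyond a critical value."""
    lower, upper = critical_values(alpha, alternative)
    return z <= lower or z >= upper


print(critical_values(0.05))
print(in_rejection_region(1.8, 0.05))             # two-sided test
print(in_rejection_region(1.8, 0.05, "greater"))  # upper-tailed test
```

The same statistic can fall outside the two-sided rejection region yet inside the upper-tailed one, because the one-tailed test places all of alpha in a single tail.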
P-value: The smallest significance level at which the null hypothesis would be rejected for the observed data. Equivalently, it is the probability, calculated under the assumption that the null hypothesis is true (that is, from the null distribution of the test statistic), of obtaining a test statistic at least as extreme as the one observed.
It is determined by the null distribution, the location of the rejection (critical) region (lower tail, upper tail, or both), and the calculated value of the test statistic.
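The description of the p-value as the smallest significance level that leads to rejection can be checked numerically. This sketch assumes a two-sided test with a standard normal null distribution; the observed statistic and the function names `two_sided_p` and `rejects_at` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed null distribution


def two_sided_p(z):
    """Two-sided p-value for an observed z statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def rejects_at(z, alpha):
    """Critical-value decision rule for a two-sided test at level alpha."""
    return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)


z_obs = 2.2  # hypothetical observed statistic
p = two_sided_p(z_obs)
print(f"p = {p:.4f}")

# The two decision rules agree: reject exactly when p <= alpha.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, rejects_at(z_obs, alpha), p <= alpha)
```

Comparing the p-value with alpha and comparing the statistic with the critical value are two views of the same decision rule.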
Understand the difference between statistical significance and practical significance.
See also "confidence interval".
NIST - Practical Versus Statistical Significance
Arsham - The Meaning and Interpretation of p-values
StatSoft - What is Statistical Significance (p-value)?
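The gap between statistical and practical significance can be seen by holding a small effect fixed and increasing the sample size. The sketch below assumes a z-test with known sigma; the observed difference of 0.02, the sample sizes, and the function name `two_sided_p` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sigma, n):
    """Two-sided p-value for an observed mean difference `diff`."""
    z = diff / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same small observed difference, evaluated at growing sample sizes.
diff, sigma = 0.02, 1.0
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {two_sided_p(diff, sigma, n):.4g}")
```

The effect size never changes, yet the p-value moves from clearly non-significant to extremely small. Whether a difference of 0.02 matters in practice is a separate question that the p-value cannot answer.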
Conclusion: Report the p-value along with a formal decision. It is also recommended to report point and interval estimates alongside the p-value.
In some situations, the value of the test statistic may be reported instead of the p-value. One example is reporting the value of the process capability index Cpk (or Ppk). Another is reporting gauge repeatability and reproducibility (GR&R) results. In these (and other similar) cases, both point and interval estimates of the statistic are available and should be considered.
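As one possible illustration of reporting a statistic with both point and interval estimates, the sketch below computes Cpk with an approximate confidence interval. The interval uses a commonly cited large-sample normal approximation, which is an assumption here and not a method taken from these notes. The function name `cpk_with_interval`, the specification limits, and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cpk_with_interval(data, lsl, usl, confidence=0.95):
    """Point estimate of Cpk and an approximate confidence interval.

    Assumed approximation: Cpk +/- z * sqrt(1/(9n) + Cpk^2 / (2(n - 1))).
    """
    n = len(data)
    x_bar, s = mean(data), stdev(data)
    cpk = min(usl - x_bar, x_bar - lsl) / (3 * s)  # point estimate
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(1 / (9 * n) + cpk**2 / (2 * (n - 1)))
    return cpk, (cpk - half_width, cpk + half_width)


# Hypothetical measurements and specification limits.
data = [9.9, 10.1, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
cpk, (low, high) = cpk_with_interval(data, lsl=9.4, usl=10.6)
print(f"Cpk = {cpk:.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

With only ten observations the interval is very wide, which is the practical argument for reporting the interval estimate and not the point estimate alone.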