General Notes
aka: Contingency Tables, Chi-Square Test for Independence, Crosstab Tables
Hypotheses: H0 = statistical independence of variables; Ha = statistical dependence of variables
Test statistic: Chi Squared
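P-value: P = right-hand tail probability of the chi squared distribution above the observed chi square value, with df = (r-1)(c-1), where r and c are the numbers of rows and columns in the table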
Conclusion: Report P-value; reject H0 at alpha level if P <= alpha
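The P-value and decision rule above can be sketched as follows. This assumes SciPy is available; `scipy.stats.chi2.sf` (the survival function, 1 - cdf) gives the right-hand tail area. The function names and the example statistic are illustrative only.

```python
from scipy.stats import chi2


def chi_square_p_value(statistic: float, n_rows: int, n_cols: int) -> float:
    """Right-hand tail probability above the observed chi-square value."""
    df = (n_rows - 1) * (n_cols - 1)
    # sf() is the survival function, 1 - cdf, i.e. the upper-tail area
    return float(chi2.sf(statistic, df))


def reject_h0(p_value: float, alpha: float = 0.05) -> bool:
    """Reject H0 (independence) at level alpha when P <= alpha."""
    return p_value <= alpha


# Hypothetical observed statistic of 10.0 from a 2 x 3 table (df = 2)
p = chi_square_p_value(10.0, n_rows=2, n_cols=3)
print(round(p, 4), reject_h0(p))  # 0.0067 True
```

With df = 2 the upper tail is exactly exp(-x/2), which makes this case easy to check by hand.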
The chi squared test is a test for association between two categorical variables. The precise form of its hypothesis depends on how the sample was drawn (whether the row totals, the column totals, both, or neither are fixed in advance), although the test statistic and its method of calculation remain the same.
There are three questions that may be addressed when analyzing a contingency table.
How likely is it that at least the observed degree of association would occur in a sample if the variables are truly independent in the population?
This is where most analysts stop. But all we know so far is whether or not there is an association.
The chi-squared test only tells us that a difference or association exists somewhere in the table.
It does not tell us which part of the table most heavily influences the difference, nor how large the difference is.
How do the data depart from independence?
Indicated by the cell chi square residuals.
How strong is the association?
Indicated by the odds ratios.
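The last two questions can be illustrated with a minimal pure-Python sketch on a hypothetical 2 × 2 table. It assumes Pearson residuals, (O - E)/√E, as the cell residuals (their squares are the cell chi square contributions) and the sample odds ratio (a·d)/(b·c); the function names are illustrative.

```python
import math


def expected_counts(table):
    """Expected count per cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_residuals(table):
    """(O - E) / sqrt(E) per cell; squaring gives each cell's chi-square contribution."""
    expected = expected_counts(table)
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


def odds_ratio(table):
    """Sample odds ratio (a*d) / (b*c) for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical 2 x 2 table of counts
table = [[20, 10], [10, 20]]
print([[round(r, 2) for r in row] for row in pearson_residuals(table)])
# [[1.29, -1.29], [-1.29, 1.29]]
print(odds_ratio(table))  # 4.0
```

Large positive residuals mark cells with more observations than independence predicts; large negative ones mark cells with fewer. The odds ratio as written is undefined when b or c is zero.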
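The test statistic takes the form $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $O_{ij}$ is the observed count in row i and column j, and $E_{ij} = R_i C_j / N$ is the count expected under H0 ($R_i$ = row i total, $C_j$ = column j total, $N$ = grand total).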
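The formula can be computed directly, as in this minimal pure-Python sketch (the function name is illustrative). It builds each expected count from the row and column totals and sums (O - E)²/E over every cell.

```python
def chi_square_statistic(table):
    """Pearson chi-square: sum over all cells of (O - E)**2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / N
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2 x 2 table of counts
print(round(chi_square_statistic([[20, 10], [10, 20]]), 3))  # 6.667
```

Without a continuity correction, the result should agree with the first value returned by SciPy's `scipy.stats.chi2_contingency(table, correction=False)`; by default SciPy applies Yates' correction when df = 1.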
In addition to the chi squared test, other statistics computed from the same data (as noted above) supplement the information in the chi squared test statistic. These include the cell-by-cell chi square residuals, relative risk, and odds ratios.
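Relative risk for a 2 × 2 table can be sketched as below. This assumes the rows are the groups being compared (e.g. exposed and unexposed) and the first column is the outcome of interest; the function name and layout are illustrative.

```python
def relative_risk(table):
    """Relative risk for a 2 x 2 table [[a, b], [c, d]].

    Assumes rows are the groups compared and column 0 is the outcome of
    interest: RR = [a / (a + b)] / [c / (c + d)].
    """
    (a, b), (c, d) = table
    risk_row0 = a / (a + b)  # proportion with the outcome in group 1
    risk_row1 = c / (c + d)  # proportion with the outcome in group 2
    return risk_row0 / risk_row1


# Hypothetical 2 x 2 table of counts
print(relative_risk([[20, 10], [10, 20]]))  # 2.0
```

For the same table the odds ratio, (20·20)/(10·10), is 4; the odds ratio approximates the relative risk only when the outcome is rare.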
See the file "Stat Notes - Chi Square.pdf" (attached below) for more information.
See also "Stat Notes - Nonparametric Statistics.pdf" (attached to the page Nonparametric Methods).
While the test statistic is the same, the type of hypothesis depends on whether the row and column totals are fixed by the sampling design or random (see Conover). (to-do: complete table below, add reference)
Conover, pp. 203-204
Assumptions
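The observations are independent and identically distributed, and each observation falls into exactly one category of each variable (Bernoulli trials, so the counts are Binomially distributed, or Multinomially for more than two categories).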
Other assumptions are noted by Arsham.
If one or more expected cell frequencies are less than five (5), then Fisher's Exact Test (based on the Hypergeometric distribution) may be warranted instead.
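A minimal sketch of this rule of thumb for a 2 × 2 table, assuming SciPy is available. It uses `scipy.stats.chi2_contingency` only to obtain the expected counts and the uncorrected chi-square P-value, and `scipy.stats.fisher_exact` for the exact test. The function name and the `min_expected` parameter are illustrative.

```python
from scipy.stats import chi2_contingency, fisher_exact


def chi_square_or_fisher(table, min_expected=5.0):
    """Return (test name, P-value) for a 2 x 2 table of counts.

    Falls back to Fisher's exact test when any expected count is below min_expected.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch handles 2 x 2 tables only")
    # correction=False: plain Pearson chi-square, no Yates continuity correction
    _, p_chi2, _, expected = chi2_contingency(table, correction=False)
    if (expected < min_expected).any():
        _, p_fisher = fisher_exact(table)  # two-sided by default
        return "fisher", float(p_fisher)
    return "chi-square", float(p_chi2)


# Hypothetical table whose smallest expected count is about 2.3
print(chi_square_or_fisher([[6, 2], [1, 4]]))
```

For this table the Fisher branch is taken and the two-sided P is about 0.103, so H0 would not be rejected at alpha = 0.05.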
For more information on Fisher's Exact Test, see the document attached below entitled "ASQ When To Use Fishers Exact Test ssfmv2i4bower.pdf".
Also available from the ASQ website.
See also NIST Fisher's Exact Test.
to-do
Add notes about the continuity correction.
References
Resources