
Assumptions/Restrictions for Chi-square Tests on Contingency Tables

Tables Larger than 2x2

For tables larger than 2x2, the chi-square distribution with the appropriate degrees of freedom provides a good approximation to the sampling distribution of Pearson's chi-square when the null hypothesis is true, and the following conditions are met:
  • Each observation is independent of all the others (i.e., one observation per subject);
  • "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, p. 734).
Notice that it is okay to have some expected counts less than 5, provided none are less than 1, and at least 80% of the expected counts are equal to or greater than 5.
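
The two expected-count conditions above can be checked mechanically. The following is only a minimal sketch: the function names and the 3x3 table are made up for illustration, and the expected counts are computed in the usual way as (row total x column total) / N.

```python
def expected_counts(table):
    """Expected counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_count_rule(table):
    """True if no expected count is below 1 and at most 20% are below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    n_below_5 = sum(1 for e in cells if e < 5)
    # Integer comparison avoids floating-point trouble at exactly 20%.
    return min(cells) >= 1 and 5 * n_below_5 <= len(cells)


# Made-up 3x3 table of observed counts, for illustration only.
observed = [[12, 8, 3], [15, 20, 6], [9, 14, 13]]
print(meets_expected_count_rule(observed))
```

For this made-up table the smallest expected count is 23 x 22 / 100 = 5.06, so both conditions hold.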

2x2 Tables: The Standard Advice

The standard advice for 2x2 tables dates back to Cochran (1952, 1954) or earlier, and goes something like this:
  • Each observation is independent of all the others (i.e., one observation per subject)*
  • All expected counts should be 10 or greater. 
  • If any expected counts are less than 10, but greater than or equal to 5, some authors suggest that Yates' correction for continuity should be applied.  This is done by subtracting 0.5 from the absolute value of O-E (the observed count minus the expected count) in each cell before squaring.  However, the use of Yates' correction is controversial, and is not recommended by all authors.
  • If any expected counts are less than 5, then some other test should be used (e.g., Fisher exact test for 2x2 contingency tables)**.
What you may not know, however, is that the minimum expected count of 5 appears to have been an arbitrary choice (probably by Fisher), and that Cochran (1952) suggested it may need to be modified when new evidence became available.  A paper by Ian Campbell (2007) has provided some of that evidence. 
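
To make the continuity correction in the list above concrete, here is a small sketch that computes Pearson's chi-square for a 2x2 table with and without Yates' correction. The function name and the counts are hypothetical, and the sketch does not handle the case where |O-E| is already smaller than 0.5.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    With yates=True, 0.5 is subtracted from |O - E| in each cell
    before squaring. No guard is applied when |O - E| < 0.5.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    correction = 0.5 if yates else 0.0
    return sum((abs(o - e) - correction) ** 2 / e
               for o, e in zip(observed, expected))


# Made-up counts, for illustration only.
print(round(chi_square_2x2(8, 12, 14, 6), 3))              # uncorrected
print(round(chi_square_2x2(8, 12, 14, 6, yates=True), 3))  # Yates-corrected
```

The corrected statistic is always the smaller of the two when every |O-E| exceeds 0.5, which is one way to see why the correction makes the test more conservative.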

2x2 Tables: Advice from Campbell (2007)

Campbell (2007) distinguishes between 3 different research designs that give rise to 2x2 tables:
  1. Comparative trials.  Here, samples are drawn from two populations, so the row (or column) totals are fixed, and the question is whether the proportions falling into the two categories of the other variable are the same or not.
  2. Cross-sectional design (aka. naturalistic design, or double-dichotomy design).  Here, one sample of N is drawn, and subjects are classified on two dichotomous variables.
  3. 2x2 independence trial.  Here, both sets of marginal totals are fixed by the investigator.  This is rarely the case in practice.
Regarding the 2x2 independence trial, Campbell (2007, p. 3662) says, "Here there is no dispute that the Fisher-Irwin test [i.e., the Fisher exact test] (or Yates's approximation to it) should be used."  But as noted, this situation rarely arises in practice, which is why Campbell focuses on the first two designs almost exclusively.

I'll not go into great detail here.  In a nutshell, Campbell showed via simulations that for comparative trials and cross-sectional designs, the Fisher-Irwin test and Pearson's chi-square with Yates' correction are both far too conservative (i.e., the actual Type I error rate is well below the nominal alpha).  Here is the advice he gives in closing (p. 3674):
  1. Where all expected numbers are at least 1, analyse by the ‘N -1’ chi-squared test (the K. Pearson chi-squared test but with N replaced by N -1).
  2. Otherwise, analyse by the Fisher–Irwin test, with two-sided tests carried out by Irwin’s rule (taking tables from either tail as likely, or less, as that observed).

As Campbell then notes, "This policy extends the use of the chi-squared test to smaller samples (where the current practice is to use the Fisher–Irwin test), with a resultant increase in the power to detect real differences."
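
Campbell's two-step advice can be sketched as a simple decision rule. This is my own illustration, not code from the paper: the function names are hypothetical, and the Fisher-Irwin p-value is computed directly from the hypergeometric distribution, summing the probabilities of all tables that are as likely as, or less likely than, the observed one (Irwin's rule).

```python
from math import comb


def min_expected_2x2(a, b, c, d):
    """Smallest expected count in the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))


def fisher_irwin_p(a, b, c, d):
    """Two-sided Fisher-Irwin p-value by Irwin's rule."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # given the observed marginal totals.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small tolerance so ties with the observed table are included.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def campbell_choice(a, b, c, d):
    """Name of the test suggested by the two-step advice above."""
    if min_expected_2x2(a, b, c, d) >= 1:
        return "N-1 chi-squared"
    return "Fisher-Irwin (Irwin's rule)"


# Made-up tables, for illustration only.
print(campbell_choice(3, 0, 0, 3))
print(campbell_choice(1, 0, 0, 5), fisher_irwin_p(1, 0, 0, 5))
```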

The 'N -1' chi-square

Where Campbell describes replacing N with N -1, he is referring to this formula for Pearson's chi-square:

            chi-square = N(ad-bc)^2 / (mnrs)

  • N is the total number of observations
  • a, b, c, and d are the observed counts in the 4 cells
  • ^2 means "squared"
  • m, n, r, s are the 4 marginal totals
If one has the regular Pearson chi-square (e.g., in the output from statistical software), it can be converted to the 'N - 1' chi-square as follows:

               'N -1' chi-square = Pearson chi-square x (N -1) / N
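
As a rough sketch of the two formulas above, the following computes Pearson's chi-square from the four cell counts, rescales it by (N - 1) / N, and obtains a p-value. The function name and the counts are made up. The p-value uses the fact that, for 1 degree of freedom, the upper-tail chi-square probability equals erfc(sqrt(x / 2)), so no statistics package is assumed.

```python
from math import erfc, sqrt


def n_minus_1_chi_square(a, b, c, d):
    """'N - 1' chi-square and its p-value (df = 1) for [[a, b], [c, d]]."""
    total = a + b + c + d
    # Product of the four marginal totals (m, n, r, s in the formula above).
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    pearson = total * (a * d - b * c) ** 2 / margins
    statistic = pearson * (total - 1) / total
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Made-up counts, for illustration only.
stat, p = n_minus_1_chi_square(8, 12, 14, 6)
print(round(stat, 3), round(p, 3))
```

With N = 40 the rescaling factor is 39/40, so the two statistics differ only slightly; the difference matters most when N is small.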

Equivalence of the 'N-1' Chi-square and the Linear-by-Linear Association Chi-square

This webpage used to include links to two SPSS syntax files that could be used to compute the 'N-1' chi-square and its p-value in SPSS.  I have now removed those links, because it turns out that for 2x2 tables, the Linear-by-Linear Association chi-square computed by the SPSS CROSSTABS procedure is equivalent to the 'N-1' chi-square.  See this document for details, and this Statistics in Medicine article by Frank Busing, Sacha Dubois and me for examples of how to compute the 'N-1' chi-square using SAS, Stata or R.

Pearson's Chi-square vs. the Likelihood Ratio Chi-square

The following is from Alan Agresti's book, Categorical Data Analysis.

It is not simple to describe the sample size needed for the chi-squared distribution to approximate well the exact distributions of X^2 and G^2 [also called L^2 by some authors].  For a fixed number of cells, X^2 usually converges more quickly than G^2.  The chi-squared approximation is usually poor for G^2 when n/IJ < 5 [where n = the grand total and IJ = rc = the number of cells in the table].  When I or J [i.e., r or c] is large, it can be decent for X^2 for n/IJ as small as 1, if the table does not contain both very small and moderately large expected frequencies. (Agresti, 1990, p. 49)
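
For readers who want to see the two statistics side by side, here is a minimal sketch that computes X^2, G^2 and the ratio n/IJ for an r x c table. The function name and the data are hypothetical. G^2 is computed as 2 x the sum of O x ln(O/E), with empty cells contributing zero.

```python
from math import log


def pearson_and_lr(table):
    """Return (X^2, G^2, n/IJ) for a table of observed counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = rows[i] * cols[j] / n
            x2 += (o - e) ** 2 / e
            if o > 0:  # a cell with O = 0 contributes nothing to G^2
                g2 += 2 * o * log(o / e)
    return x2, g2, n / (len(rows) * len(cols))


# Made-up 2x2 table, for illustration only.
x2, g2, ratio = pearson_and_lr([[8, 12], [14, 6]])
print(round(x2, 3), round(g2, 3), ratio)
```

Here n/IJ = 40/4 = 10, comfortably above the value of 5 that Agresti mentions for G^2, and the two statistics are close to one another.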


*  For matched pairs of subjects, or 2 observations per person, McNemar's chi-square (aka. McNemar's Change Test) may be appropriate.  In essence, it is a chi-square goodness of fit test on the two discordant cells, with a null hypothesis stating that 50% of the changes (or disagreements) go in each direction.
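
The description of McNemar's test as a goodness-of-fit test on the two discordant cells can be illustrated with a short sketch. The function name and counts are made up, no continuity correction is applied, and the p-value again relies on the 1-df identity p = erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt


def mcnemar_chi_square(b, c):
    """McNemar's chi-square (no continuity correction) and p-value.

    b and c are the two discordant cell counts. Under the null
    hypothesis, each is expected to equal (b + c) / 2.
    """
    expected = (b + c) / 2
    # Goodness-of-fit form; algebraically equal to (b - c)^2 / (b + c).
    statistic = ((b - expected) ** 2 + (c - expected) ** 2) / expected
    p_value = erfc(sqrt(statistic / 2))  # upper tail, 1 df
    return statistic, p_value


# Made-up discordant counts, for illustration only.
stat, p = mcnemar_chi_square(15, 5)
print(stat, round(p, 3))
```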

**  Fisher's exact test is a "conditional" test--it is conditional on the observed marginal totals (i.e., row and column totals).  Unconditional exact tests are also available.  For example:



Agresti, A. (1990). Categorical Data Analysis. New York: Wiley.

Campbell, I. (2007). Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations. Statistics in Medicine, 26, 3661-3675.  See also www.iancampbell.co.uk/twobytwo/background.htm.

Cochran, W. G. (1952). The [chi-squared] test of goodness of fit. Annals of Mathematical Statistics, 23, 315–345.

Cochran, W. G. (1954). Some methods for strengthening the common [chi-squared] tests. Biometrics, 10, 417–451.

Yates, D., Moore, D., & McCabe, G. (1999). The Practice of Statistics (1st Ed.). New York: W.H. Freeman.

Bruce Weaver