
Chi-Squared Test


General Notes
 

aka:  Contingency Tables, Chi-Square Test for Independence, Crosstab Tables


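Hypotheses:       H0 = the two variables are statistically independent; Ha = the two variables are statistically dependent (associated)
Test statistic:   Pearson's chi-squared statistic
P-value:          P = right-hand (upper) tail probability of the chi-squared distribution at or above the observed statistic, with df = (r-1)(c-1), where r and c are the numbers of rows and columns
Conclusion:       Report the P-value; reject H0 at level alpha if P <= alpha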
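As a rough illustration of the P-value and conclusion lines above, here is a minimal Python sketch that turns an observed statistic and the table dimensions into df = (r-1)(c-1), an upper-tail P-value, and a reject/do-not-reject decision. To stay within the standard library it computes the chi-squared upper tail from the regularized incomplete gamma function (power series plus continued fraction, the usual textbook approach). The function names and the example statistic are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series (used when x < a + 1)."""
    term = 1.0 / a
    total = term
    for n in range(1, 1000):
        term *= x / (a + n)
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via a continued fraction (modified Lentz, x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_upper_tail(statistic: float, df: int) -> float:
    """P(X >= statistic) for X ~ chi-squared(df), computed as Q(df/2, statistic/2)."""
    if statistic <= 0:
        return 1.0
    a, x = df / 2.0, statistic / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)


def chi2_decision(statistic: float, rows: int, cols: int, alpha: float = 0.05):
    """Return (df, P-value, reject H0?) for an r x c contingency table."""
    df = (rows - 1) * (cols - 1)
    p = chi2_upper_tail(statistic, df)
    return df, p, p <= alpha


# Hypothetical example: a 2 x 3 table with an observed statistic of 4.0
df, p, reject = chi2_decision(4.0, rows=2, cols=3)
print(df, round(p, 4), reject)  # 2 0.1353 False
```

For df = 2 the upper tail has the closed form exp(-x/2), so this example can be checked by hand: exp(-2) is about 0.135, which is above 0.05, so H0 is not rejected.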
 
The chi-squared test is a test for association between two categorical variables.  The exact hypothesis being tested depends on the sampling design (which of the row totals, column totals, and grand total are fixed in advance and which are random), although the test statistic and its method of calculation remain the same.
 
There are three questions that may be addressed when analyzing a contingency table.
  1. How likely is it that at least the observed degree of association would occur in a sample if the variables are truly independent in the population?
    • This is where most analysts stop, but at this point all we know is whether or not there is evidence of an association.
      • The chi-squared test only tells us that a difference or association exists somewhere in the table.
      • It does not tell us which part of the table most heavily influences the difference, nor how large the difference is.

  2. How do the data depart from independence?
    • Indicated by the cell chi square residuals.

  3. How strong is the association?
    • For 2 x 2 tables, addressed by measures such as the relative risk and the odds ratio.
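A minimal sketch of question 2, assuming "cell chi-square residual" means the Pearson residual (O - E)/sqrt(E), where E is the count expected under independence (row total x column total / grand total). Its square is that cell's contribution to the chi-squared statistic, so cells with large absolute residuals are the ones departing most from independence. Other definitions (e.g. adjusted standardized residuals) are also used; the function name and counts below are hypothetical.

```python
import math


def pearson_residuals(table):
    """Return (O - E) / sqrt(E) for every cell, with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals


# Hypothetical counts: every expected count is 25
r = pearson_residuals([[20, 30], [30, 20]])
print(r)  # [[-1.0, 1.0], [1.0, -1.0]]
print(sum(x * x for row in r for x in row))  # 4.0 (the chi-squared statistic)
```

Because the squared residuals add up to the overall statistic, they show how the total is distributed across the cells.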

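The test statistic takes the form

  $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$,  with  $E_{ij} = \frac{R_i C_j}{N}$

where O_ij is the observed count in row i, column j; E_ij is the count expected under independence; R_i and C_j are the row and column totals; and N is the grand total.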
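The formula can be checked by hand with a short Python sketch (hypothetical function name and counts) that builds each expected count from the margins, sums the (O - E)^2/E contributions, and returns df = (r-1)(c-1).

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / N
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 3 table: row totals 60, 60; column totals 30, 40, 50; N = 120
stat, df = chi_squared_statistic([[10, 20, 30], [20, 20, 20]])
print(round(stat, 4), df)  # 5.3333 2
```

Here the expected counts are 15, 20, 25 in each row, so the contributions are 5/3, 0, 1 in both rows, for a total of 16/3.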


In addition to the chi-squared test, other measures can be computed from the same data to supplement the information in the chi-squared test statistic (see questions 2 and 3 above).  These include the cell chi-square residuals, the relative risk, and the odds ratio.
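For a 2 x 2 table, the relative risk and odds ratio can be sketched as follows. The layout is an assumption: rows are the two groups being compared and the first column is the event of interest. The function names and counts are hypothetical. Neither measure is defined when a required cell or row count is zero; this sketch simply lets Python raise ZeroDivisionError in that case.

```python
def relative_risk(table):
    """Risk of the event in row 1 divided by the risk in row 2.

    Assumed layout: [[event, no event] for group 1, [event, no event] for group 2].
    """
    (a, b), (c, d) = table
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(table):
    """Cross-product ratio (a*d)/(b*c) for the same assumed 2 x 2 layout."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


table = [[20, 80], [10, 90]]  # hypothetical counts
print(relative_risk(table), odds_ratio(table))  # 2.0 2.25
```

The two measures agree closely only when the event is rare; here the risks are 0.2 and 0.1, so the odds ratio (2.25) is somewhat larger than the relative risk (2.0).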
 
 
See the file "Stat Notes - Chi Square.pdf" (attached below) for more information.
See also "Stat Notes - Nonparametric Statistics.pdf" (attached to the page Nonparametric Methods).

 
While the test statistic is the same, the hypothesis being tested depends on whether the row and column totals are fixed by design or random (see Conover, pp. 203-204).  (to-do:  complete table below, add reference)

 

(remember - probability is largely just counting things)

Margin         | Fixed | Random | Hypothesis
---------------+-------+--------+------------------------------------------
Row totals     |   X   |        | Row totals are fixed
Column totals  |       |   X    | When listing all possible contingency tables that have the fixed row totals, the column totals are allowed to vary freely
Grand total    |   X   |        | Grand total is fixed
---------------+-------+--------+------------------------------------------
Row totals     |       |   X    | When listing all possible contingency tables that have the fixed column totals, the row totals are allowed to vary freely
Column totals  |   X   |        | Column totals are fixed
Grand total    |   X   |        | Grand total is fixed
---------------+-------+--------+------------------------------------------
Row totals     |       |   X    | Row totals are allowed to vary freely
Column totals  |       |   X    | Column totals are allowed to vary freely
Grand total    |   X   |        | Grand total is fixed

With row totals and column totals both allowed to vary freely, there are more possible contingency tables to consider when calculating probabilities.
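To make the "counting things" remark concrete, this hypothetical Python sketch enumerates every 2 x 2 table compatible with two of the designs above: row totals fixed at 3 and 3, versus only the grand total fixed at 6. Letting the margins vary enlarges the set of possible tables from 16 to 84.

```python
def tables_fixed_rows(r1, r2):
    """Every 2 x 2 table with row totals r1 and r2 (column totals free)."""
    return [((a, r1 - a), (c, r2 - c))
            for a in range(r1 + 1)
            for c in range(r2 + 1)]


def tables_fixed_grand_total(n):
    """Every 2 x 2 table of non-negative counts summing to n (row and column totals free)."""
    tables = []
    for a in range(n + 1):
        for b in range(n + 1 - a):
            for c in range(n + 1 - a - b):
                tables.append(((a, b), (c, n - a - b - c)))
    return tables


print(len(tables_fixed_rows(3, 3)), len(tables_fixed_grand_total(6)))  # 16 84
```

In general, fixing row totals r1 and r2 gives (r1 + 1)(r2 + 1) tables, while fixing only the grand total N gives C(N + 3, 3) tables.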

 
 

Assumptions
 
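The observations are independent and identically distributed, each being the result of a Bernoulli trial (a multinomial trial when a variable has more than two categories), so the counts follow a binomial (multinomial) distribution.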

Other assumptions are noted by Arsham.


If one or more expected cell frequencies are less than five (5), then the use of Fisher's exact test (based on the hypergeometric distribution) may be warranted.
  • For more information on Fisher's exact test, see the document attached below entitled "ASQ When To Use Fishers Exact Test ssfmv2i4bower.pdf".
  • The same document is also available from ASQ.
  • See also NIST Fisher's Exact Test.
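A minimal sketch of the rule above for a 2 x 2 table: count the cells whose expected frequency is below 5 and compute Fisher's exact P-value from the hypergeometric distribution. The two-sided P-value here uses the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one; other two-sided conventions exist. Function names and counts are hypothetical; SciPy's `scipy.stats.fisher_exact` provides a library implementation.

```python
from math import comb


def cells_below(table, threshold=5.0):
    """Count cells whose expected frequency (row total * column total / N) is below threshold."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum(1 for rt in row_totals for ct in col_totals if rt * ct / n < threshold)


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact P-value for a 2 x 2 table.

    With all margins fixed, the top-left count follows a hypergeometric
    distribution; sum the probabilities of every table that is no more
    likely than the observed one.
    """
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # Probability that the top-left cell equals x, given the fixed margins
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


table = [[3, 1], [1, 3]]  # hypothetical counts
print(cells_below(table), round(fisher_exact_2x2(table), 4))  # 4 0.4857
```

Here every expected count is 2, so all four cells fall below 5, and the exact P-value is 34/70.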

to-do
Add notes about the continuity correction.



References