The χ²-test, pronounced "kai-square test", was invented by the prominent statistician Karl Pearson in 1900. Recall that the tests we have dealt with so far involve drawing from a 1-0 count box. In such cases we have seen that a z-test or t-test might be appropriate. We now turn to the χ²-test, which is used to make inferences when more than two categories are considered. More specifically, in this chapter we will examine two uses of the χ²-test: 1) the goodness-of-fit test, and 2) the test for independence.
The so-called "goodness-of-fit" test takes its name from the question of whether data fit some predetermined model or hypothesis. For example, someone might want to know whether a die is fair. Each throw can be classified into one of six categories, {1,2,3,4,5,6}. Suppose we toss a die 60 times, and suppose further that the outcome is as summarized in the table below:
That is, in 60 rolls of the die, we observed 4 "1"s, 6 "2"s, and so on. For a fair die, we expect 10 observations for each outcome. Clearly, the observed and expected frequencies differ. There seem to be too few "1"s and too many "3"s and "4"s. In contrast, the differences between the observed "5"s and "6"s and their expected values could be due to chance. A researcher could conduct six separate t-tests, one for each face, but would still be undecided about the fairness of the die, because different tests could lead to different conclusions. The χ²-test overcomes this by producing a single test statistic that treats all categories simultaneously. It is called a χ²-test because, when the null hypothesis is true, the test statistic approximately follows the χ²-distribution.
The χ²-distribution, like Student's t-distribution, is characterized by a single parameter, the degrees of freedom. The χ²-distribution is not symmetric but skewed to the right, and the skewness is more pronounced for smaller degrees of freedom.
To find the area under the curve, we can refer to the χ²-table. The first column gives the degrees of freedom, while the top row gives the area under the curve to the right of the χ²-values shown in the body of the table. For example, at d.f. = 4 and the .05 significance level, the χ²-critical value is 9.488.
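The table lookup can be checked numerically. The sketch below is not part of the original text: the function name `chi2_upper_tail` is hypothetical, and it relies on a standard recurrence for the upper-tail area that starts from the closed forms for 1 and 2 degrees of freedom and adds one term for every two further degrees of freedom. It uses only Python's standard `math` module.

```python
import math

def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom.

    Hypothetical helper for illustration; df is assumed to be a positive integer.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, area = 1.0, math.exp(-z)
    else:
        a, area = 0.5, math.erfc(math.sqrt(z))
    # Each pass raises the degrees of freedom by 2 and adds one more term.
    while 2 * a < df:
        area += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return area

# The table entry quoted above: 9.488 at 4 d.f. should leave about 5% to its right.
area = chi2_upper_tail(9.488, 4)
print(f"Area to the right of 9.488 at 4 d.f.: {area:.4f}")
```

Reading the table in the other direction (from an area to a critical value) would need a root-finder on top of this function, which is why printed tables remain convenient.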
The rest of the test follows the same procedure as before. First, the null and alternative hypotheses can be stated as:
H0: The Die is Fair
Ha: The Die is Not Fair
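Second, the test statistic is calculated as:
\[ \chi^2 = \sum \frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}} \tag{16.1} \]
where the sum runs over all categories.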
Note that a large χ²-statistic means that the observed and expected frequencies are far apart, suggesting a bad fit. More precisely, at some significance level α, one can use the χ²-table to find the corresponding p-value, or simply compare the test statistic to the critical value, also read off the table. If the p-value is less than α, or equivalently the test statistic is greater than the critical value, the null hypothesis is rejected.
For our example of tossing a die 60 times, we could pick α = 0.05. We calculate the χ²-test statistic to be 14.2. Note that the degrees of freedom is not the sample size less one (as was the case with the t-test), but rather the number of categories less one, i.e. 6−1 = 5. Reading from the table, we find that the area to the right of 14.2 is between 2.5% (the area to the right of 12.833) and 1% (the area to the right of 15.086). That is, the p-value is between 1% and 2.5%, which is less than our chosen α. Similarly, we arrive at the same conclusion by comparing the test statistic with the critical value at α = 0.05 and 5 degrees of freedom: the test statistic is larger than the critical value (i.e. 14.2 > 11.070), leading us to reject the null hypothesis of a fair die.¹
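The arithmetic of the die example can be sketched in a few lines of Python. Only the first two observed counts (4 and 6) are stated in the text. The remaining four counts below are an assumption, chosen to agree with the 60 rolls, the reported statistic of 14.2, and the remarks about too many "3"s and "4"s; they should not be read as the original table. The critical values are the standard table entries for 5 degrees of freedom.

```python
# Observed counts for faces 1..6. The first two come from the text; the last
# four are ASSUMED values consistent with 60 rolls and a statistic of 14.2.
observed = [4, 6, 17, 16, 8, 9]
n_rolls = sum(observed)
expected = [n_rolls / len(observed)] * len(observed)  # 10 per face for a fair die

# Sum of (observed - expected)^2 / expected over the six faces.
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # number of categories less one

# Upper-tail critical values at 5 d.f., as printed in standard chi-square tables.
table_5df = {0.05: 11.070, 0.025: 12.833, 0.01: 15.086}
alpha = 0.05
reject_null = chi2_stat > table_5df[alpha]

print(f"chi-square = {chi2_stat:.1f} with {df} d.f.")
print(f"critical value at alpha = {alpha}: {table_5df[alpha]}")
print("reject the null of a fair die" if reject_null else "do not reject the null")
```

Because the statistic lies between the 2.5% and 1% table entries, the bracketing of the p-value described above follows directly from the same dictionary.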
Another use of the χ²-test is to test for the independence of two variables.² Let's take an example. From a survey of 2,237 people, the counts by gender and handedness are summarized in the following table.
The question we would like to pose is whether gender and handedness are independent. There can of course be various reasons why a researcher might be interested in such a question. For example, in neurophysiology, it could be hypothesized that women make relatively more use of the left side of their brain (i.e. their rational faculty) than men do, which on this hypothesis would make women more rational than men. Sociologists, on the other hand, argue that women are usually subject to greater pressure than men to follow social norms. If society prefers people to use their right hand, then naturally left-handed women might be under greater pressure to switch to using their right hand.
The hypothesis may be set up as follows:
H0: Handedness and gender are independent
Ha: Handedness and gender are NOT independent
That is, the null hypothesis is that handedness is distributed in the same way in the population irrespective of gender, and that any observed difference in the sample is due merely to chance.
We use the same χ²-test statistic as in (16.1) above. We have the observed values, so what are the expected values? The following table shows the expected values, which I will explain later.
Hence we get χ² ≈ 12.
The degrees of freedom for this problem is (3−1)×(2−1) = 2, which corresponds to the fact that there are three possible outcomes for handedness and two for gender. The following table shows the differences between the observed and expected values, that is, the deviations.
The bottom row and the left-most column show the vertical and horizontal sums of the deviations, respectively, all of which are zero. This means that we need to know only two deviations; the others then follow automatically, hence the degrees of freedom is 2. In sum, when testing independence in an m × n table with no other constraints on the probabilities, there are (m − 1) × (n − 1) degrees of freedom.
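The counting argument can be made concrete with a small sketch. The helper `fill_deviations` is hypothetical and not from the text: given two deviations in the men's column of a 3 × 2 table, it fills in the other four using only the fact that every row and every column of deviations sums to zero. The first input, −22, is 934 − 956 from the counts quoted in this chapter; the second, 15, is a made-up value used purely for illustration.

```python
def fill_deviations(dev_right_men, dev_left_men):
    """Complete a 3 x 2 table of deviations (observed - expected) from two entries.

    Rows: right-handed, left-handed, ambidextrous. Columns: men, women.
    """
    # The men's column must sum to zero, which fixes the third entry.
    dev_ambi_men = -(dev_right_men + dev_left_men)
    men = [dev_right_men, dev_left_men, dev_ambi_men]
    # Each row must sum to zero, so each women's entry is the negative of the men's.
    return [[d, -d] for d in men]

# -22 is 934 - 956 from the text; 15 is a HYPOTHETICAL value for illustration.
table = fill_deviations(-22, 15)
row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]
degrees_of_freedom = (3 - 1) * (2 - 1)

print("deviations:", table)
print("row sums:", row_sums, "column sums:", col_sums)
print("degrees of freedom:", degrees_of_freedom)
```

Two inputs determine all six cells, which is exactly what the (m − 1) × (n − 1) rule counts.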
The p-value is the area to the right of χ² = 12 at 2 degrees of freedom, which from the table is less than 1%. So we can safely reject the null. Alternatively, the critical value for this problem at α = 5% is 5.99. Because the test statistic exceeds the critical value (12 > 5.99), we reject the null as before, in favor of the alternative that handedness and gender are not independent.
Lastly, let's see how the expected frequencies were found.
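With exactly 2 degrees of freedom the table lookup can be verified by hand, because the area to the right of x under the χ²-curve is then simply exp(−x/2). The sketch below uses that special case, which does not carry over to other degrees of freedom, together with the rounded statistic of 12 quoted above.

```python
import math

chi2_stat = 12.0  # rounded test statistic quoted in the text
alpha = 0.05

# Valid for 2 degrees of freedom ONLY: upper-tail area is exp(-x / 2).
p_value = math.exp(-chi2_stat / 2)

# Inverting the same formula gives the critical value at level alpha.
critical_value = -2 * math.log(alpha)

print(f"p-value: {p_value:.4f}")
print(f"critical value at alpha = {alpha}: {critical_value:.2f}")
print("reject the null" if chi2_stat > critical_value else "do not reject the null")
```

The p-value comes out at roughly a quarter of one percent, comfortably below the 1% bound read from the table.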
First, irrespective of gender, we calculate the proportions of right-handed, left-handed and ambidextrous people in the sample. For example, the proportion of right-handed persons is (934 + 1070)/2237 = 89.6%, and so on. If handedness and gender are independent, then we can assume that the same proportions apply to men and women separately; that is, the expected number of right-handed men in the sample would be 89.6% of 1067 = 956. We can do the same for left-handed and ambidextrous men, and then for women, until the table is complete as above.
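The calculation for the right-handed row can be sketched as follows, using only the counts quoted in the text. The number of women is not stated directly and is taken here as 2237 − 1067; the variable names are illustrative.

```python
total = 2237
men_total = 1067
women_total = total - men_total  # derived, not stated in the text

right_men, right_women = 934, 1070
right_total = right_men + right_women

# Proportion of right-handed people in the whole sample, ignoring gender.
right_share = right_total / total

# Under independence the same proportion applies within each gender.
expected_right_men = right_share * men_total
expected_right_women = right_share * women_total

print(f"share right-handed: {right_share:.1%}")
print(f"expected right-handed men: {expected_right_men:.0f}")
print(f"expected right-handed women: {expected_right_women:.0f}")
```

The same three lines, applied to the left-handed and ambidextrous totals, would fill in the rest of the expected table. The two expected counts in a row always add back to the row total, which is why the deviations in each row cancel.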
1. As a rule of thumb, the χ²-test should be used only when the expected frequency in each cell of the table is 5 or more.
2. Recall that when studying probability, we tested for the independence of two variables using conditional probability (i.e. A and B are independent if P(A|B) = P(A) or P(B|A) = P(B)).