Categorical variables (or nominal variables) are variables whose values are qualitative labels, names, or categories. For example, the outcome of flipping a coin is either head or tail; the response to the question "What is your gender?" can be any of {female, male, non-binary, prefer not to say}; and the background colour of presentation slides can be white, black, yellow, blue, green, etc.
Categorical variables are said to have binary outcomes if there are only TWO possible outcomes, e.g., outcome of a coin flip: {head, tail}, answer to a multiple-choice exam question: {correct, incorrect}, result of a penalty kick in football: {score, miss}. For a fixed number of observations of binary outcomes (e.g., 10 coin flips), we can conduct statistical tests on the underlying proportion or probability of one outcome (e.g., to test whether p(head) = 0.5). Such tests are called tests of proportions.
For categorical variables with three or more outcomes, we can test whether the observed counts of the different outcomes "fit" a hypothesised distribution of outcomes. For this purpose, we use the chi-square goodness-of-fit test.
The chi-square test can also be used to test whether two categorical variables are associated, for example, whether gender (male or female) is associated with favourite colour (e.g., red, yellow, green). If the two variables are not associated, the counts for each combination of categories should be spread evenly after weighting by how common each category is overall (i.e., in proportion to the row and column totals). If they are associated, the observed counts will deviate from this pattern. The chi-square test of this deviation is called the test of independence.
Multiple observations of binary outcomes (e.g., a series of coin flips: Head, Head, Tail, Head, Tail, Tail, Head, Tail; a set of answers to exam questions: Correct, Incorrect, Incorrect, Correct, Correct, Correct, Incorrect, Correct) can be assumed to be independently generated from the same underlying distribution (Bernoulli distribution).
We can treat these observations of binary outcomes as sample data and conduct statistical tests on the proportion of one outcome in the underlying distribution.
Such tests are called tests of proportions.
One common test of proportions is the binomial test.
Refer to Slides 22-29 here
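To make the binomial test concrete, here is a minimal Python sketch (standard library only) of an exact two-sided binomial test. The function name `binomial_test_two_sided` is hypothetical, and the two-sided rule it uses (adding up the probabilities of every count that is no more likely than the observed one, the convention followed by R's `binom.test`) is an assumption; other conventions can give slightly different p-values.

```python
import math


def binomial_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of H0: P(outcome) = p, given k outcomes in n trials."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    # Probability of exactly i outcomes in n independent Bernoulli(p) trials
    probs = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Sum the probabilities of all counts no more likely than the observed one;
    # the small relative tolerance treats near-equal probabilities as ties
    threshold = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= threshold))


# The coin flips from the example above: 4 heads out of 8
flips = ["Head", "Head", "Tail", "Head", "Tail", "Tail", "Head", "Tail"]
print(binomial_test_two_sided(flips.count("Head"), len(flips)))  # 1.0

# 9 heads out of 10 would give (1 + 10 + 10 + 1) / 1024
print(binomial_test_two_sided(9, 10))  # 0.021484375
```

With 4 heads in 8 flips, the observed count is the single most likely outcome under p = 0.5, so every outcome is counted and the p-value is 1.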
The chi-square statistic (χ²) is a measure of the difference between the observed and expected frequencies of the outcomes of a set of events or variables.
The data used to calculate the chi-square statistic must be raw counts (not percentages or proportions) from a random sample; the categories must be mutually exclusive, the observations must be independent, and the sample must be large enough.
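As a rough illustration, the statistic is the sum of (O − E)² / E over all categories, where O is an observed frequency and E the corresponding expected frequency. The Python sketch below computes it; the function name and the counts are hypothetical. It also shows why the data must be raw counts: feeding in proportions instead of counts shrinks the statistic by a factor equal to the sample size.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)**2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for four categories (N = 100), with 25 expected in each
counts = [30, 20, 25, 25]
expected = [25, 25, 25, 25]
print(chi_square_statistic(counts, expected))  # 2.0

# The same data as proportions gives a statistic 100 times smaller (about 0.02),
# which is why the test must be run on raw counts
proportions = [c / 100 for c in counts]
expected_props = [e / 100 for e in expected]
print(chi_square_statistic(proportions, expected_props))
```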
Two types of chi-square test are covered here: the goodness-of-fit test and the test of independence.
The goodness-of-fit test is used when you have one categorical variable with two or more categories from a single population.
The null hypothesis (H0) states that the population distribution matches the expected distribution, so any difference between the observed and expected frequencies is due to sampling error. The alternative hypothesis (H1) states that the population distribution differs from the expected distribution.
Some people claim that students' hostel preferences are equally distributed across the four options (i.e., a 1:1:1:1 ratio). We can use the goodness-of-fit test to examine this claim.
Q: Are students' hostel preferences distributed in the expected (equal) proportions?
A: We use “χ² Goodness of fit” under “Frequencies” in jamovi.
Conclusion/interpretation (APA format):
Students' hostel preferences were not equally distributed in the population, χ²(3, N = 1000) = 266, p < .001.
We can also use jamovi simply as a calculator for the goodness-of-fit test by entering the observed counts directly into the data cells.
For illustrated steps, refer to Slides 50-55 here
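Because the goodness-of-fit test only needs the observed counts and the expected ratio, its arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not jamovi's implementation: `chi2_sf` and `goodness_of_fit` are hypothetical helper names, the p-value uses a closed-form chi-square upper tail that works for integer degrees of freedom, and the counts below are made up (the raw hostel counts are not given here). The degrees of freedom are the number of categories minus 1, so a 1:1:1:1 ratio over four categories gives df = 3.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Q(a, y) is the regularized upper incomplete gamma function; the answer is Q(df/2, y).
    # Start from Q(1, y) = exp(-y) for even df, or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        total, a = math.exp(-y), 1.0
    else:
        total, a = math.erfc(math.sqrt(y)), 0.5
    # Step up with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) until a = df / 2
    while a < df / 2:
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return min(total, 1.0)


def goodness_of_fit(observed, ratio):
    """Chi-square goodness-of-fit test of observed counts against an expected ratio."""
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical observed counts for four categories, tested against a 1:1:1:1 ratio
observed = [30, 20, 25, 25]
stat, df, p = goodness_of_fit(observed, [1, 1, 1, 1])
print(f"chi-square({df}, N = {sum(observed)}) = {stat:.2f}, p = {p:.3f}")
# chi-square(3, N = 100) = 2.00, p = 0.572
```

Plugging in the statistic reported above, `chi2_sf(266, 3)` is vanishingly small, consistent with p < .001.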
The test of independence uses a contingency table of two categorical variables to test whether the variables are associated. It can only assess association; it cannot support any inference about causation.
The data must meet the following requirements:
A large random sample
An expected frequency of at least 5 in each cell
Two categorical variables
Two or more categories (groups) for each variable
Independence of observations:
There is no relationship between the subjects within or across groups, so each subject contributes to only one cell.
The categorical variables are not "paired" in any way (e.g., pre-test/post-test observations).
The null hypothesis (H0) states that the two variables are not associated with each other. The alternative hypothesis (H1) states that the two variables are associated with each other.
For each cell, expected count = (row total × column total) / grand total.
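The expected-count rule, together with the requirement that every expected count is at least 5, can be sketched in Python as below. The table is a hypothetical 2 × 3 layout (two relationship statuses by three faculties) with made-up counts, and the helper names are also hypothetical. The sketch computes Pearson's chi-square statistic without any continuity correction and uses df = (rows − 1) × (columns − 1); the p-value would then be read from the chi-square distribution with that df.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Pearson chi-square statistic, degrees of freedom and expected counts for an r x c table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Hypothetical counts: rows = relationship status, columns = three faculties
table = [[35, 15, 10],
         [15, 15, 10]]
stat, df, expected = independence_statistic(table)
print(expected)                                      # [[30.0, 18.0, 12.0], [20.0, 12.0, 8.0]]
print(all(e >= 5 for row in expected for e in row))  # True: every expected count is at least 5
print(round(stat, 3), df)                            # 4.167 2
```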
Some students believe that the faculty you belong to can determine your relationship status. We cannot test causation with these data, but we can use the test of independence to check whether faculty and relationship status are associated.
Q: Are faculty and relationship status associated with each other?
A: We use “χ² test of association” under “Frequencies” in jamovi.
Conclusion/interpretation (APA format):
There was no significant association between faculty and relationship status, χ²(2, N = 1000) = 1.01, p = .603.
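As a quick arithmetic check on the reported result: for a table with r rows and c columns, df = (r − 1)(c − 1), so df = 2 corresponds to, for example, a 2 × 3 or 3 × 2 table. With df = 2 the chi-square upper-tail probability has the exact closed form p = exp(−χ²/2), so the reported p can be reproduced from χ² alone. Because χ² is reported to two decimals, this sketch (with a hypothetical helper `p_value_df2`) brackets the p-value over the rounding interval rather than expecting an exact match.

```python
import math


def p_value_df2(chi_square: float) -> float:
    """Upper-tail chi-square probability for df = 2, which equals exp(-x / 2)."""
    return math.exp(-chi_square / 2)


# The reported 1.01 could be any value in [1.005, 1.015) before rounding
low = p_value_df2(1.015)   # larger statistic -> smaller p
high = p_value_df2(1.005)
print(f"p between {low:.4f} and {high:.4f}")  # p between 0.6020 and 0.6050
print(low <= 0.603 <= high)                   # True
```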
As with the goodness-of-fit test, we can also conduct the chi-square test of independence in jamovi by entering count values directly into the cells.
For the test of independence, we also need to enter the row and column categories as two separate variables, in addition to the counts.
For details, see Slides 67-71 here.
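The data layout described above can be sketched in Python: each cell of the contingency table becomes one record holding the row category, the column category, and its count. One row per cell with these three columns is an assumed layout; the faculty names, status labels, and counts are hypothetical, and `to_long_format` is a made-up helper rather than part of jamovi.

```python
# Hypothetical labels and counts (not the tutorial's data)
statuses = ["Single", "In a relationship"]   # row variable
faculties = ["Arts", "Science", "Business"]  # column variable
counts = [[35, 15, 10],
          [15, 15, 10]]


def to_long_format(row_labels, col_labels, table):
    """Return one (row category, column category, count) record per table cell."""
    return [
        (row_label, col_label, table[i][j])
        for i, row_label in enumerate(row_labels)
        for j, col_label in enumerate(col_labels)
    ]


# Print as tab-separated lines: one line per cell, three columns
print("Status\tFaculty\tCount")
for record in to_long_format(statuses, faculties, counts):
    print(*record, sep="\t")
```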
Now, if you think you're ready for the exercise, you can check your email for the link.
Remember to submit your answers before the deadline in order to earn the credits!