Okay, this is the last real statistics thing you have to know for AP Biology, I promise! A Chi-Square test is basically a way for us to determine how accurate (or inaccurate) our predictions were. So a bag of M&Ms has 6 colors: blue, orange, green, yellow, red, and brown. If the factory claims that they make equal amounts of every color (they don't claim this, but let's pretend), you would expect each color to make up roughly 1/6, or 16.67% of the total number of M&Ms made. This is our assumption or estimate; this is what we would expect. We would call this our null hypothesis... it's kind of like our standard, boring option. This is the hypothesis that we will assume is correct unless we can disprove it.
A Chi-Square test can be used to test whether or not this is actually a decent claim (hypothesis). So we need a sample... Let's just buy a bag of M&Ms, and count up how many there are of each color. As long as it's a big enough bag (large enough sample), then we should have some confidence in our findings! These values that we count ourselves are what we observe.
Okay, so let's pretend that I did buy a bag of M&Ms, and I found the following numbers:
This kind of table is SO USEFUL for this test, and you'll see why. It looks scary, but just take it piece by piece.
So I have my observed (o) values from counting. Now I need to figure out my expected (e) values... Well, what did I say we would expect to see based on our assumption that the colors are equally distributed? I would expect each color to represent 1/6 of the total. So I can just put 16.67% as my expected, right?? WRONG.
You cannot put a percentage in the 'expected' row. Remember, we're working with numbers of M&Ms in this table. So how many brown M&Ms would I expect to see IF they represent 1/6 of the total made? Well, it would depend on how big the bag is. So, I need to make sure I am looking at the same sample size as my actual data. So if I have a bag with 180 M&Ms in it, and it's a perfect world where every color is equally represented, I would expect 30 of each color. So now I can fill in my 'expected' row. It is all the same for every color because we are testing whether or not they are equally distributed. Sometimes, you'll work under different assumptions, and we'll test that in class. Now our table looks like:
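If you want to see that "percentage to count" step spelled out, here is a minimal Python sketch. The function name `expected_counts` is just something I made up for illustration; the 180 and the 1/6 come from the example above.

```python
def expected_counts(total, proportions):
    """Turn expected proportions into expected counts for a sample of size `total`."""
    return [total * p for p in proportions]

colors = ["blue", "orange", "green", "yellow", "red", "brown"]

# Null hypothesis: every color makes up an equal share (1/6) of the bag.
equal_share = [1 / len(colors)] * len(colors)

# A bag of 180 M&Ms, as in the example above.
expected = expected_counts(180, equal_share)

for color, e in zip(colors, expected):
    print(f"{color}: {e:.1f}")
```

If your null hypothesis uses unequal proportions, you would pass those in instead of `equal_share`; the proportions should still add up to 1.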
Now all I have to do is go down the columns and fill in each row. So the first row says 'Difference (o-e)'. This means that I have to calculate o-e for that column. Then I can move on down the rest of the column. Eventually, the first column will look like:
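For any one column, with observed count o and expected count e, the rows can be written out like this. Only the 'Difference (o-e)' row name comes from the table; the other two labels are my assumption, based on the standard Chi-Square formula:

```latex
\begin{aligned}
\text{Difference:} \quad & o - e \\
\text{Difference squared:} \quad & (o - e)^2 \\
\text{Squared difference divided by expected:} \quad & \frac{(o - e)^2}{e}
\end{aligned}
```

Squaring the difference gets rid of negative signs, so a color that came in under the expected count adds to the total just like a color that came in over.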
Now I need to do this for all the columns. It will look like:
Now remember that ∑ from the equation? That means we need to find a sum. We need to add up that bottom row. That will finally give us our Chi-Square (χ²) value. So add it up and you get 17.98, and it enters the table:
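Here is a rough Python sketch of the whole "fill in each column, then add up the bottom row" process. The observed counts below are hypothetical numbers I picked for illustration (they are NOT the counts from my bag, so the result will not be 17.98), and `chi_square` is just an illustrative function name.

```python
def chi_square(observed, expected):
    """Add up (o - e)^2 / e across every category."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for illustration only; they add up to 180.
observed = [40, 25, 35, 20, 30, 30]
# Expected counts under the null hypothesis (equal colors, bag of 180).
expected = [30] * 6

# One line per column: o, e, o - e, (o - e)^2, (o - e)^2 / e
for o, e in zip(observed, expected):
    diff = o - e
    print(o, e, diff, diff ** 2, round(diff ** 2 / e, 2))

print("chi-square =", round(chi_square(observed, expected), 2))
```

The function works on counts, not percentages, for the same reason the 'expected' row in the table has to hold counts.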
This is, indeed, our Chi-Square value. But, we have one final step, and it is the most often-forgotten step. You've done all of this math, you're tired. Don't make it all for nothing by getting the answer wrong now.
Our last step is to take that Chi-Square value and compare it to what is known as the critical value. Basically, you are given a table, and that table tells you, based on your data, what value of Chi-Square will be significant. The table that you are provided, the critical value table, looks like:
Here's the easy part. First, we need to look at the p-value rows of the chart. By convention, most biologists use a p-value of 0.05, and that is what AP Biology expects. So you only need to worry about that top row. You can totally ignore the bottom row (UNLESS A QUESTION SPECIFICALLY TELLS YOU TO USE THE VALUE OF 0.01, but that is rare).
Now we need to know our degrees of freedom. Don't worry too much about what that means or why it's called that for AP Biology. It is interesting and it can be useful, but I think it might just be too much to add on here. Basically, in a Chi-Square test like this, the degrees of freedom can be easily calculated by finding the number of categories we were looking at. So, we were measuring how many M&Ms there were of each color. So how many categories could an M&M be placed in? Well, there were 6 colors. So there are 6 categories. BUT WAIT. That doesn't mean our degrees of freedom value is 6. We actually have to take one away from that. It is 5, because 6-1=5.
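In code, that rule is a one-liner. This is only a sketch for a test like this one, where every M&M falls into exactly one category; the function name `degrees_of_freedom` is my own.

```python
def degrees_of_freedom(categories):
    """Degrees of freedom for this kind of test: number of categories minus one."""
    return len(categories) - 1

colors = ["blue", "orange", "green", "yellow", "red", "brown"]
df = degrees_of_freedom(colors)
print("categories:", len(colors), "degrees of freedom:", df)
```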
Why did we take one away? Well, basically if an M&M wasn't red, blue, yellow, green, or brown, there is no other option than orange. So there are really, in a way, only 5 free choices. Don't worry about this unless you're taking AP Statistics!
That's all well and good. I now know I have a critical value of 11.07 from the chart (remember, I'm in the top row because p=0.05, and I'm in the 5th column because df=5). Now, all I need to do is compare my Chi-Square (χ²) to that critical value. My Chi-Square value was 17.98. The critical value is 11.07. My Chi-Square (χ²) is larger than the critical value.
As a result, we state that 'we reject the null hypothesis'. Basically, our Chi-Square value shows that the numbers of M&Ms of each color were waaaaaay off. They are not equally distributed. Each color does not represent 1/6 of the total. So we reject that hypothesis. Therefore, we accept the alternative hypothesis of 'The M&Ms are not evenly distributed'.
In AP Biology, if the Chi-Square (χ²) value is larger than the critical value, we say we reject the null hypothesis. If the χ² value is less than the critical value, we say we fail to reject the null hypothesis.
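The lookup and comparison can be sketched in Python like this. The critical values are the standard Chi-Square ones rounded to three decimals, so the table you are handed may round a digit differently. The names `CRITICAL_VALUES` and `decide` are mine, and treating an exact tie as 'fail to reject' is an assumption, since the rule above only covers larger and smaller.

```python
# Standard chi-square critical values: CRITICAL_VALUES[p][df]
CRITICAL_VALUES = {
    0.05: {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488,
           5: 11.070, 6: 12.592, 7: 14.067, 8: 15.507},
    0.01: {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277,
           5: 15.086, 6: 16.812, 7: 18.475, 8: 20.090},
}

def decide(chi_square_value, df, p=0.05):
    """Compare a chi-square value to the critical value for the given df and p."""
    critical = CRITICAL_VALUES[p][df]
    if chi_square_value > critical:
        return "reject the null hypothesis"
    # Assumption: a tie counts as "fail to reject".
    return "fail to reject the null hypothesis"

# The M&M example: chi-square of 17.98 with 5 degrees of freedom.
print(decide(17.98, df=5))
```

Notice that p defaults to 0.05, which matches the "only worry about the top row" advice. You would pass `p=0.01` only when a question tells you to.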