Sums of Squares
The total length of the videos in this section is approximately 63 minutes. Feel free to do this in multiple sittings. You will also spend time answering short questions while completing this section.
You can also view all the videos in this section at the YouTube playlist linked here.
The Spock Example
In case you are curious:
Obituary includes story of trial
You can find boxplots displaying the Spock data in this folder (restricted to Wellesley affiliates, for copyright reasons).
Question 1: What is unusual about Spock's judge's venires?
Show answer
The sets of potential jurors "randomly chosen" for Spock's judge's trials include many fewer women than the sets of potential jurors assigned to other judges.
You might also have noticed that all of the venires are mostly men. This data would look different if collected today. But in the 1960s, women had not been voting or serving on juries for all that long. It seems that there were different rules for men's and women's participation in juries as late as 1979 or even 1994.
Spock's example as motivation for comparing multiple groups
Question 2: What is the null hypothesis of the test we are working toward?
Show answer
The population means are equal across all of the groups. This is a generalization of the null hypothesis for a two-sample t-test, that the two population means are equal.
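Written out in symbols, the hypothesis might look like the sketch below. The notation is an assumption for illustration: \mu_i stands for the population mean of group i, and I is the number of groups (the same I that appears in the F-statistic later in this section). The alternative shown is the usual one for this kind of test, that at least one mean differs from the others.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_I
\qquad \text{versus} \qquad
H_A:\ \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
```

Setting I = 2 recovers the two-sample t-test null hypothesis, \mu_1 = \mu_2.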
What information should be included in our test statistic for comparing multiple groups?
Question 3: Which data set provides stronger evidence that the four samples are drawn from populations with different means, A or B?
Question 4: Which data set provides stronger evidence that the four samples are drawn from populations with different means, B or C?
Show answer
Question 3: A provides stronger evidence than B. The sample means are the same in both data sets, but when the data points are tightly gathered around the sample means, it is less likely that the differences in sample means occurred by chance.
Question 4: C provides stronger evidence than B. When the within-group variances are the same, sample means that are farther apart are more suggestive that the population means are different.
What about A vs. C? It's hard to tell. What we need is a test statistic that will allow us to measure whether the sample means were likely to follow their particular pattern if the populations all had the same mean, given the within-group variances observed in that example.
Motivating test statistic with graphics
Question 5: What characteristics of a data set should make us more likely to reject the null hypothesis that all the groups come from populations with the same mean?
Show answer
Sample means that are farther apart; small within-group variances.
Equal means model
Question 6: What do we subtract from each data point in order to form the residuals when calculating SST?
Show answer
The overall mean, ignoring which data point comes from which group.
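To make the calculation concrete, here is a minimal Python sketch of SST. The data are hypothetical: two groups of three values, chosen only so that the result matches the SST = 160 quoted later in this section. They are not necessarily the numbers used in the video.

```python
# Hypothetical data (an assumption, not the video's data): two groups of
# three observations, picked so that SST comes out to 160.
groups = {
    "group1": [1.0, 5.0, 9.0],
    "group2": [9.0, 13.0, 17.0],
}

# Equal means model: pool every observation and ignore the group labels.
all_values = [y for values in groups.values() for y in values]
overall_mean = sum(all_values) / len(all_values)

# Each residual is a data point minus the single overall mean.
residuals = [y - overall_mean for y in all_values]

# SST is the sum of the squared residuals.
sst = sum(r ** 2 for r in residuals)

print(overall_mean)  # 9.0
print(sst)           # 160.0
```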
Separate means model
Question 7: SSW is the numerator of which quantity that we've previously discussed?
Show answer
pooled sample variance
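The connection can be checked numerically. The sketch below uses hypothetical data (two groups of three values, chosen to reproduce the SSW = 64 quoted later in this section) and assumes the usual pooled-variance formula, a weighted average of the group sample variances with weights n_i - 1. The variable names are illustrative.

```python
from statistics import mean, variance

# Hypothetical data (an assumption, not the video's data).
groups = {
    "group1": [1.0, 5.0, 9.0],
    "group2": [9.0, 13.0, 17.0],
}

# Separate means model: each residual is a data point minus its own
# group's sample mean. SSW is the sum of those squared residuals.
ssw = 0.0
for values in groups.values():
    group_mean = mean(values)
    ssw += sum((y - group_mean) ** 2 for y in values)

n_total = sum(len(values) for values in groups.values())  # all observations
num_groups = len(groups)                                  # written I in the text

# Pooled sample variance: group variances weighted by (n_i - 1),
# divided by the total within-group degrees of freedom.
pooled_variance = sum(
    (len(values) - 1) * variance(values) for values in groups.values()
) / (n_total - num_groups)

print(ssw)                             # 64.0
print(pooled_variance)                 # 16.0
print(ssw / (n_total - num_groups))    # 16.0, the same quantity
```

The numerator of the pooled variance is exactly SSW, which is why the last two printed values agree.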
More on SSW
Question 8: Earlier in this lecture, we identified two features of the data that we hoped would be reflected by the test statistic. Which of those two features is quantified by SSW?
Show answer
SSW reflects the within-group variance.
SST greater than or equal to SSW
Question 9: How does this value, 96, relate to the other numbers we have calculated so far?
Show answer
SST - SSW = 160 - 64 = 96
Discrepancy between SST and SSW
Question 10: SSB + SSW =
Show answer
SST
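One way to convince yourself of this identity is to compute SSB directly from the group means, rather than by subtraction, and check that the pieces add up. The sketch below assumes the standard between-group formula (each group's size times the squared distance from its mean to the overall mean) and uses hypothetical data chosen to match the 160, 64, and 96 quoted in this section.

```python
# Hypothetical data (an assumption, not the video's data).
groups = {
    "group1": [1.0, 5.0, 9.0],
    "group2": [9.0, 13.0, 17.0],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


all_values = [y for values in groups.values() for y in values]
overall_mean = mean(all_values)

# SST: squared deviations of every point from the overall mean.
sst = sum((y - overall_mean) ** 2 for y in all_values)

# SSW: squared deviations of every point from its own group mean.
ssw = sum(
    (y - mean(values)) ** 2 for values in groups.values() for y in values
)

# SSB: squared deviations of each group mean from the overall mean,
# counted once per observation in that group.
ssb = sum(
    len(values) * (mean(values) - overall_mean) ** 2
    for values in groups.values()
)

print(sst, ssw, ssb)  # 160.0 64.0 96.0
print(ssb + ssw == sst)  # True
```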
Extra sum of squares F-test
At the very end of this video, when I am showing how to run the test in R, I should have written "group1" inside the aov command rather than "group".
Question 11: What is the F-statistic for the little example we used to illustrate SST, SSW, and SSB?
Show answer
F = (SSB/(I-1)) / (SSW/(n-I)) = (96/(2-1)) / (64/(6-2)) = 96 / 16 = 6, where I = 2 is the number of groups and n = 6 is the total number of observations.
Whether this is a large or small value of the F-statistic (in other words, whether this will lead to a large or small p-value) depends on (I-1) and (n-I), as those "degrees of freedom" determine which version of the F-distribution is the reference distribution.
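The arithmetic above can be wrapped in a small helper. This is only a sketch: the function name `f_statistic` and its parameter names are made up for illustration, and the inputs are the sums of squares from the little example (SSB = 96, SSW = 64, I = 2, n = 6). It returns the two degrees of freedom alongside F, since both are needed to pick the reference distribution.

```python
def f_statistic(ssb, ssw, num_groups, n_total):
    """Return (F, df_between, df_within) for the extra sum of squares F-test.

    num_groups is I in the text; n_total is n, the total sample size.
    """
    df_between = num_groups - 1        # I - 1
    df_within = n_total - num_groups   # n - I
    # Each sum of squares is divided by its degrees of freedom.
    between_per_df = ssb / df_between
    within_per_df = ssw / df_within
    return between_per_df / within_per_df, df_between, df_within


f_value, df1, df2 = f_statistic(ssb=96, ssw=64, num_groups=2, n_total=6)
print(f_value, df1, df2)  # 6.0 1 4
```

For this example the reference distribution would be the F-distribution with 1 and 4 degrees of freedom.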
The next lecture will demonstrate how to organize these sums of squares into an ANOVA table and find a p-value.