Sums of Squares

The total length of the videos in this section is approximately 63 minutes. Feel free to do this in multiple sittings. You will also spend time answering short questions while completing this section.

You can also view all the videos in this section at the YouTube playlist linked here.

The Spock Example

Sums of Squares.1.Part 1.mp4

In case you are curious:

Dr. Spock

Obituary includes story of trial

Wikipedia page for Judge Ford

You can find boxplots displaying the Spock data in this folder (restricted to Wellesley affiliates, for copyright reasons).

Question 1: What is unusual about Spock's judge's venires?

Show answer

The sets of potential jurors "randomly chosen" for Spock's judge's trials include many fewer women than the sets of potential jurors assigned to other judges.

You might also have noticed that all of the venires are mostly men. This data would look different if collected today. But in the 1960s, women hadn't been voting or serving on juries all that long. It seems that there were different rules for men's and women's participation in juries as late as 1979 or even 1994.

Spock's example as motivation for comparing multiple groups

Sums of Squares.2. Part 2.mp4

Question 2: What is the null hypothesis of the test we are working toward?

Show answer

The population means are equal across all of the groups. This is a generalization of the null hypothesis for a two-sample t-test, that the two population means are equal.

What information should be included in our test statistic for comparing multiple groups?

Sums of Squares.3.Part 3.mp4

Question 3: Which data set provides stronger evidence that the four samples are drawn from populations with different means, A or B?

Question 4: Which data set provides stronger evidence that the four samples are drawn from populations with different means, B or C?

Show answer

A provides stronger evidence than B. The sample means are the same, but when the data points are tightly gathered around the sample means, it is less likely that the differences in sample means occurred by chance.

C provides stronger evidence than B. When the within-group variances are the same, sample means that are farther apart suggest more strongly that the population means are different.

What about A vs. C? It's hard to tell. What we need is a test statistic that will allow us to measure whether the sample means were likely to follow their particular pattern if the populations all had the same mean, given the within-group variances observed in that example.

Motivating test statistic with graphics

Sums of Squares.4.Part 4.mp4

Question 5: What characteristics of a data set should make us more likely to reject the null hypothesis that all the groups come from populations with the same mean?

Show answer

Sample means that are farther apart; small within-group variances.

Equal means model

Sums of Squares.5.EqualMeansModel.mp4

Question 6: What do we subtract from each data point in order to form the residuals when calculating SST?

Show answer

The overall mean, ignoring which data point comes from which group.
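
As a small illustration of the equal means model, here is a sketch in Python (the course itself uses R). The data are hypothetical: two groups of three observations, chosen so that the total matches the SST of 160 quoted later in this section. They are not necessarily the numbers used in the video.

```python
# Hypothetical data (an assumption for illustration, not taken from the video).
group1 = [1, 5, 9]
group2 = [9, 13, 17]

# Equal means model: pool all observations and ignore the group labels.
all_values = group1 + group2
grand_mean = sum(all_values) / len(all_values)

# Each residual is a data point minus the overall mean.
residuals = [y - grand_mean for y in all_values]

# SST is the sum of the squared residuals.
sst = sum(r ** 2 for r in residuals)

print("overall mean:", grand_mean)  # overall mean: 9.0
print("SST:", sst)                  # SST: 160.0
```

Note that the group labels are never used here: SST would be the same no matter how the six values were split into groups.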

Separate means model

Sums of Squares.6.SeparateMeansModel.mp4

Question 7: SSW is the numerator of which quantity that we've previously discussed?

Show answer

pooled sample variance
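
The connection to the pooled sample variance can be checked numerically. The sketch below is in Python (the course itself uses R) and uses hypothetical data, two groups of three observations chosen so that SSW comes out to the 64 quoted later in this section. The names `groups`, `ssw`, and `num_groups` are illustrative only.

```python
from statistics import mean, variance

# Hypothetical data (an assumption for illustration, not taken from the video).
groups = {"group1": [1, 5, 9], "group2": [9, 13, 17]}

# Separate means model: each residual is a data point minus its own group mean.
ssw = 0.0
for values in groups.values():
    group_mean = mean(values)
    ssw += sum((y - group_mean) ** 2 for y in values)

n = sum(len(values) for values in groups.values())  # total number of observations
num_groups = len(groups)                            # number of groups

# Pooled variance written as SSW divided by its degrees of freedom.
pooled_from_ssw = ssw / (n - num_groups)

# Pooled variance from the usual formula: weighted average of sample variances.
pooled_from_variances = sum(
    (len(values) - 1) * variance(values) for values in groups.values()
) / (n - num_groups)

print("SSW:", ssw)
print("pooled variance via SSW:", pooled_from_ssw)
print("pooled variance via sample variances:", pooled_from_variances)
```

The two pooled variance calculations agree because (n_i - 1) times a group's sample variance is exactly that group's sum of squared deviations from its own mean.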

More on SSW

Sums of Squares.7.MoreOnSSW.mp4

Question 8: Earlier in this lecture, we identified two features of the data that we hoped would be reflected by the test statistic. Which of those two features is quantified by SSW?

Show answer

SSW reflects the within-group variance.

SST greater than or equal to SSW

Sums of Squares.8.SSTgreaterthanorequaltoSSW.mp4

Question 9: How does this value, 96, relate to the other numbers we have calculated so far?

Show answer

SST - SSW = 160 - 64 = 96

Discrepancy between SST and SSW

Sums of Squares.9.DiscrepancySSTandSSW.mp4

Question 10: SSB + SSW =

Show answer

SST
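
The identity can be verified by computing SSB directly from the group means rather than by subtraction. This is a sketch in Python (the course itself uses R) with hypothetical data chosen to reproduce the values 160, 64, and 96 from this section; the helper `mean_of` is defined here for illustration.

```python
# Hypothetical data (an assumption for illustration, not taken from the video).
groups = [[1, 5, 9], [9, 13, 17]]


def mean_of(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


all_values = [y for group in groups for y in group]
grand_mean = mean_of(all_values)

# SST: squared deviations of every point from the overall mean.
sst = sum((y - grand_mean) ** 2 for y in all_values)

# SSW: squared deviations of every point from its own group mean.
ssw = sum((y - mean_of(group)) ** 2 for group in groups for y in group)

# SSB: squared deviations of each group mean from the overall mean,
# counted once per observation in that group.
ssb = sum(len(group) * (mean_of(group) - grand_mean) ** 2 for group in groups)

print("SST:", sst)              # SST: 160.0
print("SSW:", ssw)              # SSW: 64.0
print("SSB:", ssb)              # SSB: 96.0
print("SSB + SSW:", ssb + ssw)  # SSB + SSW: 160.0
```

Computing SSB from its own definition and then checking the sum is a useful sanity check when doing these calculations by hand.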

Extra sum of squares F-test

At the very end of this video, when I am showing how to run the test in R, I should have written "group1" inside the aov command rather than "group".

Sums of Squares.10.ExtraF-Test.mp4

Question 11: What is the F-statistic for the little example we used to illustrate SST, SSW, and SSB?

Show answer

F = (SSB/(I-1)) / (SSW/(n-I)) = (96/(2-1)) / (64/(6-2)) = 96 / 16 = 6, where I = 2 is the number of groups and n = 6 is the total number of observations.

Whether this is a large or small value of the F-statistic (in other words, whether this will lead to a large or small p-value) depends on (I-1) and (n-I), as those "degrees of freedom" determine which version of the F-distribution is the reference distribution.
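
The arithmetic above can be sketched in Python (the course itself runs the test in R with aov). The function name `f_statistic` is hypothetical. The second half uses hypothetical data, chosen to reproduce SSB = 96 and SSW = 64, to check a known fact: with only two groups, the F-statistic equals the square of the pooled two-sample t-statistic.

```python
import math


def f_statistic(ssb, ssw, n, num_groups):
    """F = (SSB / (I - 1)) / (SSW / (n - I)) for I groups and n observations."""
    mean_square_between = ssb / (num_groups - 1)
    mean_square_within = ssw / (n - num_groups)
    return mean_square_between / mean_square_within


f = f_statistic(ssb=96, ssw=64, n=6, num_groups=2)
print("F:", f)  # F: 6.0

# Hypothetical data (an assumption for illustration, not taken from the video).
group1 = [1, 5, 9]
group2 = [9, 13, 17]
mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)

# Pooled standard deviation: square root of SSW / (n - I).
pooled_sd = math.sqrt(64 / (6 - 2))

# Pooled two-sample t-statistic for the difference in sample means.
t = (mean2 - mean1) / (pooled_sd * math.sqrt(1 / len(group1) + 1 / len(group2)))
print("t squared:", t ** 2)
```

The squared t-statistic matches F up to floating-point rounding, which is one way to see that this F-test generalizes the two-sample t-test to more than two groups.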

The next lecture will demonstrate how to organize these sums of squares into an ANOVA table and find a p-value.