Visualizations to help understand statistics
(a very intimidating formula the first few times you see it!)
Try thinking of it visually: we want the average area of all the rectangles formed when we move from the point of means out to each data point.
Wait! That sounds even harder to understand! But once you see the visuals it will make more sense.
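If you want to play with the idea in Stata before (or after) watching, here is a minimal sketch. The data values are made up for illustration and this is not the code behind the videos: each observation forms a rectangle with the point of means, and averaging the signed areas gets you (essentially) the covariance.

* Made-up data, just for illustration
clear
input x y
1 2
2 4
3 5
4 4
5 7
end

* The point of means
quietly summarize x
scalar xbar = r(mean)
quietly summarize y
scalar ybar = r(mean)

* Each point forms a rectangle with sides (x - xbar) and (y - ybar);
* the sign says whether the point sits in a "positive" or "negative" quadrant
gen double area = (x - scalar(xbar)) * (y - scalar(ybar))
list x y area

* Average area (divide by n) versus Stata's covariance (which divides by n - 1)
quietly summarize area
display "average rectangle area (divide by n):   " r(sum)/_N
display "sample covariance (divide by n - 1):    " r(sum)/(_N - 1)
correlate x y, covariance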
Simple, fake data
Watch it a few times through to see how it works.
Lung Cancer Mortality and Percent with High School Education
Points scored and total yards of offense
Special thank you to Aimee Hong (MCAS '26) for coding the Covariance items above in Stata.
Aimee has also created a template do file for others (students or instructors) who use Stata, so you can create similar videos with any two variables you like. You will need to modify some items in the code, but she leaves notes so you can see where to make the changes. (You will also need to download ffmpeg.)
That template code is found here.
These videos assume a general understanding of the Power of a Test.
Here you can see what happens to the power of a test when the sample size increases and when the distance between the hypothesized mean and the true mean increases.
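If you have Stata 13 or newer, you can see the same two patterns with the built-in power command. The numbers below are assumed values, not the ones used in the videos: a hypothesized mean of 0, a standard deviation of 1, and a two-sided test at the 5% level.

* Baseline: true mean 0.5, n = 25
power onemean 0 0.5, sd(1) n(25)

* Larger sample size -> higher power
power onemean 0 0.5, sd(1) n(100)

* Hypothesized and true means farther apart -> higher power
power onemean 0 1, sd(1) n(25)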
Credit to Luke DeMartin (MCAS '23) for creating videos in this section.
What does this really mean?
Imagine three possible lines that we might choose as our prediction line. Which will we select?
Caution: Some texts call this sum of squared residuals (SSR)!
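As a concrete sketch (using Stata's shipped auto dataset purely for illustration, with three made-up candidate lines rather than the ones in the videos), you can compute each line's sum of squared residuals directly and see which is smallest, then compare with what regress chooses.

sysuse auto, clear

* Three hypothetical prediction lines for price as a function of weight
gen double yhat1 = 0    + 2.0*weight
gen double yhat2 = 1000 + 1.5*weight
gen double yhat3 = -500 + 2.1*weight

* Sum of squared residuals for each candidate line
foreach line in yhat1 yhat2 yhat3 {
    gen double sq_`line' = (price - `line')^2
    quietly summarize sq_`line'
    display "`line': sum of squared residuals = " %16.0fc r(sum)
}

* The least-squares line makes this sum as small as possible
regress price weight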
Luckily there is an analytical solution to finding the line that minimizes the sum of squared residuals!
Therefore, Stata doesn't really have to create all the possible lines and check to see which one has the lowest sum of squared residuals.
Instead, we use calculus: take the first-order conditions and set them equal to zero. The algebra will not be presented here, but the resulting outcomes for simple regression are that the slope coefficient = Cov(X,Y)/Var(X) and that every regression line goes through the point of means.
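You can convince yourself of both results with a quick check (again the auto dataset, used only as an assumed example): compute Cov(X,Y)/Var(X) by hand, compare it with the slope from regress, and plug the mean of X into the fitted line to see that it returns the mean of Y.

sysuse auto, clear

* Pieces of the analytical solution, computed by hand
quietly summarize weight
scalar xbar = r(mean)
scalar varx = r(Var)
quietly summarize price
scalar ybar = r(mean)

gen double dev_xy = (weight - scalar(xbar)) * (price - scalar(ybar))
quietly summarize dev_xy
scalar covxy = r(sum) / (r(N) - 1)

display "slope from Cov(X,Y)/Var(X): " scalar(covxy) / scalar(varx)

* Compare with Stata's regression, then check the point of means
regress price weight
display "fitted value at the mean of X: " _b[_cons] + _b[weight]*scalar(xbar)
display "mean of Y:                     " scalar(ybar)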
Credit to Aimee Hong (MCAS '26) for creating videos in this section.
Below we present two of the three components of R-squared. NOTE: the missing component, the sum of squared errors (SSE), is shown in the previous section!
We then have an overview video of how to construct R-squared.
The total sum of squares (SST): the total variation around the mean of the dependent variable.
Also, conceptually, this is the sum of squared errors you would get if you couldn't use the information on X to predict Y
(i.e. if you simply predicted the mean of Y).
The sum of squares due to regression.
The variation in Y that can be accounted for by using X to predict Y.
Be careful!
Some texts call this SSE or sum of squares explained!
R^2 = 1 - SSE/SST
(the SSE video is shown in the section above)
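Putting the pieces together in Stata (the auto dataset again, assumed purely for illustration): build SST from squared deviations around the mean of Y, build SSE from the regression residuals, and confirm that 1 - SSE/SST matches the R-squared that regress reports.

sysuse auto, clear
quietly regress price weight

* SSE: squared residuals from the regression
predict double resid, residuals
gen double res2 = resid^2
quietly summarize res2
scalar SSE = r(sum)

* SST: squared deviations of Y around its own mean
quietly summarize price
gen double dev2 = (price - r(mean))^2
quietly summarize dev2
scalar SST = r(sum)

* SSR (sum of squares due to regression) is whatever is left over
scalar SSR = scalar(SST) - scalar(SSE)

* R-squared three ways
display "1 - SSE/SST:          " 1 - scalar(SSE)/scalar(SST)
display "SSR/SST:              " scalar(SSR)/scalar(SST)
display "R-squared from Stata: " e(r2)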
Credit to Aimee Hong (MCAS '26) for creating videos in this section.