Sampling Variability and Measures of Dispersion
The total length of the videos in this section is approximately 44 minutes, but you will also spend time answering short questions while completing this section.
You can also view all the videos in this section at the YouTube playlist linked here.
Polls
Question 1: How many people should you poll in California (population approx. 38 million) to obtain an estimate as precise as one based on a sample of 500 people from Oklahoma (population approx. 3.8 million)?
Show answer
500. This is explained in the next video.
It's like a bowl of soup
Question 2: Which of the following affect your ability to assess the saltiness of the soup? Check all that apply.
whether the soup is mixed well
whether the soup is pureed or chunky (with different pieces ending up in each spoonful)
size of spoon
size of pot
Show answer
The first three options, but not the fourth.
Question 3: Which of the following affect your ability to precisely estimate the election outcome based on a poll? Check all that apply.
whether the sample is representative
whether the population is homogeneous (people are similar to each other)
size of sample
size of population
Show answer
The first, second, and third, but not the fourth.
In these videos, we use the word "precision" where you might otherwise see the words "uncertainty," "error," or "variance" (a more precise estimate is one with lower variance). No matter what we call it, we are discussing how much an estimate varies over the different possible samples that you might draw from the population.
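To make the role of sample size versus population size concrete, here is a minimal sketch. It assumes a simple random sample drawn without replacement and a true proportion of 0.5; both are illustrative assumptions, not values from the videos. It uses the textbook standard error of a sample proportion with the finite population correction, and the function name standard_error is hypothetical.

```python
import math


def standard_error(p, n, population_size):
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc


# Assumed true proportion of 0.5 (the most variable case for a proportion).
se_oklahoma = standard_error(0.5, 500, 3_800_000)
se_california = standard_error(0.5, 500, 38_000_000)

print(f"Oklahoma   (N = 3.8 million): {se_oklahoma:.5f}")
print(f"California (N = 38 million):  {se_california:.5f}")
```

Under these assumptions the two standard errors are nearly identical, which is why a sample of 500 gives about the same precision in either state.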
Question 4: When you read about an election poll, approximately how many people do you think were typically polled?
Show answer
It's almost always about 1,000. That's not many, but it's enough if the sample really is representative. Election polls have traditionally been telephone polls, and it is increasingly difficult to avoid selection bias in them, since fewer people have landlines, among other problems.
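As a rough illustration of why about 1,000 respondents is often considered enough, the sketch below computes the usual approximate 95% margin of error for a sample proportion. It assumes a simple random sample, a true proportion of 0.5, and a population much larger than the sample; the function name margin_of_error is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion,
    assuming a simple random sample from a large population."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 500, 1000, 4000):
    moe_points = margin_of_error(n) * 100
    print(f"n = {n:>4}: +/- {moe_points:.1f} percentage points")
```

With these assumptions, 1,000 respondents give a margin of error of roughly 3 percentage points, and quadrupling the sample to 4,000 only halves it.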
I expect that the discussion of measures of center is a review for many. Please skim that video as appropriate, but don't skip the videos on dispersion or variance, even though this is likely not your first statistics course.
Centers
Question 5: Suppose that we ask each student in a physical education class to run a mile, and we record the times. We will likely obtain a right-skewed distribution, with some students finishing the mile at about the same time as each other and some students taking much longer, with their times spread out on the right of the distribution. Which will be larger, the mean or the median mile time?
Show answer
The mean. Means are influenced by extreme values much more than medians. When a distribution is right-skewed, the mean is generally higher than the median.
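The effect of a right skew on the mean can be checked with a small example. The mile times below are made up for illustration: most are clustered between 7 and 9 minutes, and a few are much longer.

```python
import statistics

# Hypothetical mile times in minutes (not data from the videos).
mile_times = [7.0, 7.5, 7.5, 8.0, 8.0, 8.5, 9.0, 10.0, 12.0, 15.0]

mean_time = statistics.mean(mile_times)
median_time = statistics.median(mile_times)

print(f"mean   = {mean_time:.2f}")    # 9.25
print(f"median = {median_time:.2f}")  # 8.25
```

The two slowest times pull the mean a full minute above the median, while the median stays near the cluster of typical times.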
Dispersion
Question 6: If we have a right-skewed distribution, such as obtained from a class of students running a mile, will the 25th percentile or the 75th percentile be closer to the median?
25th closer to the median
75th closer to the median
Same distance from the median
Show answer
25th. If the lower times are clumped together, the students running between the 25th and 50th (median) percentiles may have very similar times, in a narrow range. If the longer times are spread out, as in a right-skewed distribution, the students taking longer than the median time may have a wide range of times.
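The same idea can be sketched numerically. The mile times below are hypothetical, and the quartiles come from Python's statistics.quantiles with its default method; other software may use a slightly different quantile convention and give slightly different cut points.

```python
import statistics

# Hypothetical right-skewed mile times in minutes (not data from the videos).
mile_times = [7.0, 7.5, 7.5, 8.0, 8.0, 8.5, 9.0, 10.0, 12.0, 15.0]

# n=4 returns the three cut points: 25th, 50th, and 75th percentiles.
q1, median, q3 = statistics.quantiles(mile_times, n=4)

lower_gap = median - q1   # distance from the 25th percentile to the median
upper_gap = q3 - median   # distance from the median to the 75th percentile

print(f"25th percentile to median: {lower_gap:.2f} minutes")
print(f"median to 75th percentile: {upper_gap:.2f} minutes")
```

For this made-up class, the 25th percentile sits closer to the median than the 75th percentile does, as expected for a right-skewed distribution.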
Mean/Median Absolute Deviation
Question 7: Calculate the median absolute deviation of the following values: 3, 5, 8
Show answer
2. The median is 5. The absolute deviations are 2, 0, and 3. The median of the absolute deviations is 2.
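The calculation above can be sketched in a few lines. The function names are hypothetical, and the mean absolute deviation here is taken around the mean, which is one common definition and an assumption about the convention used in the videos.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = statistics.median(values)
    return statistics.median([abs(x - center) for x in values])


def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean (assumed convention)."""
    center = statistics.mean(values)
    return statistics.mean([abs(x - center) for x in values])


values = [3, 5, 8]
print(median_absolute_deviation(values))          # 2
print(round(mean_absolute_deviation(values), 2))  # 1.78
```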
Variance, mean, and notation
Question 8: How is the variance related to the standard deviation?
Variance = Standard Deviation
Variance = (Standard Deviation) * (Standard Deviation)
Variance * Variance = Standard Deviation
Show answer
Variance = (Standard Deviation) * (Standard Deviation)
Question 9: What is the variance of the numbers 3, 5, and 10?
Show answer
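8.67. The mean is 6. The squared deviations are 9, 1, and 16. Their average is 26/3, or about 8.67. (This is the population variance, which divides by the number of values.)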
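A quick way to check a variance by hand is Python's statistics module. This sketch treats the three numbers as a whole population (dividing by n), and also shows the sample variance (dividing by n - 1) for comparison, since software often defaults to one or the other.

```python
import statistics

values = [3, 5, 10]

population_variance = statistics.pvariance(values)  # divides by n
sample_variance = statistics.variance(values)       # divides by n - 1
population_sd = statistics.pstdev(values)           # square root of pvariance

print(f"population variance = {population_variance:.2f}")  # 8.67
print(f"sample variance     = {sample_variance:.2f}")      # 13.00
print(f"population SD       = {population_sd:.2f}")        # 2.94
```

Squaring the population standard deviation recovers the population variance, which is the relationship from Question 8.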
Variance and the normal distribution
Question 10: If Y follows a normal distribution with mean 10 and variance 4, 95% of values lie between two numbers. What is the lower number?
Show answer
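6. The standard deviation is sqrt(4) = 2, and about 95% of values lie within 2 standard deviations of the mean, so the lower number is 10 - 2 * 2 = 6.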
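The "2 standard deviations" rule is an approximation. As a sketch, the code below compares it with the bounds from Python's statistics.NormalDist, which uses the more exact multiplier of about 1.96; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean, variance = 10, 4
sd = math.sqrt(variance)  # standard deviation is the square root of the variance

# Rule of thumb: about 95% of values lie within 2 standard deviations.
rule_of_thumb_lower = mean - 2 * sd

# Central 95% interval of the normal distribution itself.
y = NormalDist(mu=mean, sigma=sd)
exact_lower = y.inv_cdf(0.025)
exact_upper = y.inv_cdf(0.975)

print(f"rule of thumb lower bound: {rule_of_thumb_lower:.2f}")  # 6.00
print(f"central 95% interval: {exact_lower:.2f} to {exact_upper:.2f}")
```

The rule-of-thumb answer of 6 is slightly wider than the interval from NormalDist, which starts at about 6.08.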
That's it!
During this tutorial you learned:
What affects the variance of a sample estimate
Ways to describe the center of data
The relationship between median and mean when the distribution of the data is skewed
Ways to describe the spread of data
How to calculate the mean/median absolute deviation
Definition of mean, variance, and standard deviation of a population
The relationship between variance and standard deviation
What a normal distribution is, and how to describe it
Terms and concepts:
Center, mean, median, mode, skewed distribution, standard deviation, variance, range, min/max, quartiles/percentile, interquartile range (IQR), mean/median absolute deviation, distribution, and normal distribution