Scroll down for the answer key.
Describe a distribution in terms of:
shape: skew, outliers, and modes
center: mean and median
spread: range, standard deviation
1. Describe each distribution in terms of shape, center, and spread.
Understand standardized scores as measures of distance from the mean
2. Students in a large math class just got their grades back from the final. The teacher calculated z-scores for each student, in addition to their letter grade. Write a sentence for each student’s score, in context.
a. Bob: z = -1.23
b. Jane: z = 0.43
c. Sherlock: z = 3.2
Calculate normal distribution probabilities
3. For the math test in question 2, the distribution was approximately normal, with a mean score of 84, and a standard deviation of 5.
a. What percentage of people scored below z = -1.23?
b. What percentage scored between z = 2 and z = 3?
4. a. What z-scores mark the middle 95% of a normal distribution?
b. What z-score marks the bottom 95% of a normal distribution?
c. What z-score marks the bottom 5% of a normal distribution?
d. What z-score marks the top 10% of a normal distribution? The bottom 10%?
Be able to determine if variables appear to be independent in a contingency table
5. In the table below, are age and employment status associated with each other?
To help answer this question, figure out which one of the following is true:
Be able to describe a bivariate trend in terms of direction and strength
6. Describe each of the following scatterplots in terms of direction and strength.
Write the equation of a regression line and interpret the slope in context.
Calculate and interpret r² in context.
7. The output below shows the results of a linear regression for predicting smoking rate in the US from 1965 to 2002.
a. Write the equation of the regression line, and interpret the slope of the regression line in context.
b. Calculate r², and write a sentence interpreting r² in context.
Understand that correlation does not imply causation. Identify lurking variables.
8. There is a strong positive correlation: nations with many TV sets have high life expectancies. Could we lengthen the lives of people in countries with low life expectancies by sending them TV sets? Justify your answer.
9. Are grades and TV watching linked? Children who watch many hours of television get lower grades in school on the average than those who watch less TV. Explain clearly why this fact does not show that watching TV causes poor grades. In particular, suggest some other variables that may be confounded with heavy TV viewing and may contribute to poor grades.
Explain the concept of regression to the mean.
10. Explain the Madden curse.
ANSWER KEY
1. The distribution of Ages is skewed right. The balance point method shows that the mean is around 38 to 40. The minimum is 20, the max is 80, so the range is 80-20 = 60. There is a mode at 32. There are no outliers.
The distribution of Beck Depression Inventory Scores is bimodal and roughly symmetrical, with modes at 10-19 and 50-59. The min to max is from 0-9 to 60-69, for a range of about 60. The balance point indicates a mean around 30. There are no outliers.
The distribution of Pulse Rates is approximately normal, with a balance point at 80-84. The standard deviation is approximately 10. There are no outliers.
2. a. Bob’s test score is 1.23 standard deviations below the mean.
b. Jane’s test score is 0.43 standard deviations above the mean.
c. Sherlock’s test score is 3.2 standard deviations above the mean.
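A z-score can also be turned back into a raw score with x = mean + z × SD. The lines below are a minimal sketch of that conversion, assuming the mean of 84 and standard deviation of 5 given in question 3. The function name raw_score is made up for this illustration.

```python
# Convert z-scores back to raw test scores: x = mean + z * sd.
# MEAN and SD are the values stated in question 3.
MEAN = 84
SD = 5


def raw_score(z, mean=MEAN, sd=SD):
    """Return the score that lies z standard deviations from the mean."""
    return mean + z * sd


z_scores = {"Bob": -1.23, "Jane": 0.43, "Sherlock": 3.2}
raw_scores = {name: raw_score(z) for name, z in z_scores.items()}

for name, score in raw_scores.items():
    print(f"{name}: {score:.2f}")
```

Under these assumptions Bob scored about 77.85, Jane about 86.15, and Sherlock about 100.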
3. a. normalcdf(-999, -1.23) = 10.9%
b. normalcdf(2, 3) = 2.14%
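The normalcdf values can be checked without a calculator. This is a sketch using NormalDist from Python's standard statistics module, on the assumption that normalcdf(a, b) means the area under the standard normal curve between a and b.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# 3a. Area to the left of z = -1.23
below = std_normal.cdf(-1.23)

# 3b. Area between z = 2 and z = 3
between = std_normal.cdf(3) - std_normal.cdf(2)

print(f"below z = -1.23: {below:.1%}")
print(f"between z = 2 and z = 3: {between:.2%}")
```

The cdf method gives the area to the left of a value, so an area between two z-scores is the difference of two cdf values.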
4. a. The middle 95% leaves 1.0 – 0.95 = 0.05 for the two tails, so each tail holds 0.05/2 = 0.025. invNorm(.025) = -1.96, and by symmetry the other z-score is 1.96.
b. invNorm(.95) = 1.645
c. invNorm(.05) = -1.645
d. The top 10% starts where 90% of the distribution lies below: invNorm(.90) = 1.28. By symmetry, the z-score that marks the bottom 10% is -1.28.
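The invNorm answers can be reproduced the same way. This sketch assumes invNorm(p) means the z-score with area p to its left, which corresponds to the inv_cdf method of NormalDist in Python's standard library. The variable names are only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# 4a. Middle 95%: 2.5% in each tail
middle_95 = (std_normal.inv_cdf(0.025), std_normal.inv_cdf(0.975))

# 4b and 4c. Cutoffs with 95% and 5% of the area below them
bottom_95 = std_normal.inv_cdf(0.95)
bottom_5 = std_normal.inv_cdf(0.05)

# 4d. Top 10% starts at the 90th percentile; bottom 10% ends at the 10th
top_10 = std_normal.inv_cdf(0.90)
bottom_10 = std_normal.inv_cdf(0.10)

print(f"middle 95%: {middle_95[0]:.2f} to {middle_95[1]:.2f}")
print(f"bottom 95%: {bottom_95:.3f}, bottom 5%: {bottom_5:.3f}")
print(f"top 10%: {top_10:.2f}, bottom 10%: {bottom_10:.2f}")
```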
5. P(Employed | Older) = 355/474 = 75%
P(Employed | Younger) = 208/316 = 66%
Because older people are more likely to be employed than younger people, there is an association between employment status and age.
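The comparison of conditional proportions can be sketched in a few lines. Only the four counts that appear in the answer above are used; the rest of the table is not reproduced here, and the dictionary layout is just one convenient way to hold the numbers.

```python
# Counts taken from the answer to question 5.
employed = {"Older": 355, "Younger": 208}
group_total = {"Older": 474, "Younger": 316}

# Conditional proportions: P(Employed | age group)
rates = {group: employed[group] / group_total[group] for group in employed}

# Overall proportion employed, ignoring age
overall = sum(employed.values()) / sum(group_total.values())

for group, rate in rates.items():
    print(f"P(Employed | {group}) = {rate:.0%}")
print(f"P(Employed) = {overall:.0%}")
```

If age and employment status were independent, both conditional proportions would be close to the overall proportion. Here they differ by about 9 percentage points.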
6. Sq. ft. and price have a moderate to strong positive correlation.
Hours watching TV and Hours doing homework have a moderate to weak negative correlation.
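7. a. y = 1372 – 0.67x, where x is the year and y is the predicted smoking rate. The slope of -0.67 means that the predicted smoking rate falls by 0.67 (in the units of the smoking rate) for each additional year.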
b. r² = 773/839 = 92%
92% of the variability in smoking rate in the US is explained by time in years.
Note: This is mathematical “explanation.” It doesn’t tell us anything about why smoking rates went down over these years. It just tells us that we can fairly accurately predict smoking rate by year.
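The arithmetic for question 7 can be checked with the short sketch below. The regression output itself is not reproduced here, so reading 773 and 839 as the regression and total sums of squares is an assumption, and the names ss_regression, ss_total, and predicted_rate are made up for this illustration.

```python
import math

# Assumed to be the regression and total sums of squares from the output.
ss_regression = 773
ss_total = 839

r_squared = ss_regression / ss_total

# The slope is negative, so r takes the negative square root.
r = -math.sqrt(r_squared)


def predicted_rate(year):
    """Predicted smoking rate from the regression line y = 1372 - 0.67x."""
    return 1372 - 0.67 * year


# Change in the prediction from one year to the next equals the slope.
one_year_change = predicted_rate(1966) - predicted_rate(1965)

print(f"r^2 = {r_squared:.0%}, r = {r:.2f}")
print(f"change per year = {one_year_change:.2f}")
```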
8. No, we can’t lengthen life spans by sending TVs. The lurking variable is wealth. In places where people can’t afford TVs, there isn’t much money for things like infrastructure, healthcare, nutrition, etc., while in wealthier countries you have TVs along with the things that money can buy that extend the average life.
9. There is an association between watching TV and getting lower grades. However, this could be explained by other factors. For example, students who are unmotivated are not necessarily going to study just because they stop watching television. Lack of motivation can explain not studying, as well as TV habits.
10. The Madden Curse is an example of regression to the mean. The Madden Curse is the tendency for football players who are featured on the cover of the EA Sports football video game to do worse in the season after they are featured than in the season before. However, this is only because the year before they were featured was an unusually good year for them. That’s why they were featured on the cover. What happens the next season is that they are likely to perform at a level closer to their own typical, or average, level of performance. If we assume that a player has a certain average level of ability, their results from season to season will vary around that mean. Unusual results are, by definition, unusual. It is not reasonable to expect them to be repeated.
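Regression to the mean can be seen in a small simulation. This is a sketch with made-up players, not real football data: each hypothetical player has a fixed ability, and each season's result is that ability plus independent random luck. The number of players, the size of the top group, and the seed are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_PLAYERS = 10_000
TOP_N = 100

# Fixed ability per player; each season adds its own independent luck.
abilities = [random.gauss(0, 1) for _ in range(N_PLAYERS)]
season1 = [a + random.gauss(0, 1) for a in abilities]
season2 = [a + random.gauss(0, 1) for a in abilities]

# "Cover athletes": the players with the best season 1 results.
top = sorted(range(N_PLAYERS), key=lambda i: season1[i], reverse=True)[:TOP_N]

mean_s1 = sum(season1[i] for i in top) / TOP_N
mean_s2 = sum(season2[i] for i in top) / TOP_N

print(f"top group, season 1 average: {mean_s1:.2f}")
print(f"top group, season 2 average: {mean_s2:.2f}")
```

The top group still does better than the overall average of about 0 in season 2, but noticeably worse than in season 1, because part of what put them at the top was good luck that does not carry over.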