Linear Regression and Correlation Homework
ANSWER KEY
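1. a. For every increase of one mpg, the predicted price of a car decreases by 0.7 thousand dollars ($700).
b. The model predicts that a car getting zero miles per gallon would cost $45,000. (This is the y-intercept; since no car gets zero mpg, it is an extrapolation with no practical meaning.)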
c. 45 – 0.7(25) = 27.5 thousand dollars
d. residual = actual – predicted
actual = 28
predicted = 45 – 0.7(20) = 31
residual = 28 – 31 = -3
Because the residual is negative (predicted is larger than actual), the line overestimates the actual price at this point.
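The arithmetic in parts c and d can be checked with a short Python sketch. It hard-codes the regression equation from this question (predicted price in thousands of dollars = 45 – 0.7 × mpg). The function names predict_price and residual are hypothetical, chosen only for illustration.

```python
# Regression equation from Question 1:
# predicted price (thousands of dollars) = 45 - 0.7 * mpg
INTERCEPT = 45.0
SLOPE = -0.7


def predict_price(mpg: float) -> float:
    """Return the price predicted by the line, in thousands of dollars."""
    return INTERCEPT + SLOPE * mpg


def residual(actual: float, mpg: float) -> float:
    """Residual = actual - predicted."""
    return actual - predict_price(mpg)


if __name__ == "__main__":
    print(f"Predicted price at 25 mpg: {predict_price(25):.1f} thousand dollars")
    res = residual(28, 20)
    print(f"Residual at 20 mpg: {res:.1f}")
    # A negative residual means predicted > actual, so the line overestimates.
    print("Line overestimates" if res < 0 else "Line underestimates or is exact")
```

The sign check at the end mirrors the reasoning in part d: a negative residual always means the line sits above the data point.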
2. A residual is the difference between the actual y-value and the y-value predicted by the linear model (residual = actual – predicted).
3. a. The SSE is the sum of squared errors. It is calculated by finding the residual for each data point, squaring each residual, and then adding up all of the squared residuals.
b. SSE measures the total error of the regression line. The smaller the SSE, the closer the line is to the actual data; the larger the SSE, the farther the line is from the data. So, for a given data set, the line with the smaller SSE is the better fit.
c. The least squares regression line (the line given by calculators and standard statistics programs) is the line with the smallest possible SSE for that data set. In that sense, it is the best-fitting line for that data.
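As a sketch of parts a through c, the code below computes SSE exactly as part a describes (square each residual, then add) and finds the least squares line with the standard formulas slope = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and intercept = ȳ – slope · x̄. The five data points are made up for illustration, and the helper names sse and least_squares_line are hypothetical. The loop at the end nudges the line away from the least squares values to show that the SSE goes up.

```python
# Hypothetical example data (not from the homework).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sse(xs, ys, slope, intercept):
    """Sum of squared errors: square each residual (actual - predicted), then add."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line with the smallest possible SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares_line(xs, ys)
best = sse(xs, ys, slope, intercept)
print(f"least squares line: y = {intercept:.2f} + {slope:.2f}x, SSE = {best:.2f}")

# Moving the slope or intercept away from the least squares values raises SSE.
for d_slope, d_int in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    other = sse(xs, ys, slope + d_slope, intercept + d_int)
    print(f"slope {slope + d_slope:.2f}, intercept {intercept + d_int:.2f}: SSE = {other:.2f}")
```

For these made-up points the least squares line is ŷ = 2.2 + 0.6x with SSE = 2.4, and every nudged line has a larger SSE.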
4. It will look like a random scatter of points with no clear pattern.
5. a. Strong negative correlation.
b. Strong positive correlation.
c. Very close to 0. Essentially no correlation.
d. Moderate negative correlation.
e. Moderate positive correlation.
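The strength labels in question 5 describe the correlation coefficient r, which runs from –1 to 1. As a sketch, r can be computed directly from its definition (the Pearson formula). The data sets below are hypothetical, chosen only so that each lands in a different band; the boundaries between "moderate" and "strong" are rough conventions, not fixed rules.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data sets, chosen only to illustrate each kind of correlation.
xs = [1, 2, 3, 4, 5, 6]
examples = {
    "strong negative": [12, 10, 9, 6, 5, 2],
    "strong positive": [2, 5, 5, 8, 9, 12],
    "moderate positive": [3, 6, 2, 7, 5, 8],
    "essentially none": [4, 7, 2, 6, 3, 5],
}

for label, ys in examples.items():
    print(f"{label:>18}: r = {correlation(xs, ys):+.2f}")
```

The sign of r gives the direction of the relationship, and its distance from 0 gives the strength.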
6. “Correlation does not imply causation” means that you should never assume there is a cause-and-effect relationship just because two variables are correlated with each other. For example, as “amount of ice cream sold” increases, “number of shark attacks” also increases. However, you should not conclude that ice cream makes sharks attack people. You should always look for another explanation for why two variables are correlated. In this case, the explanation is the weather: hot weather leads to both more ice cream sales and more people swimming in the ocean.
(This applies even in situations where you think there might be a real connection. For example, the more time students spend in SAT review courses, the better they tend to do on the SAT. The review courses might be causing them to do better, but you also have to consider that students who take review courses may be more motivated. Motivation could explain a significant part of the correlation.)
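A small simulation can make the lurking-variable explanation concrete. In this sketch all numbers are invented: ice cream sales and shark attacks are each computed from temperature alone, so neither can cause the other, yet the two still come out strongly correlated. The correlation helper is the usual Pearson formula; the variable names are hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical daily temperatures (°C): the lurking variable.
temps = [15, 18, 21, 24, 27, 30, 33]

# Both quantities are generated from temperature only; neither uses the other.
ice_cream_sales = [10 * t + e for t, e in zip(temps, [-2, 3, 2, -2, 1, 2, -1])]
shark_attacks = [(t - 12) // 6 for t in temps]

print(f"r(temperature, ice cream) = {correlation(temps, ice_cream_sales):.2f}")
print(f"r(temperature, sharks)    = {correlation(temps, shark_attacks):.2f}")
print(f"r(ice cream, sharks)      = {correlation(ice_cream_sales, shark_attacks):.2f}")
```

The last line reports a strong positive correlation between ice cream sales and shark attacks even though, by construction, the only link between them is temperature.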
7. a. False. This holds only when the residual plot is a random scatter of points.
b. False. Non-linear patterns can be highly correlated.
c. False. Correlation does not imply causation.
d. False. Linear describes the nature of the correlation, not whether there is a cause-and-effect relationship.
e. True.
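Parts a and b can be illustrated together with a sketch: fit a least squares line to points that follow a curve (y = x², chosen here only for illustration) and look at both r and the residuals. The helper names are hypothetical, and the formulas are the standard Pearson and least squares ones.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# A clearly curved (quadratic) relationship.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

slope, intercept = least_squares_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"r = {correlation(xs, ys):.3f}")
# A "+" marks a point above the line, a "-" a point below it.
print("residual signs:", "".join("+" if e > 0 else "-" for e in residuals))
```

Here r is about 0.97, yet the residuals run positive, then negative, then positive again: a curved pattern rather than a random scatter, so a straight line is not an appropriate model despite the high correlation.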