Predicting with the correlation coefficient

Post date: Sep 03, 2017 10:53:38 PM

Suppose you have collected data on height (h) and weight (w), and this is what you know:

average height = 70 inches, SD = 3 inches

average weight = 162 pounds, SD = 30 pounds

r(h,w) = 0.47

(Let's ignore the fact that I am noticeably shorter than average in height but almost one standard deviation above average in weight.)

If you knew that a person was of average height, what would be your best guess as to their weight? Average weight, of course. Now suppose you knew that the person was 3 inches taller than average, i.e., 73" tall. How much do they weigh? Probably more than average, i.e., more than 162 lbs. But what is the best estimate of their weight? Well, 3 inches is one standard deviation above average in height. The correlation coefficient tells you that, on average, a one-SD increase in height is associated with a 0.47-SD increase in weight. The standard deviation of weight is 30 lbs, so we expect a person 73" tall to be 0.47 × 30 = 14.1 lbs heavier than average, or 176.1 lbs. It's that simple.
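The recipe above has three steps: express height in SDs from the mean, multiply by r, then convert the result back into pounds. Here is a minimal Python sketch of it. The function name predict_weight and the constant names are my own illustrative choices, and the sketch assumes the summary statistics above describe the people you are predicting for.

```python
# Summary statistics from the post (heights in inches, weights in pounds).
# The constant names are illustrative, not from any library.
MEAN_H, SD_H = 70.0, 3.0
MEAN_W, SD_W = 162.0, 30.0
R = 0.47


def predict_weight(height: float, r: float = R) -> float:
    """Best-guess weight for a given height, using only r and the summary stats."""
    z_h = (height - MEAN_H) / SD_H  # step 1: how many SDs above/below average in height
    z_w = r * z_h                   # step 2: expected SDs above/below average in weight
    return MEAN_W + z_w * SD_W      # step 3: convert back to pounds


print(round(predict_weight(70), 1))  # average height -> average weight
print(round(predict_weight(73), 1))  # one SD taller than average
```

The first call gives 162.0 and the second 176.1, matching the hand calculation.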

Note that if the correlation between height and weight were a perfect 1.0, then a person 1 SD above the mean in height would be expected to be 1 SD above the mean in weight, i.e., 162 + 30 = 192 lbs. Because the actual correlation is less than perfect, the best estimate (176.1 lbs) is considerably less than 192: it is pulled back toward the average. This is what is meant by regression to the mean.
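To see the pull toward the mean across a range of heights, the sketch below compares predictions under a hypothetical perfect correlation (r = 1.0) with the actual r = 0.47. With r = 1.0 the prediction rises 30 / 3 = 10 lbs per inch; with r = 0.47 it rises only 0.47 × 10 = 4.7 lbs per inch. The list of heights is an arbitrary illustrative choice, and the sketch again assumes the summary statistics above.

```python
MEAN_H, SD_H = 70.0, 3.0   # height: mean and SD in inches (from the post)
MEAN_W, SD_W = 162.0, 30.0  # weight: mean and SD in pounds (from the post)


def predict_weight(height: float, r: float) -> float:
    """Weight predicted from height via z-scores scaled by the correlation r."""
    z_h = (height - MEAN_H) / SD_H
    return MEAN_W + r * z_h * SD_W


# Heights 2 SD below to 2 SD above the mean (illustrative values).
for height in (64, 67, 70, 73, 76):
    perfect = predict_weight(height, r=1.0)
    actual = predict_weight(height, r=0.47)
    print(f'{height}"  r=1.0: {perfect:6.1f} lbs   r=0.47: {actual:6.1f} lbs')
```

Every r = 0.47 prediction lands between 162 lbs and the corresponding r = 1.0 prediction, which is regression to the mean in miniature.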

Let's try it again. I am 5'8", so I am 2" shorter than average. How much do I weigh? Well, I'm 2/3 ≈ 0.67 of a standard deviation below the mean in height. Multiplying by 0.47 puts me 0.313 standard deviations below the mean in weight. Since a standard deviation in weight is 30 lbs, I should be about 9.4 lbs below average, or 152.6 lbs. Since I'm actually closer to 190 lbs, I am clearly an outlier, an exceptional human being. This I already knew.
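The same steps for my own height can be checked in a few lines of Python. The 190 lbs is the "closer to 190" figure from the text, and the last line computes the gap between that and the prediction (the residual), which is my own addition for illustration.

```python
MEAN_H, SD_H = 70.0, 3.0   # height: mean and SD in inches (from the post)
MEAN_W, SD_W = 162.0, 30.0  # weight: mean and SD in pounds (from the post)
R = 0.47                    # correlation between height and weight

height = 5 * 12 + 8    # 5'8" = 68 inches
actual_weight = 190.0  # "closer to 190", as stated in the post

z_h = (height - MEAN_H) / SD_H   # -2/3, about -0.67 SD
z_w = R * z_h                    # about -0.313 SD
predicted = MEAN_W + z_w * SD_W  # about 152.6 lbs
residual = actual_weight - predicted  # how far above the prediction I am

print(f"height z-score:     {z_h:.2f}")
print(f"weight z-score:     {z_w:.3f}")
print(f"predicted weight:   {predicted:.1f} lbs")
print(f"actual - predicted: {residual:.1f} lbs")
```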