Log transformations for linear regression

It is important to understand how taking the log of the predictor(s) and/or the outcome can help the assumptions of linear regression hold more closely. The most important skill is knowing that taking the log can help, so that you can try it and check whether graphics suggest that the linear regression assumptions now hold. This will be illustrated in this unit's R module.

The material below explains how we can interpret linear regression results on the original scale even if the regression was run on the log scale. It is good to know that this can be done: it is an advantage of taking the log, rather than some other function of the data, to make the regression assumptions hold more closely. This parallels the way we can interpret t-test results on the original scale even if the t-test was run on the log scale. However, for our current purposes it is not crucial for you to be able to interpret results on the original scale after running regressions on the log scale, and the derivations take some time, so I am making them optional.

You can also view all the videos in this section at the YouTube playlist linked here.

The total length of these videos is 30 minutes.

Using Log Transformations for Linear Regression

LogTransformationsLinearReg.1.Intro.mp4

Question 1: How do you decide whether log-transforming the data on one or both axes is appropriate for a simple linear regression?

Answer:

Plot the data with one or both axes log-transformed and see whether the relationship looks more linear than on the original scale.
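The check above is meant to be done by looking at plots. As a rough numeric companion to that visual check, the sketch below fits all four combinations to simulated data and reports each fit's R², which in simple regression is the squared correlation of the two plotted variables. The data are hypothetical (Y grows multiplicatively with X and has multiplicative noise), and the helper `ols` is written here for illustration; it is not taken from the course's R module. R² values computed on different response scales (Y versus log(Y)) are not strictly comparable, so treat these numbers as a quick screen alongside scatterplots and residual plots, not as a decision rule.

```python
import math
import random


def ols(xs, ys):
    """Least-squares fit of ys = b0 + b1*xs; returns (b0, b1, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot


# Hypothetical data: Y grows multiplicatively in X with multiplicative noise,
# so the relationship should look linear only after taking log(Y).
# X starts at 0.5 so that log(X) is always defined.
random.seed(1)
xs = [random.uniform(0.5, 10) for _ in range(200)]
ys = [2.0 * math.exp(0.4 * x + random.gauss(0, 0.3)) for x in xs]

log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]

fits = {
    "Y vs X": ols(xs, ys),
    "log(Y) vs X": ols(xs, log_ys),
    "Y vs log(X)": ols(log_xs, ys),
    "log(Y) vs log(X)": ols(log_xs, log_ys),
}
for name, (b0, b1, r2) in fits.items():
    print(f"{name:17s} b1 = {b1:8.3f}  R^2 = {r2:.3f}")
```

For these simulated data the log(Y) vs X fit should show the highest R² and a slope near the 0.4 used to generate them; with real data, the plots remain the main guide.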

Log(Y) vs. X

LogTransformationsLinearReg.2.Log(Y) vs. X.mp4

Question 2: Which of the following is the correct original-scale interpretation of a regression of log(Y) on X?

The median of Y is multiplied by e^β₁ when X is additively increased by 1
The mean of Y is multiplied by β₁ when X is additively increased by 1

Answer:

The median of Y is multiplied by e^β₁ when X is additively increased by 1.
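A short derivation of this answer, assuming (as the usual regression model does) that the errors on the log scale are symmetric around 0, so that their median is 0. Here log is the natural log, matching the e^β₁ in the answer.

```latex
\begin{aligned}
\log Y &= \beta_0 + \beta_1 X + \varepsilon, \qquad \operatorname{median}(\varepsilon) = 0 \\
\operatorname{median}(\log Y \mid X) &= \beta_0 + \beta_1 X \\
\operatorname{median}(Y \mid X) &= e^{\beta_0 + \beta_1 X}
  \quad \text{(exp is increasing, so it preserves medians)} \\
\frac{\operatorname{median}(Y \mid X + 1)}{\operatorname{median}(Y \mid X)}
  &= \frac{e^{\beta_0 + \beta_1 (X + 1)}}{e^{\beta_0 + \beta_1 X}} = e^{\beta_1}
\end{aligned}
```

The statement is phrased in terms of the median because exp preserves medians but not means: in general, E[Y | X] is not equal to exp(E[log Y | X]).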

Y vs. Log(X)

LogTransformationsLinearReg.3.Y vs. Log(X).mp4

Question 3: Which of the following is the correct original-scale interpretation of a regression of Y on log(X)?

The mean of Y is additively increased by β₁*log(k) when X is multiplied by k
The mean of Y is additively increased by β₁*k when X is additively increased by k

Answer:

The mean of Y is additively increased by β₁*log(k) when X is multiplied by k.
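To see this identity numerically, the sketch below fits Y on log(X) by least squares and compares the fitted change in the mean of Y when X is multiplied by k with β̂₁·log(k). The data, the helper names `ols` and `predicted_mean`, and the choices k = 2 and x0 = 10 are all hypothetical, chosen only for illustration. Because log(kX) minus log(X) equals log(k), the change is the same for every starting X. `math.log` is the natural log; the identity holds for any base as long as the same base is used in the fit and in log(k).

```python
import math
import random


def ols(xs, ys):
    """Least-squares fit of ys = b0 + b1*xs; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical data where the mean of Y is linear in log(X).
random.seed(2)
xs = [random.uniform(1, 100) for _ in range(300)]
ys = [5.0 + 3.0 * math.log(x) + random.gauss(0, 1) for x in xs]

# Regress Y (untransformed) on log(X).
b0, b1 = ols([math.log(x) for x in xs], ys)


def predicted_mean(x):
    """Fitted mean of Y at a given X; Y is already on its original scale."""
    return b0 + b1 * math.log(x)


k = 2.0    # multiply X by k (here: doubling)
x0 = 10.0  # hypothetical starting value; any positive X gives the same change
change = predicted_mean(k * x0) - predicted_mean(x0)

print(f"b1 = {b1:.3f}")
print(f"change in mean when X is doubled: {change:.3f}")
print(f"b1 * log(k):                      {b1 * math.log(k):.3f}")
```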

Log(Y) vs. Log(X)

LogTransformationsLinearReg.4.Log(Y) vs. Log(X).mp4

Question 4: Which of the following is the correct original-scale interpretation of a regression of log(Y) on log(X)?

The median of Y is multiplied by k^β₁ when X is multiplied by k
The mean of Y is multiplied by k^β₁ when X is multiplied by k

Answer:

The median of Y is multiplied by k^β₁ when X is multiplied by k.
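The same argument applies here, again assuming that the errors on the log scale have median 0 and that both logs are natural logs:

```latex
\begin{aligned}
\log Y &= \beta_0 + \beta_1 \log X + \varepsilon, \qquad \operatorname{median}(\varepsilon) = 0 \\
\operatorname{median}(Y \mid X) &= e^{\beta_0 + \beta_1 \log X} = e^{\beta_0} X^{\beta_1} \\
\frac{\operatorname{median}(Y \mid kX)}{\operatorname{median}(Y \mid X)}
  &= \frac{e^{\beta_0} (kX)^{\beta_1}}{e^{\beta_0} X^{\beta_1}} = k^{\beta_1}
\end{aligned}
```

For example, with β₁ = 0.5, doubling X multiplies the median of Y by 2^0.5, or about 1.41.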

Yay, you did it!
