Log Transformations for t-tests

The total length of the videos in this section is approximately 22 minutes. You will also spend time answering short questions while completing this section.

You can also view all the videos in this section at the YouTube playlist linked here.

In my experience, this topic can be hard to grasp the first time you hear it, even in person. So please think carefully, but don't worry if you are still confused: actually applying this method to a data set is much easier than explaining the idea.

Why use log transformations?

LogTransformations.1.Why Log Transformations for Parametric.mp4

Question 1: Which of the following scenarios would benefit from a log transformation? Please check all that apply.

All the answers except temperature. First, a log transformation needs positive values, because the log of zero or of a negative number is undefined, and if you are measuring weather, temperature can be negative. Money can sometimes be negative, too (such as a business's profit), so if you did not check the second option for that reason, you are also correct. Second, when a measurement has to be positive but its values are always far from zero, the data are not necessarily right-skewed, so a log transformation may not help. Examples include height and body temperature.
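Both points can be seen in a minimal Python sketch (Python is used only for illustration, and the values are made up). Logging a right-skewed set of positive values pulls in the long right tail so that the mean and median line up, while `math.log` refuses a negative input such as a below-zero temperature.

```python
import math
import statistics

# Made-up, right-skewed positive values: each one doubles the previous one.
values = [1, 2, 4, 8, 16, 32, 64]
logs = [math.log(v) for v in values]

# Original scale: the long right tail pulls the mean (about 18.14) well above the median (8).
print(statistics.mean(values), statistics.median(values))

# Log scale: the logged values are evenly spaced, so the mean and median coincide.
print(statistics.mean(logs), statistics.median(logs))

# The log is undefined for zero or negative numbers, e.g. a negative temperature.
try:
    math.log(-5)
except ValueError as err:
    print("cannot take the log of -5:", err)
```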

Interpreting a difference in mean logs, on the original scale

LogTransformations.2.Diff of Means.mp4

Question 2: In statistics, which base do we usually mean when we say "log"?

Base e, that is, the natural logarithm.
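Software usually follows the same convention. In Python, for example (used here only for illustration), `math.log` with one argument is the natural log, and a base-10 log needs `math.log10` or the optional second `base` argument.

```python
import math

print(math.e)               # the constant e, about 2.71828
print(math.log(math.e))     # natural log of e, which is 1
print(math.log(100))        # natural log of 100, about 4.605
print(math.log10(100))      # base-10 log of 100, which is 2
print(math.log(100, 10))    # also base 10, via the optional base argument
```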

Interpreting a confidence interval for the difference in mean logs, on the original scale

LogTransformations.3.Confidence Intervals.mp4

Question 3: Suppose that I am comparing the incomes of people who took a statistics course vs. those who did not. I log all of the incomes and estimate that the difference in mean log incomes is 0.23. Suppose that the median income among those who did not take a statistics course is $70,000. Approximately what is the median income among those who did take a statistics course?

If the difference in mean log incomes is 0.23, then the ratio of (median income among those who took statistics) to (median income among those who did not) is approximately e^0.23 ≈ 2.718^0.23 ≈ 1.26. This works because, when the logged incomes are roughly symmetric, the mean log is close to the median log, and exponentiating the median log gives back the median income. Therefore, if the median income among those with no statistics class is $70,000, then the median income among those who did take statistics is approximately $70,000 * 1.26 = $88,200.
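The same arithmetic as a minimal Python sketch (Python and the variable names are chosen only for illustration):

```python
import math

diff_mean_logs = 0.23       # estimated difference in mean log incomes
median_no_stats = 70_000    # median income among those who did not take statistics

# Exponentiating the difference in mean logs estimates the ratio of medians.
ratio_of_medians = math.exp(diff_mean_logs)          # about 1.2586

# Scale the known median by that estimated ratio.
median_stats = median_no_stats * ratio_of_medians    # about 88,100
print(f"ratio ≈ {ratio_of_medians:.2f}, median ≈ ${median_stats:,.0f}")
```

Carrying the unrounded ratio (about 1.2586) gives roughly $88,100; the $88,200 above comes from rounding the ratio to 1.26 first. Either is a reasonable approximation.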

Note that computing e^x is called exponentiating. It undoes the natural log: e^(log x) = x.
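The reason a difference in logs becomes a ratio after exponentiating is the identity log(a) - log(b) = log(a / b). A quick numeric check in Python (used only for illustration), with the two medians from Question 3 as example values:

```python
import math

a, b = 88_200, 70_000   # any two positive numbers; here, the medians from Question 3

# Exponentiating undoes the natural log.
print(math.exp(math.log(a)))                 # about 88200

# A difference of logs is the log of a ratio, so exponentiating it recovers the ratio.
print(math.exp(math.log(a) - math.log(b)))   # about 1.26
print(a / b)                                 # 1.26
```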

Question 4: This is a continuation of the previous question. Suppose that I calculate the following 95% confidence interval for the difference in mean log incomes: (0.21, 0.25). Provide an approximate 95% confidence interval for the ratio of (median income among those who took statistics) to (median income among those who did not).

The lower bound of a 95% confidence interval for the ratio of medians is e^0.21 ≈ 1.234, and the upper bound is e^0.25 ≈ 1.284, so the interval is approximately (1.23, 1.28). Note that this interval is not symmetric around the estimate e^0.23 ≈ 1.259: the upper bound is about 0.0254 above it, while the lower bound is only about 0.0249 below it. When you obtain an interval by adding and subtracting the same number from an estimate, the estimate ends up in the middle of the interval. However, when you obtain an interval by exponentiating another interval, your estimate will not be in the middle of the new interval; because e^x curves upward, it sits closer to the lower bound.
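A minimal Python sketch of this back-transformation (Python and the variable names are illustrative only). It exponentiates the estimate and both endpoints, then measures how far each endpoint sits from the estimate:

```python
import math

estimate_log = 0.23       # estimated difference in mean log incomes
ci_log = (0.21, 0.25)     # 95% CI for that difference

# Exponentiate the estimate and each endpoint to move to the ratio-of-medians scale.
estimate_ratio = math.exp(estimate_log)                  # about 1.259
ci_ratio = (math.exp(ci_log[0]), math.exp(ci_log[1]))    # about (1.234, 1.284)

# e^x is convex, so the upper endpoint ends up farther from the estimate than the lower one.
lower_gap = estimate_ratio - ci_ratio[0]    # about 0.0249
upper_gap = ci_ratio[1] - estimate_ratio    # about 0.0254
print(f"ratio ≈ {estimate_ratio:.3f}, 95% CI ≈ ({ci_ratio[0]:.3f}, {ci_ratio[1]:.3f})")
print(f"gap below ≈ {lower_gap:.4f}, gap above ≈ {upper_gap:.4f}")
```

The asymmetry is small here only because the interval on the log scale is narrow; wider intervals produce much more lopsided back-transformed intervals.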

You did it! Again, we expect that this takes some time to digest.

During this tutorial you learned:


Terms and concepts:

Log transformations, ratio of medians 


Functions in review: 

exp(), log()
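To see log() and exp() working together on a data set, here is a minimal end-to-end sketch in Python (chosen only for illustration; the course's own software may differ). The incomes are made up, and it assumes a pooled (equal-variance) two-sample t interval; the critical value 2.306 is the 97.5th percentile of the t distribution with 8 degrees of freedom, which matches these two groups of 5 only.

```python
import math
import statistics

# Made-up incomes (in dollars), for illustration only; not real data.
took_stats = [52_000, 61_000, 70_000, 78_000, 95_000]
no_stats = [41_000, 50_000, 58_000, 66_000, 80_000]

# Step 1: take the natural log of every income.
log_took = [math.log(x) for x in took_stats]
log_no = [math.log(x) for x in no_stats]

# Step 2: difference in mean logs with a pooled two-sample t interval (assumption).
n1, n2 = len(log_took), len(log_no)
diff = statistics.mean(log_took) - statistics.mean(log_no)
pooled_var = ((n1 - 1) * statistics.variance(log_took)
              + (n2 - 1) * statistics.variance(log_no)) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_crit = 2.306  # t quantile for 95% with n1 + n2 - 2 = 8 df; change it if the sizes change
ci_log = (diff - t_crit * se, diff + t_crit * se)

# Step 3: exponentiate to estimate the ratio of medians and its interval.
ratio = math.exp(diff)
ci_ratio = (math.exp(ci_log[0]), math.exp(ci_log[1]))
print(f"ratio of medians ≈ {ratio:.2f}, 95% CI ≈ ({ci_ratio[0]:.2f}, {ci_ratio[1]:.2f})")
```

Strictly, exp(diff) is the ratio of the two groups' geometric means; it estimates the ratio of medians when the logged data are roughly symmetric.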