Module 12

Confidence interval

Introduction

1. Definition

2. Obtaining confidence interval estimates for population means on jamovi

3. Confidence intervals in relation to a statistical test on jamovi

Introduction

A confidence interval (CI or C.I.) is an interval estimate for a population parameter.
CIs are most commonly used for the population mean μ, but many researchers also report them for other parameters such as correlations and effect sizes. In principle, a CI can be constructed for any population parameter.
Reporting a CI based on the sample data complements the p value from a null hypothesis significance test (NHST) by giving the reader more information about variability and uncertainty.

1. Definition

A confidence interval consists of
- an upper limit
- a lower limit
- an associated probability
The upper and lower limits are, in fact, random variables: their values depend on which sample is drawn (each sample being a "trial" in the probability-theory sense). Once a sample is obtained, we can compute estimated values (simply called "estimates" in statistics) for the limits from the sample data and the associated probability.
In the case of the CI for the population mean, the point estimate of the population mean (usually the sample mean M) is always at the midpoint between the estimates of the upper and lower limits.
This is also how the CI for μ is computed: upper limit = M + margin, lower limit = M - margin, where margin = (critical value of t, with n - 1 degrees of freedom) x (standard error of the mean). The margin is half of the interval's total width.
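As a minimal sketch of this computation, the Python snippet below works out a 95% CI for the mean of a small made-up sample. The ten data values and the function name mean_ci are illustrative assumptions, not part of the course data or of jamovi. The critical value 2.262 is the tabled two-tailed t value for 9 degrees of freedom at α = .05, hard-coded here so that the example needs no external libraries.

```python
import math
import statistics


def mean_ci(data, t_crit):
    """Return (lower, upper) limits of the CI for the population mean.

    t_crit is the two-tailed critical t value for df = len(data) - 1.
    """
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    margin = t_crit * se                        # amount added to and subtracted from M
    return m - margin, m + margin


# Hypothetical sleep durations (hours) for 10 people; NOT the course data.
sleep = [7.0, 6.5, 8.0, 7.5, 7.0, 6.0, 8.5, 7.5, 7.0, 6.5]

T_CRIT_DF9 = 2.262  # tabled t value for df = 9, 95% confidence
lower, upper = mean_ci(sleep, T_CRIT_DF9)
print(f"M = {statistics.mean(sleep):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Whatever data are used, the sample mean sits exactly halfway between the two limits, because the same amount is added to and subtracted from M.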
The most commonly used associated probability is 95%. It is often formally denoted as (1 - α), where α is the level of significance (often set at .05), because a CI can be interpreted in the context of NHST.
The statement "Based on our sample, the 95% CI of the population mean is (30, 50)" can be formally understood as "We applied the formula of the CI for the population mean to our sample and found that the lower and upper limits are, respectively, 30 and 50. If the same CI formula is applied to many future samples, about 95% of the resulting 95% CIs will contain the population mean."
However, a 95% CI for μ is often loosely interpreted as "This interval has a 95% chance of containing the population mean". Strictly speaking, the 95% describes the procedure rather than any single computed interval, but the informal reading is adequate for most practical purposes.
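The "future samples" reading of a 95% CI can be illustrated with a small simulation. The sketch below assumes a hypothetical normally distributed population (mean 7.4 hours, SD 1.4 hours; both numbers are invented for illustration), repeatedly draws samples of n = 25, and counts how often the computed interval contains the true mean. The function name coverage and the hard-coded critical value 2.064 (the tabled t value for df = 24) are likewise assumptions of this sketch.

```python
import math
import random
import statistics


def coverage(true_mean, true_sd, n, t_crit, n_samples, seed=1):
    """Share of simulated samples whose CI contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = statistics.mean(sample)
        margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
        if m - margin <= true_mean <= m + margin:
            hits += 1
    return hits / n_samples


# Hypothetical population: mean 7.4 h, SD 1.4 h; samples of n = 25.
T_CRIT_DF24 = 2.064  # tabled t value for df = 24, 95% confidence
rate = coverage(true_mean=7.4, true_sd=1.4, n=25,
                t_crit=T_CRIT_DF24, n_samples=5000)
print(f"Proportion of intervals containing the true mean: {rate:.3f}")
```

The proportion should land close to .95. Each individual interval either contains the true mean or it does not; the 95% is a property of the procedure across many samples.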

2. Obtaining confidence interval estimates for population means on jamovi

After opening the data file in jamovi, go to Exploration and then choose Descriptives.

Then, select the variable for which you would like to compute the CI for the population mean. Just drag the variable from the left window to the right window under "Variables". You can also use the arrow button in between.
In this example, we choose the variable sleep (average sleep duration, in hours).
You'll see some basic descriptives in the Results panel on the right.

Next, on the left panel, scroll down to "Statistics" and tick the checkbox "Confidence interval for Mean".
After checking the box, you can also adjust the associated probability (set to 95% by default).
You can then see the upper and lower limits displayed on the right-hand-side Results panel.
In this example, the 95% CI for the population mean sleep duration (in hours) is (7.28, 7.45).

When obtaining the descriptives, jamovi allows you to split the data based on the levels of another nominal (categorical) variable.
This comes in handy when you want to obtain the confidence interval(s) for a specific group (or multiple groups) specified in the nominal variable.
For example, suppose we would like to obtain the 95% CI for the mean separately for females (Gender = 0) and males (Gender = 1).
We can select Gender and put it in the "Split by" box.
In the Results panel on the right-hand side, all the descriptives will be split by the groups of the Gender variable.
Specifically, the 95% CI for each value of the Gender variable (0 = female, 1 = male) will be shown.
In this example, the 95% CI for the population mean sleep duration is (7.22, 7.46) for females and (7.27, 7.50) for males.
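Conceptually, splitting by a nominal variable means computing a separate CI from each group's own data. The sketch below mimics that idea in Python on a tiny invented data set; the rows, the lookup table T_CRIT_95, and the variable names are all assumptions made for illustration and say nothing about how jamovi works internally. The critical values are the tabled two-tailed 95% t values for the two degrees of freedom that occur in this toy example.

```python
import math
import statistics
from collections import defaultdict

# Tabled two-tailed 95% critical t values, keyed by degrees of freedom.
# Only the df needed for this toy data set are listed.
T_CRIT_95 = {4: 2.776, 5: 2.571}

# Hypothetical (Gender, sleep) rows; 0 = female, 1 = male. NOT the course data.
rows = [(0, 7.0), (0, 7.5), (0, 6.5), (0, 8.0), (0, 7.0),
        (1, 7.5), (1, 6.0), (1, 8.0), (1, 7.0), (1, 7.5), (1, 8.5)]

# Group the sleep values by the level of the splitting variable.
groups = defaultdict(list)
for gender, sleep in rows:
    groups[gender].append(sleep)

# Compute one CI per group, each using that group's own n, mean and SD.
ci_by_group = {}
for gender, values in sorted(groups.items()):
    n = len(values)
    m = statistics.mean(values)
    margin = T_CRIT_95[n - 1] * statistics.stdev(values) / math.sqrt(n)
    ci_by_group[gender] = (m - margin, m + margin)
    print(f"Gender = {gender}: n = {n}, M = {m:.2f}, "
          f"95% CI = ({m - margin:.2f}, {m + margin:.2f})")
```

Because each group has fewer observations than the full sample, its standard error is larger, so group-level intervals tend to be wider than the overall interval.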

3. Confidence intervals in relation to a statistical test on jamovi

We can ask jamovi to generate confidence interval estimates for parameters other than the population mean when we conduct hypothesis tests.
Below is an example of a one-sample t test, testing the hypothesis that the population mean sleep duration is different from a constant.
In principle, confidence interval estimates can be obtained for the parameters in other statistical tests. Those that we have already covered in earlier modules include the mean difference in an independent-samples or paired t test, Pearson's correlation, etc.

Suppose we set up the null hypothesis as "the population mean sleep duration is equal to 8 hours", or H₀: μ = 8, for a two-tailed test, i.e., H₁: μ ≠ 8.
First, we go to T-tests -> One Sample T-test.
Then, we select Sleep as the dependent variable. Under Hypothesis, set the Test value as 8. Make sure it's a two-tailed test with the option of "≠ Test value" selected.
You'll then see the traditional one-sample t test results on the right, with the test statistic and p value (t(999) = -14.8, p < .001).
This is how you'd normally conduct a one-sample t test on jamovi, as described in the previous module.

In addition to the t test statistic and the p value, we can obtain a confidence-interval estimate of the mean difference. As a population parameter, it is the difference between the population mean sleep duration (estimated based on the sample mean sleep duration) and the test value (8 hours).
To obtain the confidence-interval estimate of this mean difference, check the boxes next to Mean difference and then Confidence interval under Additional Statistics.
The point estimate for the mean difference is -0.639, which indicates that the sample mean (7.361; not shown here) is smaller than the test value (8) by 0.639 hours.
The lower and upper limits of the 95% confidence interval are, respectively, -0.723 and -0.554. As in other confidence intervals for a mean, the point estimate (-0.639) is in the middle between these two values. Note that the associated probability is set to 95% by default. You can change it to 90%, 99% or other values.
Below are the interpretations of this mean difference:
- Using this specific method for calculating the 95% confidence interval, we obtained an interval estimate of (-0.723, -0.554) for the mean difference in sleep duration. If we use the same method to calculate a confidence interval in a future sample, there is a 95% chance that the resulting interval estimate will cover the true population mean difference.
- Because the 95% confidence interval of the mean difference does not cover zero (the whole interval estimate is below zero), the mean difference is significantly different from zero at α = .05. In fact, the same conclusion holds for every value outside the 95% confidence interval (e.g., -0.8, -0.3, 0.1): the mean difference is significantly different from each of them. A more direct way to describe the result is that "mean sleep duration was significantly different from 8 hours per day".
- Note that the confidence interval estimate tells us that, based on the sample statistics, the size of the mean difference is plausibly between 0.554 and 0.723 hours, which means that the mean sleep duration was less than 8 hours by roughly 33 to 43 minutes.
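The link between the interval and the t test can be checked with a little arithmetic on the reported numbers. The sketch below assumes a two-tailed critical t value of about 1.962 for df = 999 (a tabled value, not shown in the jamovi output) and uses a hypothetical helper, rejected_at_alpha, to express the rule "values outside the 95% CI are rejected at α = .05". Because the inputs are rounded to three decimals, the recovered figures are approximate.

```python
# Values reported in the jamovi output described above.
mean_diff = -0.639
ci_lower, ci_upper = -0.723, -0.554

# Assumption: two-tailed critical t for df = 999 at alpha = .05 (about 1.962).
T_CRIT_DF999 = 1.962

# The margin is half the interval's width; dividing it by the critical
# value recovers the standard error, and from that the t statistic.
margin = (ci_upper - ci_lower) / 2
se = margin / T_CRIT_DF999
t_stat = mean_diff / se
print(f"margin = {margin:.4f}, SE = {se:.4f}, t = {t_stat:.1f}")


def rejected_at_alpha(hypothesised_diff):
    """True if the value lies outside the 95% CI (rejected at alpha = .05)."""
    return not (ci_lower <= hypothesised_diff <= ci_upper)


for value in (0.0, -0.8, -0.3, -0.6):
    verdict = "rejected" if rejected_at_alpha(value) else "not rejected"
    print(f"hypothesised mean difference {value}: {verdict}")

# Express the interval limits as minutes below the 8-hour test value.
low_min, high_min = abs(ci_upper) * 60, abs(ci_lower) * 60
print(f"About {low_min:.0f} to {high_min:.0f} minutes below 8 hours")
```

The recovered t statistic agrees with the reported t(999) = -14.8 up to rounding, which shows that the confidence interval and the test are built from the same standard error.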

Another parameter for which we can obtain an interval estimate is the effect size.
Here, we refer to Cohen's d, which is defined as the mean difference divided by the standard deviation. Therefore, it can also be understood as a population parameter (because both "mean difference" and "standard deviation" exist at the population level).
By checking the box named "Effect size" and the "Confidence interval" beneath it, we can see that the effect size in Cohen's d was estimated to be -0.469, with the 95% confidence interval being estimated to be (-0.534, -0.404).
With this confidence interval estimate, we can interpret the effect size of -0.469 as quite likely to be around "medium" in magnitude (0.5 = medium). Without the interval (-0.534, -0.404), we could not tell whether the point estimate (-0.469) came from an imprecise estimation; for example, its 95% confidence interval could have been (-0.869, -0.069), a range running from "minimal" to "large".
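Two pieces of this reasoning can be sketched in Python. First, for a one-sample t test, Cohen's d equals t divided by the square root of n, so the reported d can be roughly recovered from t(999) = -14.8 (df = 999 implies n = 1000). Second, the limits of an effect-size interval can be given verbal labels. The labelling rule below, which picks the closest of Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large) plus an added 0.0 anchor for "negligible", is an assumption of this sketch and only one of several possible rules of thumb; the function names are hypothetical.

```python
import math

# Reported by jamovi above: t(999) = -14.8, so n = 999 + 1 = 1000.
t_stat, n = -14.8, 1000

# For a one-sample t test, d = mean difference / SD = t / sqrt(n).
d_from_t = t_stat / math.sqrt(n)
print(f"d recovered from t: {d_from_t:.3f}")  # close to the reported -0.469

# Cohen's conventional benchmarks; 0.0 is added here as a "negligible" anchor.
BENCHMARKS = {0.0: "negligible", 0.2: "small", 0.5: "medium", 0.8: "large"}


def nearest_label(d):
    """Label |d| by the closest benchmark (an illustrative rule of thumb)."""
    closest = min(BENCHMARKS, key=lambda b: abs(abs(d) - b))
    return BENCHMARKS[closest]


def describe(lower, upper):
    """Verbal labels for the two limits of an effect-size interval."""
    return nearest_label(lower), nearest_label(upper)


print(describe(-0.534, -0.404))  # the reported interval
print(describe(-0.869, -0.069))  # the hypothetical wide interval
```

A narrow interval whose two limits receive the same label supports a single verbal description of the effect, whereas a wide interval whose limits receive very different labels does not.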

Note that the above technique applies to other hypothesis tests, such as two-sample (independent or paired) t tests, correlation analysis, and regression analysis (e.g., a confidence interval for the slope).
