confidence intervals
Group 0 - Albert Ku
Overview
Suppose we want to use sampling to estimate an unknown parameter of a population (e.g. its mean or variance). From a sample we can calculate a so-called confidence interval (CI), which is a range of estimates for the parameter.
In statistics, you often hear phrases like "the 95% confidence interval for the mean". Here 95% is called the confidence level. Confidence intervals and confidence levels are often misunderstood, and one main objective of this Shiny app is to illustrate the correct interpretation of the confidence level.
In this Shiny app, we will also investigate how to compute a CI for the population mean in two common sampling scenarios:
The population is normally distributed and the population variance is unknown
The population distribution is unknown and the population variance is unknown
There are two main methods of computing CI:
Using the normal distribution
Using the t-distribution
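For reference, the two methods can be written side by side. The notation below is a common textbook convention and an assumption here, not something taken from the app: \bar{x} is the sample mean, s the sample standard deviation, n the sample size, a the confidence level, and z_q and t_{n-1,q} are the q-quantiles of the standard normal distribution and of the t-distribution with n-1 degrees of freedom.

```latex
\[
\text{normal: } \bar{x} \pm z_{(1+a)/2}\,\frac{s}{\sqrt{n}},
\qquad
\text{t: } \bar{x} \pm t_{n-1,\,(1+a)/2}\,\frac{s}{\sqrt{n}}
\]
```

Written this way, the two intervals differ only in the critical value. For example, a = 0.95 uses the 0.975-quantile in both cases.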
With each scenario, there are two factors that may affect your choice of distribution:
Sample size (n)
Confidence level (a)
You need to use the Shiny app to find out which distribution is the suitable choice for computing the CI in each of the possible situations.
Instructions
In the Shiny app, you can first select the population distribution:
Normal distribution with mean zero
Bimodal distribution with mean zero
You can choose the variance to change the shape of the selected population distribution. (Note: the variance will not be used in the calculation of the CIs)
Then choose the sample size (n) and the number of repeated samples (k).
The confidence level (a) can be adjusted between 0.9 and 0.999.
There are 4 plots in the Shiny app:
The plot of population distribution
The histogram of sample means using frequency density
The plot of all CIs computed using the normal distribution from the repeated samples, drawn as vertical lines. The midpoint of each vertical line is the sample mean of the sample that produced the CI. The CIs that contain 0 are shown in blue and the rest in red. The percentage of CIs that contain 0 is shown beneath the plot.
A similar plot of all CIs computed using the t-distribution.
The following is the link to the Shiny app. Use it to complete Tasks 1 and 2 below:
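The percentages shown under the last two plots can be approximated offline with a short simulation. The following is a minimal sketch, not the app's actual code: it assumes a normal population with mean 0 and variance 1, uses only the Python standard library, and takes the t critical values from a standard t-table for level 0.95. The names `T_CRIT_95`, `mean_and_se` and `coverage` are made up for this illustration.

```python
import math
import random
from statistics import NormalDist

# Two-sided t critical values for confidence level 0.95, copied from a
# standard t-table and keyed by sample size n (n - 1 degrees of freedom).
T_CRIT_95 = {5: 2.776, 10: 2.262, 30: 2.045}


def mean_and_se(sample):
    """Return the sample mean and its estimated standard error s / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, math.sqrt(var / n)


def coverage(n, k, crit, seed=1):
    """Fraction of k intervals mean +/- crit * se that contain the true mean 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(k):
        # Assumed population: normal with mean 0 and variance 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean, se = mean_and_se(sample)
        if mean - crit * se <= 0.0 <= mean + crit * se:
            hits += 1
    return hits / k


n, k, level = 10, 5000, 0.95
z_crit = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96
print("normal-based coverage:", coverage(n, k, z_crit))
print("t-based coverage:     ", coverage(n, k, T_CRIT_95[n]))
```

Because both calls use the same seed, they see the same k samples, so any difference in the two percentages comes only from the critical value. The sketch does not cover the bimodal population or other confidence levels.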
Task 1 - Answer the following question:
Which of the following statements is/are the correct interpretation of a 95% CI that estimates the mean of a population?
(A) There is a 95% chance that the population mean is contained in this CI.
(B) For a future random sample of the same size, there is a 95% chance that the sample mean is contained in this CI.
(C) For repeated sampling with the same sample size, the proportion of the resulting CIs that contain the population mean tends to 95% as the number of repeated samples goes to infinity.
(D) For a future random sample of the same size, there is a 95% chance that its CI contains the population mean.
Answer
As you can see in the Shiny app, a CI is computed for each random sample, whereas the population mean is theoretically a fixed (yet unknown) quantity. In other words, the random process is the generation of a CI from a random sample. Therefore, (A) is wrong because it does not make sense to talk about the "chance" of obtaining a "fixed" population mean.
(B) is also wrong: any two random samples are independent, so the CI of one sample should not give you any information about the sample mean of another sample.
The only correct interpretations are (C) and (D). (C) is essentially equivalent to (D) once we apply the law of large numbers.
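The difference between (B) and (D) can also be checked numerically. The sketch below rests on assumptions that are not part of the app: a normal population with mean 0 and variance 1, sample size 10, and a t-based 95% CI with critical value 2.262 taken from a standard t-table. It draws many pairs of independent samples, builds a CI from the first sample of each pair, and counts how often that CI contains the population mean versus the mean of the second sample. Note that it averages over many first samples rather than fixing a single CI, and that the function names are illustrative.

```python
import math
import random

# Two-sided 95% t critical value for 9 degrees of freedom (n = 10),
# taken from a standard t-table.
T_CRIT = 2.262


def draw_sample(rng, n):
    """Draw n values from the assumed population: normal, mean 0, variance 1."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def t_interval(sample, crit):
    """Return (low, high) for the interval mean +/- crit * s / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    half = crit * math.sqrt(var / n)
    return mean - half, mean + half


def compare_b_and_d(n=10, k=5000, seed=2):
    """Return two shares over k trials: CIs containing the true mean 0,
    and CIs containing the mean of a fresh, independent sample."""
    rng = random.Random(seed)
    contains_mu = 0
    contains_next_mean = 0
    for _ in range(k):
        low, high = t_interval(draw_sample(rng, n), T_CRIT)
        next_mean = sum(draw_sample(rng, n)) / n
        contains_mu += low <= 0.0 <= high
        contains_next_mean += low <= next_mean <= high
    return contains_mu / k, contains_next_mean / k


share_mu, share_next = compare_b_and_d()
print("CI contains population mean:  ", share_mu)
print("CI contains next sample mean: ", share_next)
```

Under these assumptions the first share should come out close to 0.95, in line with (C) and (D), while the second should be noticeably lower, which is why 95% is not the right figure for statement (B).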
Task 2 - Complete the diagram
Using the Shiny app, find the suitable choice of distribution for computing the CI in each of the cases in the diagram below. Also find the value of n (sample size) that separates large from small sample sizes, and the value of a (confidence level) that separates high from low confidence levels.