Remember that when you collect information from every unit in a population, it is called a census. In doing a census, we can be certain that the numbers we have calculated really do describe the entire population. But, because a census is often impractical, we generally take a representative sample of the population and use that sample to draw conclusions about the entire population. The downside to sampling is that we can never be 100% sure that we have captured the truth about the entire population.
For example, imagine taking a random sample of 100 from a large population. Put those members back and choose another sample of 100, repeating this many times. Each of these samples of size 100 will include a different combination of 100 members of the population, so each sample will result in different statistics. This natural variation between samples is expected and is called random sampling error. To take it into account, researchers generally report their findings with a margin of error, or as being within a certain range of possible values. This range is called a confidence interval. For example, the President's approval rating might be reported as, "The approval rating for the President is 43.2%, with a margin of error of ±3%," which could also be reported as, "The approval rating for the President is between 40.2% and 46.2%."
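As an illustrative sketch (the population size and approval rate below are assumptions chosen for illustration, not values from the text), the following Python snippet draws several random samples of 100 from a large simulated population and shows how the sample proportion changes from sample to sample:

```python
import random

random.seed(1)

# Hypothetical population of 100,000 people, 43% of whom approve (assumed for illustration).
population = [1] * 43_000 + [0] * 57_000

# Draw several independent random samples of size 100 and compute each sample proportion.
for i in range(5):
    sample = random.sample(population, 100)
    proportion = sum(sample) / 100
    print(f"Sample {i + 1}: proportion approving = {proportion:.2f}")
```

Each run of the loop produces a slightly different proportion, even though every sample was drawn from the same population; that spread is random sampling error.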
Using a statistic to make a conclusion about a population is called statistical inference. This is an introductory course, so we will only briefly touch on this idea; in a future statistics class, you will learn much more about statistical inference and its calculations. It is important to note that statistical conclusions are meaningless when poor sampling techniques have been used. If the data came from a voluntary response sample, had a low response rate, or used an incomplete sampling frame, then don't waste your time performing inference on your statistics. Random sampling error is the only type of error or bias that the margin of error accounts for.
Once a statistic is calculated for a sample, it is used as an estimate for what the actual parameter might be. We do not know whether our statistic is close to the population parameter, or whether it is too high or too low, so we build our interval around the statistic. We add the margin of error to, and subtract the margin of error from, our statistic. We then report this range of values as our confidence interval, the interval that we are fairly confident the true parameter must be within. In a more formal course you will learn how to calculate the margin of error more precisely, and for various levels of confidence (such as 90% or 99%). In this course we will use a simple formula that estimates the margin of error for a 95% confidence interval. We will also make a 95% confidence statement, which explains our conclusion regarding the population parameter in context. The formulas for an estimated 95% margin of error and confidence interval are:
[Figure2]
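As a sketch of these formulas, assuming the quick approximation commonly used in introductory courses for a sample proportion of size n:

\[
\text{margin of error} \approx \frac{1}{\sqrt{n}}
\qquad\qquad
\text{95\% confidence interval} \approx \text{statistic} \pm \text{margin of error}
\]

Treating the course's "simple formula" as this quick approximation is an assumption; a more formal course replaces it with a margin of error based on the standard error of the statistic.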
*note: To make a smaller margin of error, and therefore a narrower confidence interval, you must increase the size of the sample.
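As a quick numeric illustration, assuming the \(1/\sqrt{n}\) approximation above, quadrupling the sample size cuts the margin of error in half:

\[
n = 100:\ \frac{1}{\sqrt{100}} = 0.10\ (\pm 10\%)
\qquad\qquad
n = 400:\ \frac{1}{\sqrt{400}} = 0.05\ (\pm 5\%)
\]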
Once you have found the range of numbers for your confidence interval, you will state your conclusion in context. Such a statement is called a confidence statement. The confidence interval refers to the population, not the sample: we are 100% certain of our sample statistic; it is the population parameter that we are estimating. Writing a confidence statement can be confusing, so you can use the following template: "We are 95% confident that the true proportion (or mean) of [parameter in context] is between [lower value] and [upper value]."
Example 1
A random sample of 125 union members was conducted to see whether or not the union members would support a strike. Sixty-four of those surveyed said that they would support a strike unless safety conditions were improved. Identify each of the following:
a) population of interest
b) parameter of interest
c) sample
d) statistic
e) margin of error
f) 95% confidence interval
g) confidence statement.
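As a sketch of the numerical parts of this example, assuming the quick \(1/\sqrt{n}\) margin-of-error approximation from above, the calculation could be carried out as follows (the numbers 125 and 64 come from the example):

```python
import math

n = 125          # sample size from the example
successes = 64   # members who said they would support a strike

statistic = successes / n                  # sample proportion (the statistic)
margin_of_error = 1 / math.sqrt(n)         # quick 95% margin-of-error approximation (assumed formula)
lower = statistic - margin_of_error
upper = statistic + margin_of_error

print(f"statistic: {statistic:.3f}")              # about 0.512
print(f"margin of error: {margin_of_error:.3f}")  # about 0.089
print(f"95% CI: ({lower:.3f}, {upper:.3f})")      # about (0.423, 0.601)
```

A matching confidence statement, following the template above, would be: we are 95% confident that the true proportion of all union members who would support a strike is between about 42.3% and 60.1%.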