5: Inference

What can we say about a POPULATION median using information about the SAMPLE?

A simple question, but hard to answer: What is the median weight of a male red-billed gull in Auckland in 2017?

The only way to find the exact median weight would be to head out and weigh every male red-billed gull in Auckland, which is almost impossible. Instead, we need to use a sample.

The median weight of the male red-billed gulls in a fair sample is likely to be close to the population median.

Also, the larger the sample taken, the closer the sample median is likely to be to the population median.
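To see the effect of sample size, here is a small simulation sketch. The population is made up for illustration (random weights centred near 279 grams); it is NOT the real GULLS data set, and the function name is our own. It measures how much the sample median moves around for small and large samples.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run can be repeated

# Hypothetical stand-in population (not the real GULLS data set).
population = [random.gauss(279, 20) for _ in range(10_000)]

def spread_of_sample_medians(sample_size, trials=200):
    """Take many random samples and return how much their medians vary."""
    medians = [
        statistics.median(random.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(medians)

small = spread_of_sample_medians(25)
large = spread_of_sample_medians(400)
print(f"n = 25:  medians vary by about {small:.1f} g")
print(f"n = 400: medians vary by about {large:.1f} g")
```

The medians from the larger samples should bunch more tightly around the population median than the medians from the smaller samples.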

We can say that we are fairly certain that the population median weight is in an interval around the sample median weight.

The interval is defined by:

median ± 1.5 × IQR ÷ √n

The median weight of a male red-billed gull in the population is probably somewhere in this interval.

In this formula, n is the sample size for the group.
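As a rough sketch, the formula can be written as a short Python function. The function name and the ten weights are made up for illustration, and Python's quartile method may give slightly different quartiles from NZ Grapher, so treat the numbers as approximate.

```python
import math
import statistics

def informal_confidence_interval(sample):
    """Return (lower, upper) using median ± 1.5 × IQR ÷ √n."""
    n = len(sample)
    median = statistics.median(sample)
    # Quartile conventions differ between tools; this is Python's default.
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    margin = 1.5 * iqr / math.sqrt(n)
    return median - margin, median + margin

# Made-up weights in grams, only to show the function running.
weights = [262, 270, 271, 275, 278, 280, 283, 286, 290, 301]
low, high = informal_confidence_interval(weights)
print(f"ICI: {low:.1f} g to {high:.1f} g")
```

The interval is always centred on the sample median, and it gets narrower as n gets larger.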

The blue bar in the plot shows this interval for a sample where n = 100.

Most of the time, it works pretty well

We can't be certain that the population median is in the informal confidence interval (ICI).

In the big GULLS data set, lots of samples of 100 male red-billed gulls were taken. The sample median and sample IQR were used to draw an informal confidence interval (in blue) for each sample.

The red line shows the population median. We know that in this big data set about red-billed gulls, the median weight is 279 grams.

Notice that nearly all of the informal confidence intervals overlap with the red line. Most of the time, when we find an ICI, it will contain the population median.
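The same experiment can be sketched in Python. This uses a made-up population (again, NOT the real GULLS data set) and our own helper function, so it only illustrates the idea: take many samples of 100, build an ICI from each, and count how many contain the population median.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run can be repeated

# Hypothetical stand-in population (not the real GULLS data set).
population = [random.gauss(279, 20) for _ in range(10_000)]
population_median = statistics.median(population)

def ici(sample):
    """Return (lower, upper) using median ± 1.5 × IQR ÷ √n."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    margin = 1.5 * (q3 - q1) / math.sqrt(len(sample))
    median = statistics.median(sample)
    return median - margin, median + margin

trials = 1000
hits = 0
for _ in range(trials):
    low, high = ici(random.sample(population, 100))
    if low <= population_median <= high:
        hits += 1

coverage = hits / trials
print(f"{hits} of {trials} intervals contained the population median")
```

Most of the intervals should contain the population median, but not all of them, which is why we say "probably" rather than "definitely".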

NZ Grapher calculates and draws the ICI for us. It is still good to show where calculations came from in a statistical report, so showing the calculations used to find the ICI (using median ± 1.5 × IQR ÷ √n) could be part of a well-written Conclusion section.

There will be two informal confidence intervals to calculate for the population median: one ICI for each of the two groups being compared.

Sampling Variability

Every sample taken from a population will be different. In an assessment on this standard, even if you ask the same question as another student, your sample will look different. We need to acknowledge this in our statistical report.

This is a key concept, and it's a good idea to head a subsection in your Conclusion about sampling variability. You should also make mention of sampling variability throughout your report.

We took a stratified or simple random sample to make sure that our sample was unbiased: every member of the population has the same chance of being in the sample.

What we can't control is the variation in the sample. So it's possible that, through bad luck, our sample contains lots of large-engine hatchbacks, even though there aren't many in the population. The larger the sample, the less likely this is to happen.

In another sample, we would expect the middle 50% to be about the same. The lower quartile, median, and upper quartile are likely to be similar in another sample. This means that we expect the ICI to be similar too.

Expensive root √

If we wanted an ICI that was half the size, we would need to take a bigger sample. (We can't change the IQR.)

To halve the size of the ICI, we need to take a sample four times bigger.

Suppose we knew, from a sample of 200 cookies from the new factory, that the median cookie weight was 12.0 grams and the IQR was 0.95 grams. Then the ICI for the median weight of a cookie from the new factory is between 11.9 and 12.1 grams.

We could also write this as 12.0 ± 0.1 grams.
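The cookie calculation can be checked with a few lines of Python. This is just the arithmetic from the formula, using the numbers given above; the variable names are our own.

```python
import math

n = 200        # cookies in the sample
median = 12.0  # sample median weight in grams
iqr = 0.95     # sample IQR in grams

margin = 1.5 * iqr / math.sqrt(n)
low, high = median - margin, median + margin

print(round(margin, 2), round(low, 1), round(high, 1))  # prints: 0.1 11.9 12.1
```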

What if we wanted to know the median weight to one more decimal place? To decrease the ICI width by a factor of 10, we need to increase the sample size by a factor of 100. This means we need a sample of 20,000 cookies (1,645 packets of biscuits).

It would be quite expensive to get one more decimal place in our estimate of the population median! The square root in the formula means more accuracy will be expensive to achieve.
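Because the formula divides by √n, making the ICI k times narrower takes a sample k × k times bigger. A minimal sketch of that rule, assuming the IQR stays about the same in the bigger sample (the function name is our own):

```python
def sample_size_for_narrower_ici(current_n, shrink_factor):
    """Sample size needed to make the ICI `shrink_factor` times narrower.

    Assumes the IQR of the bigger sample is about the same as before.
    """
    return current_n * shrink_factor ** 2

half_width_n = sample_size_for_narrower_ici(200, 2)
tenth_width_n = sample_size_for_narrower_ici(200, 10)
print(half_width_n, tenth_width_n)  # prints: 800 20000
```

So halving the cookie ICI would need 800 cookies, and one more decimal place would need 20,000.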

Suggesting a new sample

If the ICIs for the two groups are similar sizes, and we could take a bigger sample, we should take a bigger sample from both groups.

But what if we had a big ICI for one group and a smaller ICI for the other?

This plot shows a stratified sample of 100 cookies from each factory, and the CHIP_COUNT variable.

The ICI for the old factory is twice the size of the ICI of the new factory.

If we took another sample, and it was bigger, we might try taking a sample of 100 cookies from the new factory and 300 cookies from the old factory. This should help make the old factory chocolate chip count ICI smaller.
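We can estimate how much a bigger old factory sample would help. This sketch assumes the IQR stays about the same in the new sample, which we can't know for sure; the function name is our own.

```python
import math

def width_ratio(current_n, new_n):
    """New ICI width as a fraction of the current width (same IQR assumed)."""
    return math.sqrt(current_n / new_n)

ratio_300 = width_ratio(100, 300)
ratio_400 = width_ratio(100, 400)
print(round(ratio_300, 2), round(ratio_400, 2))  # prints: 0.58 0.5
```

With 300 cookies, the old factory ICI would shrink to a bit under 60% of its current width, so it would still be a little wider than the new factory ICI. Under the same assumption, halving it to match would take about 400 cookies.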

Give it a try!

Worksheet 8 gives more practice on informal confidence intervals for the population median.

Inf Worksheet 8.pdf