Learning Targets
I can determine whether a sample is representative of a population by considering the shape, center, and spread of each of them.
I know that some samples may represent the population better than others.
I remember that when a distribution is not symmetric, the median is a better estimate of a typical value than the mean.
A sample that is representative of a population has a distribution that closely resembles the distribution of the population in shape, center, and spread.
For example, consider the distribution of plant heights, in cm, for a population of plants shown in this dot plot. The mean for this population is 4.9 cm, and the MAD is 2.6 cm.
A representative sample of this population should have a larger peak on the left and a smaller one on the right, like this one. The mean for this sample is 4.9 cm, and the MAD is 2.3 cm.
Here is the distribution for another sample from the same population. This sample has a mean of 5.7 cm and a MAD of 1.5 cm. These are both very different from the population, and the distribution has a very different shape, so it is not a representative sample.
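One way to make the comparison above concrete is to measure how far each sample's mean and MAD fall from the population's. The sketch below uses only the summary values quoted in the text; the function names and the short list `demo_heights` are made up for illustration and are not the data behind the dot plots.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the values."""
    center = mean(values)
    return mean(abs(v - center) for v in values)

# Summary statistics quoted in the text, in cm.
population = {"mean": 4.9, "mad": 2.6}
samples = {
    "first sample": {"mean": 4.9, "mad": 2.3},
    "second sample": {"mean": 5.7, "mad": 1.5},
}

def gaps(sample, population):
    """Absolute difference between sample and population for each measure."""
    return {k: round(abs(sample[k] - population[k]), 1) for k in population}

for name, stats in samples.items():
    print(name, gaps(stats, population))

# Hypothetical heights (NOT the dot plot data), only to show the MAD calculation.
demo_heights = [1, 2, 2, 3, 7]
print(mean_absolute_deviation(demo_heights))
```

Small gaps in both measures are a good sign, but they are not the whole story: the shape of the distribution still has to be checked by looking at the dot plot.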
representative
A sample is representative of a population if its distribution resembles the population's distribution in center, shape, and spread.
For example, this dot plot represents a population.
This dot plot shows a sample that is representative of the population.
Often in this unit, the data sets are small enough that sampling is not necessary. We work with small data sets anyway because they make it easy to compare information from a sample to the same information from the population.
A young artist has sold 10 paintings. Calculate the MEAN and MEDIAN for each of these samples:
The first two paintings she sold were for $50 and $350.
At a gallery show, she sold three paintings for $250, $400, and $1,200.
Her oil paintings have sold for $410, $400, and $375.
Here are the selling prices for all 10 of her paintings. Calculate the MEAN and MEDIAN for all of the selling prices. Prices: $50, $200, $250, $275, $280, $350, $375, $400, $410, $1,200.
Were the measures of center for any of the samples close to the same measure of center for the population?
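After working the calculations by hand, they can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the dictionary labels and the helper name `centers` are chosen here for illustration.

```python
from statistics import mean, median

# Selling prices in dollars, taken from the problem statement.
population = [50, 200, 250, 275, 280, 350, 375, 400, 410, 1200]
samples = {
    "first two sold": [50, 350],
    "gallery show": [250, 400, 1200],
    "oil paintings": [410, 400, 375],
}

def centers(prices):
    """Return (mean rounded to cents, median) for a list of prices."""
    return round(mean(prices), 2), median(prices)

print("all 10 paintings", centers(population))
for name, prices in samples.items():
    print(name, centers(prices))
```

Notice how the single $1,200 sale pulls the mean of any group that contains it upward while barely moving the median, which is why the median is often the better measure of a typical value when a distribution is not symmetric.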
The price per pound of catfish at a fish market was recorded for 100 weeks.
What do you notice about the data from the dot plots showing the population and each of the samples within that population? What do you wonder?
If the goal is to have the sample represent the population, which of the samples would be good? Which would be bad? Explain your reasoning.
A sample with the same mean as the population is not necessarily representative, since it may miss other important aspects of the population. Example:
If the population for a question is all of the humans in the world and you use one person from each country as your sample, it may not actually be representative of the population.
Larger countries, like China, are under-represented: a very large number of people live in China, but only 1 of them is included in our sample.
A smaller country, like Cuba, might be over-represented: far fewer people live there, but it gets the same single spot in the sample as every larger country.
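The idea of over- and under-representation can be sketched with numbers by comparing each country's share of the sample to its share of the world's people. The population figures and the count of countries below are rough round numbers assumed only for illustration, and `representation_ratio` is a hypothetical helper name.

```python
# Rough round figures, assumed for illustration only (not exact counts).
WORLD_POPULATION = 8_000_000_000
NUMBER_OF_COUNTRIES = 195
country_population = {"China": 1_400_000_000, "Cuba": 11_000_000}

# With one person per country, every country gets the same share of the sample.
sample_share = 1 / NUMBER_OF_COUNTRIES

def representation_ratio(population):
    """Sample share divided by true share; 1.0 would mean fair representation."""
    true_share = population / WORLD_POPULATION
    return sample_share / true_share

for country, population in country_population.items():
    ratio = representation_ratio(population)
    label = "over-represented" if ratio > 1 else "under-represented"
    print(f"{country}: ratio {ratio:.2f}, {label}")
```

A ratio well below 1 means the country has less weight in the sample than it has in the world, and a ratio well above 1 means it has more.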
A representative sample is the ideal type of sample we’d like to collect…but if we do not know the data for the population, it will be hard to know if a sample we collect is representative or not. If we do know the population data, then a sample is probably unnecessary.
A representative sample is one whose center, shape, and spread are similar to those of the population data.
Why might it be important to get a representative sample rather than a more convenient sample?
It is useful if the sample looks similar to the population data. If not, we may miss some important information!