A census collects data about a whole population. Everyone is included. Censuses are expensive, time-consuming and difficult to run, so they are quite rare.
Most of the time, statisticians will investigate a sample of the population instead.
It is important that a sample is fair. Everyone in the population should have the same chance of being selected to be in the sample.
There are four key groups we should think about, starting with the largest.
1) The TARGET POPULATION. This is everyone in the place or group being investigated. It could be restricted to a certain gender, age group, location or ethnicity.
2) The SAMPLE FRAME. These are the members of the population that we are able to contact. We would hope that this is most of the population, but there are some members of the population who are not in the sample frame. Sometimes, the sample frame is carefully chosen in a way we hope is representative of the wider population.
3) The SAMPLE. These are the people who we select to try to collect data about. They are a smaller group of people, chosen from the sample frame.
4) The RESPONDENTS. We took a sample from the sample frame to represent the target population. However, for various reasons, not everyone in the sample will respond.
At each step through this list (target population > sample frame > sample > respondents), we need to ask if the smaller group is a fair representation of all of the members of the population.
When the sample frame is not all of the population, the people who are left out might give different answers to our questions, compared to the sample frame. A common choice of sample frame is the telephone directory, but are people with no landline different to people with a 'listed' number?
How the sample is taken from the sample frame could affect whether the sample is a fair representation of the population.
Some possible sampling methods are discussed below.
A simple random sample chooses a random collection from the sample frame.
Everyone in the sample frame has the same chance of being selected for the sample.
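As a rough illustration, a simple random sample can be sketched in Python with the standard library's `random.sample`, which picks items without replacement so that every member of the frame is equally likely to be chosen. The function name `simple_random_sample` and the made-up frame of 100 people are hypothetical, for illustration only.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Choose n members of the sample frame, each equally likely to be picked."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.sample(frame, n)  # sampling without replacement: nobody is picked twice

# Hypothetical sample frame of 100 people
frame = [f"Person {i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Changing or removing the seed gives a different, equally fair sample of 10.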
Cluster sampling takes a sub-group from the sample frame based on how it is grouped. For instance, it could be just people in the phone book with surnames beginning with 'J'.
If the population was grouped randomly, then we can expect a cluster to be representative.
However, there might be an underlying pattern that makes it different.
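A minimal sketch of the phone-book example, assuming the frame is a plain list of surnames grouped by their first letter. Here one whole cluster (one initial letter) is chosen at random and everyone in it is taken; the surnames and the name `cluster_sample` are hypothetical.

```python
import random
from collections import defaultdict

def cluster_sample(frame, key, seed=None):
    """Group the frame by key, pick one whole group at random, and return all of its members."""
    clusters = defaultdict(list)
    for person in frame:
        clusters[key(person)].append(person)
    rng = random.Random(seed)
    chosen = rng.choice(sorted(clusters))  # sort the keys so a seeded choice is repeatable
    return chosen, clusters[chosen]

# Hypothetical phone-book surnames
surnames = ["Jones", "Smith", "Jackson", "Brown", "Johnson", "Taylor", "Singh", "Baker"]
letter, members = cluster_sample(surnames, key=lambda name: name[0], seed=3)
print(letter, members)
```

The code only guarantees that the cluster is chosen at random; whether people who share a surname initial are like everyone else is exactly the "underlying pattern" question raised above.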
A systematic sample is taken by using a preset pattern and taking people from the sample frame systematically. This might be a rule as simple as "every 50th person on the list".
If there is a pattern in the data, the systematic sample might over-represent (or under-represent) some subset of the population.
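The "every 50th person" rule might be sketched as follows. Starting at a random position within the first 50 is an assumption (a common choice, not something the text specifies), and `systematic_sample` is a hypothetical name. The second part shows the risk described above: if the list repeats in a pattern that lines up with the step, only one kind of person is ever picked.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Take every `step`-th person on the list, starting from a random position in the first `step`."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start between 0 and step - 1
    return frame[start::step]

frame = [f"Person {i}" for i in range(1, 1001)]
sample = systematic_sample(frame, 50, seed=2)
print(len(sample))  # 20 people from a list of 1000

# A list with a repeating pattern: positions alternate between two kinds of people.
# Because 50 is even, every selected position has the same parity, so only one kind appears.
patterned = ["Type A" if i % 2 == 0 else "Type B" for i in range(1000)]
print(set(systematic_sample(patterned, 50, seed=2)))
```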
A stratified sample divides the sample frame into the groups we are interested in, then takes a specified number of people from each group. The sample is randomly chosen from within each group.
This is typically the same number from each group.
This is useful when making a comparison between two groups. However, summary statistics about the whole population cannot easily be calculated.
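A sketch of a stratified sample with the same number of people from each group, assuming each person in the frame is tagged with their group. The regions, group sizes and the name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, group_of, per_group, seed=None):
    """Randomly choose the same number of people from within each group of the frame."""
    groups = defaultdict(list)
    for person in frame:
        groups[group_of(person)].append(person)
    rng = random.Random(seed)
    return {g: rng.sample(members, per_group) for g, members in sorted(groups.items())}

# Hypothetical frame of (name, region) pairs with unequal group sizes
frame = [(f"N{i}", "North") for i in range(300)] + [(f"S{i}", "South") for i in range(700)]
sample = stratified_sample(frame, group_of=lambda p: p[1], per_group=25, seed=4)
print({g: len(people) for g, people in sample.items()})  # {'North': 25, 'South': 25}
```

In this made-up frame North is 30% of the people but half of the sample, so simply pooling all 50 people would over-represent North. That is why whole-population summaries are harder to calculate from this kind of sample.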
Quota sampling involves giving quotas to be filled in certain subgroups (to match known proportions in the whole population).
This can cause problems when samples are no longer 'random'. Bias can be introduced when interviewers go looking for subgroups only in the places where they expect to find them.
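Setting the quotas themselves is simple arithmetic: each subgroup's share of the population multiplied by the total sample size. The age groups and shares below are hypothetical, and `quotas` is an illustrative name. Note that the code only sets the targets; how interviewers find people to fill them is where the bias described above can creep in.

```python
def quotas(proportions, sample_size):
    """Share a total sample size across subgroups in proportion to known population shares."""
    return {group: round(share * sample_size) for group, share in proportions.items()}

# Hypothetical population shares for three age groups
shares = {"18-34": 0.30, "35-59": 0.45, "60+": 0.25}
print(quotas(shares, 200))  # {'18-34': 60, '35-59': 90, '60+': 50}
```

With less tidy shares, rounding can make the quotas add up to one more or one less than the intended sample size.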
A convenience sample is taken in a place and time that suits the interviewer, without regard to whether the sample is fair.
Sometimes convenience sampling is justified as cluster sampling.
In a self-selected sample, respondents (hopefully from the sample frame) choose to be part of the sample. This completely ruins any chance that the sample is representative of the whole population.
For example, the survey shown here was on the nzherald.co.nz website beside the article in the 'Drink Driving' report. The results are for a self-selected sample, and are in no way representative of all New Zealanders.
Phone-in or text message responses to polls on TV programs are also unrepresentative, since the sample is self-selected.
It is unlikely that everyone in the sample will be a respondent. They might not respond because:
Some statistical reports identify a response rate. This is the percentage of the sample who respond. For instance, if 675 people out of a sample of 1000 people answer a survey question, the response rate is 675 ÷ 1000 = 67.5%.
Low response rates are a cause for concern.
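The response rate calculation can be written as a one-line function; `response_rate` is a hypothetical name used only for illustration.

```python
def response_rate(respondents, sample_size):
    """Percentage of the sample who actually responded."""
    return 100 * respondents / sample_size

print(response_rate(675, 1000))  # 67.5
```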
About 15 percent of surgeons have alcohol abuse or dependency problems, a rate that is somewhat higher than the rest of the population, according to a new survey.
The researchers, led by Dr. Michael Oreskovich at the University of Washington, sent out a survey to more than 25,000 surgeons.
The questions asked about work, lifestyle and mood, and several were used to screen for alcohol abuse or dependency.
Overall, 15 percent of surgeons showed signs of alcohol problems. Other studies have estimated that, among the general population, the number is about nine percent.
One of the limitations of the survey is that only about 7,200 surgeons out of the 25,000 queried responded to the survey.
Oreskovich said it’s possible that the percent of surgeons with alcoholism is underestimated in this study, “because I think the folks who are less likely to respond may have shame and guilt and fear associated with their alcohol abuse and dependence that they don’t want to report on the survey.”
There were approximately 145,000 surgeons in the United States of America in 2012.
The sample frame is probably very close to the target population, as there will be a register of surgeons (in the USA). The sampling method has not been given; a simple random sample would be most appropriate.
The response rate is very low (about 7,200 ÷ 25,000 ≈ 29%), and the researchers acknowledge that the true proportion of surgeons showing signs of alcohol problems could be higher still.
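The figures quoted in the report can be checked with a few divisions. Because the report gives approximate counts ("more than 25,000", "about 7,200", "approximately 145,000"), the results are approximate too; `percent` is a hypothetical helper.

```python
def percent(part, whole):
    """Express part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

surgeons_in_usa = 145_000  # approximate, 2012
surveyed = 25_000          # "more than 25,000" in the report
responded = 7_200          # "about 7,200" in the report

print(percent(responded, surveyed))         # response rate: 28.8
print(percent(surveyed, surgeons_in_usa))   # share of US surgeons surveyed: 17.2
print(percent(responded, surgeons_in_usa))  # share of US surgeons who responded: 5.0
```

So only about one US surgeon in twenty actually answered the survey.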
Along the way, if you notice something concerning about what you read in the report, make a note. A bit of background research might be needed.
In the report above, responses to questions were used to see which surgeons "showed signs of alcohol problems". This doesn't seem like a proper diagnosis of alcoholism (an addiction), which would probably require a formal diagnostic test (of which, it turns out, there are several). The report writer might have used "to be alcoholics" in the title since it is more shocking than "to show signs of alcohol problems", but these are not the same thing.
Answer the Sampling methods focus questions about the Caffeine report: