STAT-C 225 (SURVEY SAMPLING)
Lesson 2. Sampling Techniques
Course Packs on Survey Sampling
Course : STAT-C 225 Survey Sampling
Program : BS Statistics
Year Level : 2
Semester : Second
Lesson 2.1 Survey Sampling Techniques
Survey Sampling Methods
There are many ways to create a sample for your survey. The sampling methods divide into probability sampling, sometimes called probabilistic sampling, and nonprobability sampling, or nonprobabilistic sampling. These two sampling strategies break down even further into several sampling methods.
Probability vs. Nonprobability Samples
When selecting respondents for your sample, you have the choice between probability sampling and nonprobability sampling. Probability sampling is more reliable because it uses randomization to choose survey participants. The Office of Management and Budget’s Standards and Guidelines for Statistical Surveys for government agencies requires generally accepted probabilistic methods unless another method can be statistically justified.
Nonprobability sampling doesn’t use random selection, so participants don’t have a known or equal chance of being included in the sample. It is often easier to undertake. Nonprobability sampling could even be the preferred method, usually in qualitative research. In some studies, researchers purposely choose certain participants because they can offer unique insights into a topic. However, for quantitative analysis, probability sampling is generally the preferred method, and statistical researchers typically use nonprobability selection only for its practicality.
Probability Sampling Methods
In probability sampling, every member of the sample frame has a known, nonzero chance of being included in the sample, so you can calculate any member’s probability of being included in the survey. In the simplest designs, that chance is the same for everyone. These sampling techniques are more reliable and allow researchers to make more accurate inferences about a population.
You can use several probability sampling methods, and each has distinct advantages and disadvantages. Those methods include:
Random Sampling
Simple random sampling is the most basic form of probability sampling. It involves just one step, and each survey subject is selected independently from the other members of the population or sample frame. A standard method of random sampling is assigning every individual in the sample frame a unique number. Then, a random number generator determines who will be in the sample.
The benefit of random sampling is that each member of the population has an equal chance of being chosen for the survey. This characteristic makes a simple random sample highly representative of the target population.
However, with larger populations, it’s hard to include every individual in the random selection process. Instead, you would draw participants from a sample frame, such as a list of email addresses or phone numbers. Any sample frame will eliminate some members of the population since it’s impossible to contact everyone. It’s time-consuming to conduct random sampling with larger populations and larger sample sizes. A biased sample frame can also skew your results.
Example:
You want to select a simple random sample of 100 employees of Company X. You assign a number to every employee in the company database from 1 to 1000, and use a random number generator to select 100 numbers.
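The Company X example can be sketched in a few lines of Python. This is a minimal illustration, assuming employees are identified only by the numbers 1 to 1000; the function name `simple_random_sample` and the fixed seed are hypothetical choices made here for reproducibility and are not part of the lesson.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw employee numbers without replacement, each equally likely."""
    rng = random.Random(seed)  # fixing the seed makes the draw repeatable
    employee_numbers = range(1, population_size + 1)  # 1, 2, ..., population_size
    return rng.sample(employee_numbers, sample_size)

# Select 100 of the 1000 numbered employees.
selected = simple_random_sample(1000, 100, seed=225)
print(sorted(selected)[:10])  # the ten smallest selected employee numbers
```

Because the draw is made without replacement, no employee can appear in the sample twice.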
Systematic Sampling
Systematic sampling is a little easier than random sampling and is similar in reliability. In this method, you assign a number to everyone in the target population or sample frame. Instead of using a random number generator for every selection, you select candidates at regular intervals. For example, you could select every fifth number or every 20th number.
A systematic sample is highly representative. However, it’s not quite as random as using a random number generator. There’s also a chance that the list’s organization could compromise randomness. When sampling systematically, it’s essential that the list doesn’t have any hidden patterns. If you’re surveying people at a company, the list could divide employees by department and sequence them by rank, so a fixed interval might repeatedly land on the same rank. It’s a good idea to shuffle a list that is in alphabetical order or otherwise organized before sampling.
Example:
All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.
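The selection rule in this example can be sketched as follows. The sketch assumes the list has 1000 entries and a sampling interval of 10, as in the example; the function name `systematic_sample` and its parameters are hypothetical.

```python
import random

def systematic_sample(population_size, interval, start=None, seed=None):
    """Return list positions start, start + interval, ... up to population_size."""
    if start is None:
        # Pick a random starting point among the first `interval` positions.
        start = random.Random(seed).randint(1, interval)
    return list(range(start, population_size + 1, interval))

# Reproduce the example: 1000 employees, every 10th person, starting at number 6.
positions = systematic_sample(1000, 10, start=6)
print(positions[:4], len(positions))  # [6, 16, 26, 36] 100
```

Only the starting point is random; every later selection is fixed by the interval, which is why hidden patterns in the list matter.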
Stratified Sampling
Stratified sampling attempts to account for the demographics and traits of the larger population by recreating their proportions in the sample. For example, if you’re surveying college history majors, and you already know that 40% of history majors are female and 60% are male, you might want your sample to have the same proportions.
Before generating a sample, the researchers first decide what traits or dimensions are significant. They may want to account for gender, social class, age, religion, education level, or other characteristics. Then, they randomly sample within each chosen category. If the researchers know that 70% of their target population is not college educated, they’ll ensure 70% of their survey participants are also not college educated.
This sampling method can better replicate the demographics of the target population. It’s especially useful when one category is a small minority compared to the others. In simple random sampling, this demographic could be underrepresented or even nonexistent in the sample. Stratified sampling is complex and time consuming. It could be challenging to find participants in the target population that meet the other demographic criteria.
Example:
The company has 800 female employees and 200 male employees. You want to ensure that the sample reflects the gender balance of the company, so you sort the population into two strata based on gender. Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people.
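The allocation in this example can be sketched in Python. This is an illustration under the assumption of proportional allocation, with members of each stratum numbered from 1; the function names are hypothetical. Note that simple rounding, as used here, is not guaranteed to add up to the requested sample size for every set of strata.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's share."""
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

def stratified_sample(stratum_sizes, sample_size, seed=None):
    """Draw a simple random sample inside each stratum."""
    rng = random.Random(seed)
    allocation = proportional_allocation(stratum_sizes, sample_size)
    # Members of each stratum are identified by the numbers 1..size.
    return {name: rng.sample(range(1, stratum_sizes[name] + 1), k)
            for name, k in allocation.items()}

strata = {"female": 800, "male": 200}
print(proportional_allocation(strata, 100))  # {'female': 80, 'male': 20}
sample = stratified_sample(strata, 100, seed=225)
```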
Cluster Sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar characteristics to the whole population. Instead of sampling individuals from each subgroup, you randomly select entire subgroups.
If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also sample individuals from within each cluster using one of the techniques above. This is called multistage sampling.
This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as there could be substantial differences between clusters. It’s difficult to guarantee that the sampled clusters are really representative of the whole population.
Example:
The company has offices in 10 cities across the country (all with roughly the same number of employees in similar roles). You don’t have the capacity to travel to every office to collect your data, so you use random sampling to select 3 offices – these are your clusters.
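A single-stage version of this example might look like the sketch below. The office names, the employee labels, and the figure of 100 employees per office are hypothetical placeholders invented for illustration; only the "3 of 10 offices" selection comes from the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members

# Hypothetical frame: 10 offices with 100 employees each.
offices = {f"Office {i}": [f"O{i}-E{j}" for j in range(1, 101)]
           for i in range(1, 11)}
chosen_offices, employees = cluster_sample(offices, 3, seed=225)
print(chosen_offices, len(employees))
```

For multistage sampling, the list comprehension that gathers all members would be replaced by a further random draw inside each chosen office.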
Nonprobability Sampling Methods
Nonprobability sampling methods do not use any randomization to select survey participants. Therefore, population members do not have an equal chance of being included. Some members may have no chance of being in the sample. Others may have a much higher chance, and they will have disproportionate representation in the sample. Nonprobability sampling has limited applications for quantitative researchers who need quality samples.
Some nonprobability sampling techniques include:
Convenience Sampling
Convenience sampling includes participants based on their availability and accessibility. Essentially, it includes people who are easy to reach. If you’re an academic researcher, it’s easy to sample people from your own institution. It’s even more convenient to survey your own students or classmates.
One way to conduct convenience sampling is to wait in a crowded location and approach people to participate in a survey. When enough people agree to meet your sample size, you stop the survey.
Convenience sampling is beneficial because it lets you collect data quickly. It’s usually inexpensive to sample these participants because you can take advantage of a readily available sample. You also don’t need to follow strict rules to ensure randomization. The downside is that the survey won’t be representative of the target population.
Example:
You are researching opinions about student support services in your university, so after each of your classes, you ask your fellow students to complete a survey on the topic. This is a convenient way to gather data, but as you only surveyed students taking the same classes as you at the same level, the sample is not representative of all the students at your university.
Voluntary Response Sampling
Similar to a convenience sample, a voluntary response sample is mainly based on ease of access. Instead of the researcher choosing participants and directly contacting them, people volunteer themselves (e.g. by responding to a public online survey).
Voluntary response samples are always at least somewhat biased, as some people will inherently be more likely to volunteer than others.
Example:
You send out the survey to all students at your university and a lot of students decide to complete it. This can certainly give you some insight into the topic, but the people who responded are more likely to be those who have strong opinions about the student support services, so you can’t be sure that their opinions are representative of all students.
Snowball Sampling
Snowball sampling relies on the first survey participants to refer you to the next ones, and so on. Once you’ve found enough people to meet your required sample size, you stop the survey.
One advantage of snowball sampling is it allows surveyors to find people who are generally hard to reach. If people in the target population don’t want to be found, like those involved in illegal activity, their contact information is not readily available. The snowball method creates a kind of word-of-mouth marketing that makes it easier to find participants.
The downside is that it’s impossible to know how representative your sample is. These surveys generally create a very homogeneous sample.
Example:
You are researching experiences of homelessness in your city. Since there is no list of all homeless people in the city, probability sampling isn’t possible. You meet one person who agrees to participate in the research, and she puts you in contact with other homeless people that she knows in the area.
Purposive Sampling
This type of sampling, also known as judgement sampling, involves the researcher using their expertise to select a sample that is most useful to the purposes of the research.
It is often used in qualitative research, where the researcher wants to gain detailed knowledge about a specific phenomenon rather than make statistical inferences, or where the population is very small and specific. An effective purposive sample must have clear criteria and rationale for inclusion.
Example:
You want to know more about the opinions and experiences of disabled students at your university, so you purposefully select a number of students with different support needs in order to gather a varied range of data on their experiences with student services.
Quota Sampling
Quota sampling is similar to stratified sampling. The difference is that this method doesn’t randomly select participants. As with stratified sampling, the researchers first define categories they want to represent in their sample and choose appropriate proportions for each group. These could be equal quotas, like 100 men and 100 women, or they could seek to replicate a target population’s demographics.
Instead of randomly selected participants, the surveyors will use some form of convenience sampling. When they’ve hit the right quotas for each category, they stop the survey.
One benefit of quota sampling is that it can represent the target population more accurately than convenience or snowball sampling. This survey method can cover many different characteristics and handle a lot of complexity. While it is the most accurate of the nonprobability techniques, it’s still not as representative as probability methods.
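The stopping rule of quota sampling can be sketched as a loop that accepts people in the order they happen to arrive until each quota is met. Everything here is hypothetical illustration: the function name, the simulated stream of passers-by, and the equal quotas of 100 men and 100 women mentioned above.

```python
import random
from itertools import count

def quota_sample(arrivals, quotas):
    """Accept people in arrival order until every group's quota is filled."""
    remaining = dict(quotas)
    accepted = []
    for person_id, group in arrivals:
        if remaining.get(group, 0) > 0:  # this group still needs participants
            accepted.append((person_id, group))
            remaining[group] -= 1
        if all(left == 0 for left in remaining.values()):
            break  # all quotas met, so the survey stops
    return accepted

rng = random.Random(225)
# Simulated stream of passers-by; a real survey would use whoever is available.
passers_by = ((i, rng.choice(["men", "women"])) for i in count(1))
quota_result = quota_sample(passers_by, {"men": 100, "women": 100})
print(len(quota_result))  # 200
```

Nothing in the loop is random by design: who gets in depends on who shows up first, which is what separates this method from stratified sampling.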
How to Determine What Type of Survey Sampling Is Best for Your Application
Each type of survey sampling has a place in research. Nonprobability sampling, while less accurate, can be a practical way to begin the research process. Probabilistic sampling can provide more accurate results, which can be used to generalize about the target population. Here’s when to use each type of sampling:
Random sampling:
As a highly representative method, random sampling has the widest applications. It’s particularly useful for large populations. It’s not ideal for studying an uncommon characteristic in a large group, as you may not randomly select enough participants with the trait.
Systematic sampling:
If your population is relatively homogeneous, you can use the faster and more convenient systematic sampling method.
Stratified sampling:
When you need to represent many individual characteristics from the target population, choose stratified sampling.
Convenience sampling:
Since convenience sampling is fast, it’s suitable for preliminary research. You can get a general idea of a more precise study’s results without the upfront time or cost of a probabilistic sample.
Snowball sampling:
When your target population is homogeneous and generally hard to reach, snowball sampling can help. It’s helpful for surveying illicit drug users or those involved in illegal activity.
Quota sampling:
When a more rigid and costlier study would use stratified sampling, preliminary research can instead use quota sampling. Like stratified sampling, it’s used when the researchers can identify specific traits within the target population that they want to study. Market researchers often use this method.
Lesson 2.2 Sample Size and Sample Size Calculator
What is a “Sample Size”?
A sample is a part of the population chosen for a survey or experiment, and the sample size is the number of units in it. For example, you might survey dog owners’ brand preferences. You won’t want to survey all the millions of dog owners in the country (it would be too expensive or time consuming), so you take a sample. Its size may be several thousand owners. The sample stands in for all dog owners’ brand preferences. If you choose your sample wisely, it will be a good representation.
When Error Can Creep In
When you survey only a sample of the population, uncertainty creeps into your statistics. Because you observe only part of the population, you can never be 100% sure that your statistics are a complete and accurate representation of it. This uncertainty is called sampling error, and it is usually expressed as a margin of error or confidence interval at a stated confidence level. For example, you might report your results at a 90% confidence level. That means that if you were to repeat your survey over and over, about 90% of the confidence intervals you computed would contain the true population value.
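One way to see what a 90% confidence level means is to simulate repeated surveys and count how often the interval around each sample estimate contains the true population value. The sketch below is an illustration only: it assumes a yes/no question with a true proportion of 0.5, samples of 400, the usual normal-approximation interval, and z = 1.645 for 90% confidence. All names and settings are hypothetical.

```python
import math
import random

def interval_coverage(true_p=0.5, n=400, z=1.645, repetitions=2000, seed=1):
    """Share of repeated surveys whose interval contains the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # One simulated survey: n independent yes/no answers.
        yes_answers = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes_answers / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / repetitions

coverage = interval_coverage()
print(coverage)  # expected to land near 0.90
```

The individual estimates differ from survey to survey; it is the intervals that capture the true value about 90% of the time.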
How to Find a Sample Size in Statistics
A sample is a subset of the total population in statistics. You can use the data from a sample to make inferences about a population as a whole. For example, the standard deviation of a sample can be used to approximate the standard deviation of a population. Choosing a sample size can be one of the most challenging tasks in statistics and depends upon many factors, including the size of your original population.
General Tips
Step 1: Conduct a census if you have a small population. A “small” population will depend on your budget and time constraints. For example, it may take a day to take a census of a student body at a small private university of 1,000 students but you may not have the time to survey 10,000 students at a large state university.
Step 2: Use a sample size from a similar study. Chances are, your type of study has already been undertaken by someone else. You’ll need access to academic databases to search for a study (usually your school or college will have access). A pitfall: you’ll be relying on someone else correctly calculating the sample size. Any errors they have made in their calculations will transfer over to your study.
Step 3: Use a table to find your sample size. If you have a fairly generic study, then there is probably a table for it. For example, if you have a clinical study, you may be able to use a table published in Machin et al.’s Sample Size Tables for Clinical Studies, Third Edition.
Step 4: Use a sample size calculator. Various calculators are available online, some simple, some more complex and specialized.
Step 5: Use a formula. There are many different formulas you can use, depending on what you know (or don’t know) about your population. If you know some parameters about your population (like a known standard deviation), you can use the techniques below. If you don’t know much about your population, use Slovin’s formula.
DIFFERENT SAMPLE SIZE FORMULAS
Cochran’s Sample Size Formula
The Cochran formula allows you to calculate an ideal sample size given a desired level of precision, desired confidence level, and the estimated proportion of the attribute present in the population.
Cochran’s formula is considered especially appropriate in situations with large populations. A sample of any given size provides more information about a smaller population than a larger one, so there’s a ‘correction’ through which the number given by Cochran’s formula can be reduced if the whole population is relatively small.
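The Cochran formula is:
n0 = (z^2 p q) / e^2
Where:
n0 is the recommended sample size,
e is the desired level of precision (i.e. the margin of error),
p is the (estimated) proportion of the population which has the attribute in question,
q is 1 - p,
z is the z-score corresponding to the desired confidence level (found in the normal tables).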
Example:
Suppose we are doing a study on the inhabitants of a large town, and want to find out what proportion of households serve breakfast in the mornings. We don’t have much information on the subject to begin with, so we’re going to assume that half of the families serve breakfast: this gives us maximum variability. So p = 0.5. Now let’s say we want 99% confidence, and precision of plus or minus 5 percent or better. A 99% confidence level gives us a z value of 2.58, per the normal tables, so we get:
n0 = ((2.58)^2 (0.5) (0.5)) / (0.05)^2 = 665.64, which rounds up to 666
So a random sample of 666 households in our target population should be enough to give us the confidence levels we need.
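The calculation above can be checked with a short function. This is a minimal sketch; the function name `cochran_sample_size` is hypothetical, and the inputs are the ones used in the breakfast example (z = 2.58, p = 0.5, e = 0.05).

```python
import math

def cochran_sample_size(z, p, e):
    """Cochran's recommendation: z^2 * p * q / e^2, with q = 1 - p."""
    q = 1 - p
    return (z ** 2) * p * q / (e ** 2)

n0 = cochran_sample_size(z=2.58, p=0.5, e=0.05)
print(round(n0, 2), math.ceil(n0))  # 665.64 666
```

Rounding up rather than to the nearest whole number keeps the sample at or above the size the formula asks for.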
Modification for the Cochran Formula for Sample Size Calculation In Smaller Populations:
The Modified Cochran formula is:
n = n0 / (1 + ((n0 - 1) / N))
Here n0 is Cochran’s sample size recommendation, N is the population size, and n is the new, adjusted sample size. In our earlier example, if there were just 1000 households in the target population, we would calculate:
666 / (1 + ( 665 / 1000 )) = 400
So for this smaller population, all we need are 400 households in our sample; a substantially smaller sample size.
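The adjustment for a small population can be sketched as below, assuming the standard finite population correction n = n0 / (1 + (n0 - 1) / N). The function name is hypothetical; the inputs are the 666 households recommended earlier and a population of 1000 households.

```python
def adjust_for_small_population(n0, population_size):
    """Shrink Cochran's n0 when the population itself is small."""
    return n0 / (1 + (n0 - 1) / population_size)

adjusted = adjust_for_small_population(666, 1000)
print(round(adjusted))  # 400
```

As the population size grows, the term (n0 - 1) / N approaches zero and the adjusted size approaches n0, which is why the correction only matters for relatively small populations.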
Yamane’s Formula [A Simplified Formula For Proportions]
Yamane's Formula provides a simplified formula to calculate sample sizes:
n = N / [1 + N(e)^2]
Where n is the sample size, N is the population size, and e is the level of precision.
Example:
Suppose our evaluation of farmers’ adoption of the new practice only affected 2,000 farmers. Assuming a 95% confidence level, P = 0.5, and a precision level of e = 0.05, we get:
n = N/[1+N(e)^2] = 2,000/[1+2,000(0.05)^2] = 2,000/6 ≈ 333 farmers
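The farmers example can be reproduced with a one-line function. This is a sketch under the values used above (N = 2,000 and e = 0.05); the function name `yamane_sample_size` is hypothetical.

```python
def yamane_sample_size(population_size, e):
    """Yamane's simplified formula: N / (1 + N * e^2)."""
    return population_size / (1 + population_size * e ** 2)

n_farmers = yamane_sample_size(2000, 0.05)
print(round(n_farmers, 2))  # 333.33
```

The lesson reports this as 333 farmers; rounding up to 334 would be the more cautious choice if the precision target must be met.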
Slovin's Formula
If you take a population sample, you must use a formula to figure out what sample size you need to take. Sometimes you know something about a population, which can help you determine a sample size. For example, it’s well known that IQ scores follow a normal distribution pattern. But what about if you know nothing about your population at all? That’s when you can use Slovin’s formula to figure out what sample size you need to take, which is written as
n = N / (1 + N e^2)
where
n = Sample size (the number of units to sample),
N = Total population and
e = Error tolerance (level).
Example question:
Use Slovin’s formula to find out what sample of a population of 1,000 people you need to take for a survey on their soda preferences.
Step 1: Figure out what you want your error tolerance to be. For example, you might want a confidence level of 95 percent (which this formula treats as an error tolerance of e = 0.05), or you might need better accuracy at the 98 percent confidence level (error tolerance of e = 0.02).
Step 2. Plug your data into the formula. In this example, we’ll use a 95 percent confidence level with a population size of 1,000.
n = N / (1 + N e^2) = 1,000 / (1 + 1,000 × 0.05^2) = 1,000 / 3.5 = 285.714286
Step 3: Round your answer to a whole number (because you can’t sample a fraction of a person or thing!)
285.714286 rounds up to 286
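The three steps can be combined into a small function that also shows how the required sample changes with the error tolerance. This is an illustrative sketch for the population of 1,000 used above; the function name and the extra tolerance of 0.10 are hypothetical additions.

```python
import math

def slovin_sample_size(population_size, e):
    """Slovin's formula, rounded up to a whole number of respondents."""
    return math.ceil(population_size / (1 + population_size * e ** 2))

for e in (0.10, 0.05, 0.02):
    print(e, slovin_sample_size(1000, e))
# 0.1 91
# 0.05 286
# 0.02 715
```

A tighter error tolerance raises the required sample quickly: moving from 0.05 to 0.02 more than doubles it for this population.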