Random and non-random sampling

Gathering information about an entire population often costs too much or is virtually impossible. Instead, we use a sample of the population. A sample should have the same characteristics as the population it represents. Most statisticians use various methods of random sampling in an attempt to achieve this goal. This section describes a few of the most common methods. In each form of random sampling, each member of the population initially has an equal chance of being selected for the sample. Each method has pros and cons.

The easiest method to describe is called a simple random sample. Any group of n individuals is equally likely to be chosen if the simple random sampling technique is used. In other words, each sample of the same size has an equal chance of being selected. For example, suppose Lisa wants to form a four-person study group (herself and three other people) from her pre-calculus class, which has 31 members not including Lisa. To choose a simple random sample of size three from the other members of her class, Lisa could put all 31 names in a hat, shake the hat, close her eyes, and pick out three names. A more technological way is for Lisa to first list the last names of the members of her class together with a two-digit number, as in the following table.

Lisa can use a table of random numbers (found in many statistics books and mathematical handbooks), a calculator, or a computer to generate random numbers. She then chooses the students whose two-digit numbers match the random numbers that were generated. The first three students whose numbers match join her study group.
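Lisa's procedure can be sketched in a few lines of Python. This is only an illustration: the roster below uses placeholder names (the class list itself is not reproduced here), and the function name `choose_study_group` and the `seed` parameter are assumptions made for the example.

```python
import random

# Hypothetical roster: placeholder names paired with two-digit
# ID numbers 01 through 31, standing in for Lisa's 31 classmates.
roster = {f"{i:02d}": f"Student {i:02d}" for i in range(1, 32)}


def choose_study_group(roster, size=3, seed=None):
    """Return `size` distinct names chosen by simple random sampling."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so every group of
    # `size` distinct ID numbers is equally likely to be chosen.
    ids = rng.sample(sorted(roster), k=size)
    return [roster[i] for i in ids]


group = choose_study_group(roster, size=3, seed=2024)
print(group)
```

Fixing the seed only makes the example repeatable. For an actual drawing, Lisa would leave it unset.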

Besides simple random sampling, there are other forms of sampling that involve a chance process for getting the sample. Other well-known random sampling methods are the stratified sample, the cluster sample, and the systematic sample.

To choose a stratified sample, divide the population into groups called strata and then take a predetermined proportion of observations from each stratum. For example, you may want to make sure your sample of students has the same proportion of majors (e.g., 20% English majors, 10% math majors, etc.) as your population. You could stratify (group) your college population by major and then use simple random sampling to choose the desired proportion of college students from each stratum (each major). The key to stratified sampling is splitting your data into groups before you use simple random sampling, so that the groups end up represented in your sample in the desired proportions.
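A minimal sketch of stratified sampling follows, assuming a made-up college of 1,000 students in which 20% are English majors and 10% are math majors. The function name `stratified_sample`, its parameters, and the population itself are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Take the same fraction from every stratum by simple random sampling."""
    rng = random.Random(seed)
    # Step 1: split the population into strata before sampling.
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    # Step 2: draw a simple random sample inside each stratum.
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: (major, student number) pairs.
students = (
    [("English", i) for i in range(200)]
    + [("math", i) for i in range(100)]
    + [("other", i) for i in range(700)]
)

sample = stratified_sample(students, stratum_of=lambda s: s[0],
                           fraction=0.10, seed=1)
counts = Counter(major for major, _ in sample)
print(counts)
```

Because the same fraction is taken from each stratum, the sample keeps the population's proportions: 20 of the 100 sampled students are English majors and 10 are math majors.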

To choose a cluster sample, divide the population into clusters (groups) and then use simple random sampling to select some of the clusters. If you select a cluster, all of the members of that cluster are included in your sample. For example, you may choose to cluster your population by student major. The student majors are the clusters. You then number each major so that you can use simple random sampling to choose which majors will be in your sample. The first major you choose may be English. This means all of the English majors in the population are part of your sample. This process is repeated until the desired number of clusters is reached. In the same way, you could divide your college faculty by department, in which case the departments are the clusters.
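The cluster procedure can be sketched as follows. The majors, their sizes, and the function name `cluster_sample` are invented for illustration and are not taken from any real data.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters; every member of a picked cluster is kept."""
    rng = random.Random(seed)
    # Simple random sampling is applied to the cluster names, not to people.
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample


# Hypothetical population grouped by major (cluster sizes are made up).
majors = {
    "English": [f"English student {i}" for i in range(1, 41)],
    "math": [f"math student {i}" for i in range(1, 21)],
    "biology": [f"biology student {i}" for i in range(1, 61)],
    "history": [f"history student {i}" for i in range(1, 31)],
    "physics": [f"physics student {i}" for i in range(1, 16)],
}

chosen, sample = cluster_sample(majors, n_clusters=2, seed=7)
print(chosen, len(sample))
```

Note the contrast with stratified sampling: here the random draw decides which groups appear at all, and the sample size depends on how large the selected clusters happen to be.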

To choose a systematic sample, randomly select a starting point and take every nth piece of data from a listing of the population. For example, suppose you have to do a phone survey. Your phone book contains 20,000 residence listings, and you must choose 400 names for the sample. Number the population 1–20,000 and then use simple random sampling to choose one number from 1–20,000. This number represents one name in the phone book, and this name is your starting point. Then divide the total number of names in the population by the desired sample size. In this case, 20,000 divided by 400 equals 50. This means you include the starting point in your sample, then move 50 names down the list and include that person in your sample, and so on. If you reach the end of the list, continue counting from the beginning. Continue until you reach your 400-person sample.
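The phone book example can be sketched as below. One detail is an assumption of this sketch: because the starting point may fall anywhere in 1–20,000, the count wraps around to the top of the list when it runs past the end. The function name `systematic_sample` is likewise hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Return listing numbers (1-based) chosen by systematic sampling."""
    rng = random.Random(seed)
    step = population_size // sample_size      # 20,000 // 400 = 50
    start = rng.randint(1, population_size)    # random starting listing
    # Take the start, then every `step`-th listing after it.
    # Assumption: wrap around to listing 1 after passing the last listing.
    return [
        (start - 1 + i * step) % population_size + 1
        for i in range(sample_size)
    ]


listings = systematic_sample(population_size=20_000, sample_size=400, seed=3)
print(listings[:5])
```

With a step of 50 and 400 picks, the selections cover the whole list once, so no listing is chosen twice.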

A type of sampling that is non-random is convenience sampling. Convenience sampling involves using results that are readily available. For example, a computer software store conducts a marketing study by interviewing potential customers who happen to be in the store browsing through the available software. The results of convenience sampling are likely to be highly biased (favor certain outcomes).
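To see why a convenience sample tends to be biased, consider a small simulation. Everything in it is synthetic: the spending figures are made up, and the assumption that people who spend more on software are more likely to be found browsing in the store is ours, chosen only to illustrate the effect.

```python
import random
from statistics import mean

rng = random.Random(0)

# Synthetic population: monthly software spending in dollars
# (invented numbers, averaging roughly $20).
population = [rng.expovariate(1 / 20) for _ in range(10_000)]

# Convenience sample: interview whoever is in the store.
# Assumption: the chance of being in the store is proportional to spending.
# Simplification: choices() draws with replacement, so a person may repeat.
convenience = rng.choices(population, weights=population, k=200)

# Simple random sample of the same size, for comparison.
simple_random = rng.sample(population, k=200)

population_mean = mean(population)
convenience_mean = mean(convenience)
random_mean = mean(simple_random)

print(f"population mean:         {population_mean:.2f}")
print(f"convenience sample mean: {convenience_mean:.2f}")
print(f"simple random mean:      {random_mean:.2f}")
```

Under these assumptions the convenience sample overstates average spending, because heavy spenders are overrepresented among the people who happen to be in the store.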

Sampling should be done very carefully, because collecting data carelessly can have devastating results. In statistics, a sampling bias is created when a sample is collected from a population and some members of the population are less likely to be chosen than others (remember, each member of the population should have an equal chance of being chosen). When a sampling bias occurs, incorrect conclusions can be drawn about the population being studied.

Samples may be biased due to sampling errors and non-sampling errors. Sampling errors are caused by the actual process of sampling; for example, the sample may not be large enough. They can be broken down into two components: random sampling errors and non-random sampling errors. Random sampling errors occur when a random sample is not representative of the population because of chance factors alone. Researchers cannot eliminate random sampling errors. Non-random sampling errors arise from improper sampling and have nothing to do with chance. For example, a non-random sampling error will arise if you use convenience sampling as your sampling method. This is something researchers can control, so non-random sampling errors should be avoided.

Finally, non-sampling errors are caused by factors not related to the act of selecting a sample. Examples include missing data, data entered incorrectly into a spreadsheet, and a defective counting device. Researchers should try to limit these types of errors as well.

In reality, a sample will never be exactly representative of the population, so there will always be some sampling error. As a rule, the larger the sample, the smaller the sampling error.
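The rule that larger samples have smaller sampling error can be checked with a short simulation. The population below is synthetic (normally distributed values with mean 50 and standard deviation 10, chosen arbitrarily), and the helper `average_error` is a name invented for this sketch.

```python
import random
from statistics import mean

rng = random.Random(42)

# Synthetic population of 20,000 measurements (made-up values).
population = [rng.gauss(50, 10) for _ in range(20_000)]
true_mean = mean(population)


def average_error(n, trials=200):
    """Average gap between a sample mean and the population mean."""
    errors = [
        abs(mean(rng.sample(population, n)) - true_mean)
        for _ in range(trials)
    ]
    return mean(errors)


# Repeat simple random sampling at three sample sizes.
results = {n: average_error(n) for n in (10, 100, 1000)}
for n, err in results.items():
    print(f"sample size {n:>4}: average error {err:.3f}")
```

The error here comes from chance alone, since every sample is a simple random sample. It shrinks as the sample size grows but never reaches zero.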


References

  1. https://courses.lumenlearning.com/introstats1/chapter/sampling-and-data/
