Parameter -- a value (e.g. a mean or proportion) that describes the ENTIRE population
Statistic -- a value computed from a sample (a subset) of a population; it is used to estimate a parameter
Proportion -- the fraction (often written as a percentage) of a group that has some characteristic; used for categorical, non-numerical data (e.g. polls that ask for a preference, or a yes/no question)
Mean -- the average
x̅ (“x-bar”) -- mean symbol for a statistic
μ (“mu”) -- mean symbol for a parameter
p̂ (“p-hat”) -- proportion symbol for a statistic
p -- proportion symbol for a parameter
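To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The club, its heights, and its yes/no answers are entirely hypothetical. μ and p are computed from every member (parameters), while x̅ and p̂ are computed from a random sample of 4 members (statistics); the seed only makes the run repeatable.

```python
import random
from statistics import mean

# Hypothetical population: height (inches) and yes/no answer of all 10 club members.
heights = [64, 70, 68, 72, 66, 65, 71, 69, 67, 68]
answers = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "yes", "no"]

# Parameters: computed from the ENTIRE population.
mu = mean(heights)                        # 68
p = answers.count("yes") / len(answers)   # 0.6

# Statistics: computed from a random sample of 4 members.
rng = random.Random(1)
chosen = rng.sample(range(len(heights)), 4)   # 4 distinct member positions
x_bar = mean(heights[i] for i in chosen)
p_hat = sum(answers[i] == "yes" for i in chosen) / len(chosen)

print(f"mu = {mu}, x-bar = {x_bar}")
print(f"p = {p}, p-hat = {p_hat}")
```

The parameters never change, but x̅ and p̂ will usually differ from μ and p, and they change if a different sample is drawn.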
Random Number Generator (RNG) -- a computer or calculator tool that generates random numbers
Sample -- a subset of a population that is actually studied
Census -- collect data from every member of a population (usually impractical because it is expensive and time-consuming)
RNGs DON'T GENERATE PEOPLE! ONLY NUMBERS!
When describing the process of picking people randomly from a group, DON'T say that the RNG will pick n people! Instead, number the people from 1 to N and say: "The RNG will randomly generate n numbers from 1 to N, with no duplicate numbers. The people whose numbers are generated will be selected to do ..."
Simple Random Sample (SRS) -- randomly selects n objects from a population; it is the ONLY sampling method in which every possible group of n objects has an equal chance of being selected
Stratified Random Sample -- divide the population into homogeneous subgroups called strata (everything in a stratum shares a key characteristic) and take an SRS from each stratum
Cluster Sampling -- the population is ALREADY divided into clusters (e.g. houses on the same block, classes); take an SRS of the clusters, then take a CENSUS of each chosen cluster (survey everything inside it)
Systematic Sampling -- every kth object is selected, starting from a randomly chosen position among the first k (e.g. every 10th person in a grocery line is surveyed)
Convenience Sample -- surveys whichever objects are most readily available; typically not a good sampling strategy because it is prone to bias
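As a rough illustration of the stratified, cluster, and systematic definitions above, here is a minimal Python sketch. It assumes each population is a plain Python list (or a dict of lists for strata and clusters); the function names, the school and grocery-line data, and the seed are all hypothetical. An SRS on its own corresponds to Python's `random.sample`.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Take an SRS of the same fraction of units from every stratum.

    strata: dict mapping a stratum name to the list of units in it (hypothetical format).
    """
    sample = []
    for units in strata.values():
        k = round(len(units) * fraction)      # how many to take from this stratum
        sample.extend(rng.sample(units, k))   # SRS within the stratum
    return sample

def cluster_sample(clusters, num_clusters, rng):
    """Take an SRS of whole clusters, then a census of every chosen cluster."""
    chosen = rng.sample(list(clusters), num_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(units, k, rng):
    """Pick a random starting position among the first k units, then take every kth unit."""
    start = rng.randrange(k)
    return units[start::k]

rng = random.Random(42)

# Hypothetical school: two grade-level strata and four classes (clusters).
grades = {"grade 11": [f"junior{i}" for i in range(1, 41)],
          "grade 12": [f"senior{i}" for i in range(1, 21)]}
classes = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"],
           "C": ["c1", "c2", "c3", "c4"], "D": ["d1", "d2"]}
grocery_line = [f"shopper{i}" for i in range(1, 51)]

strat = stratified_sample(grades, 0.10, rng)      # 4 juniors + 2 seniors
clust = cluster_sample(classes, 2, rng)           # every student in 2 random classes
syst = systematic_sample(grocery_line, 10, rng)   # 5 shoppers, 10 apart
print(strat, clust, syst, sep="\n")
```

Note the key difference in the code: stratified sampling draws *some* units from *every* group, while cluster sampling draws *all* units from *some* groups.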
Nonresponse bias -- some people selected for the sample can't be reached or refuse to participate, and their answers may differ from those of people who do respond
Response bias -- something about the survey (e.g. question wording) leads the subjects toward a particular answer
Voluntary Response Bias -- people with strong opinions are the ones who choose to respond (e.g. radio shows where listeners call in; people who don't care usually won't respond)
To take an SRS: first, assign every potential subject in the population a unique number from 1 to N (where N is the population size). Suppose we want to select n units. Using an RNG, generate n numbers from 1 to N, with no repeats (if a number comes up again, ignore it and generate another). The units whose numbers were generated are selected to participate in the survey.
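The numbering procedure for an SRS can be sketched in Python as follows. The function name and the list of students are hypothetical; `random.sample` is Python's standard way to draw distinct values, so it plays the role of the RNG that generates numbers with no repeats.

```python
import random

def simple_random_sample(units, n, rng=random):
    """Select an SRS of n units by numbering them and drawing numbers, not people."""
    # Step 1: give every unit a unique number from 1 to N.
    numbered = {number: unit for number, unit in enumerate(units, start=1)}
    N = len(numbered)
    # Step 2: the RNG generates n numbers from 1 to N with no repeats.
    drawn = rng.sample(range(1, N + 1), n)
    # Step 3: the units whose numbers were generated are selected.
    return [numbered[number] for number in sorted(drawn)]

# Hypothetical population of 8 students; select 3 of them.
students = ["Ava", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"]
selected = simple_random_sample(students, 3, random.Random(7))
print(selected)
```

Drawing without replacement handles the "no repeating numbers" rule automatically; with an RNG that can repeat, you would skip any duplicate and draw again.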
Why use a stratified sample? Suppose the National Basketball Association (NBA) wants to survey players on whether they approve of a new rule, and that this rule benefits teams in large cities. If the NBA took a simple random sample of all the players, the sample could, by chance, contain mostly (or even only) large-city players, overstating how many players support the new rule. To avoid this, the NBA decides to take a stratified sample. It splits the players into two strata (a stratum, plural strata, is a homogeneous group): one stratum for large-city players and another for small-city players. It then takes an SRS of the same percentage from each stratum (say, 10%). Now each group appears in the sample in proportion to its size, so the sample is much more representative: no subgroup is oversampled or underrepresented.
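To see this argument numerically, the sketch below simulates it under made-up numbers: a hypothetical league of 300 large-city and 150 small-city players (not real NBA figures), sampled at 10%. It repeats each sampling method 1,000 times and records how many large-city players land in each sample.

```python
import random

# Hypothetical league (made-up sizes): 300 large-city and 150 small-city players.
large_city = [f"L{i}" for i in range(300)]
small_city = [f"S{i}" for i in range(150)]
everyone = large_city + small_city

def srs_large_city_count(rng):
    """SRS of 10% of all 450 players; count the large-city players drawn."""
    sample = rng.sample(everyone, 45)
    return sum(1 for player in sample if player.startswith("L"))

def stratified_large_city_count(rng):
    """10% SRS from each stratum (30 + 15); count the large-city players drawn."""
    sample = rng.sample(large_city, 30) + rng.sample(small_city, 15)
    return sum(1 for player in sample if player.startswith("L"))

rng = random.Random(2024)
srs_counts = [srs_large_city_count(rng) for _ in range(1000)]
strat_counts = [stratified_large_city_count(rng) for _ in range(1000)]

print("SRS: large-city players per sample ranged from", min(srs_counts), "to", max(srs_counts))
print("Stratified: large-city players per sample =", set(strat_counts))  # {30}
```

Every stratified sample contains exactly 30 large-city players, while the SRS count varies from sample to sample; that variation is what can tilt an SRS toward large-city players and overstate support for the rule.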