2. Sampling

Task 1: School survey
      As a class, we will design a survey.  Then, we will choose a random method to deliver it to the selected sample at school.  Each person will be responsible for finding 2-3 people.  You may not give your survey by walking into classes (if you need to talk to a student, find them during their lunch period or in the hallway).
      • [1 pt] Contribute one question to the possible questions for the survey.
      • [1 pt] Find your assigned people and have them take the survey.
      Task 2: Infographic survey
          In teams, you need to design and conduct a survey.  You will later use the results of this survey to design an infographic.
          • [1 pt] Choose a target population that you want to study outside of the school, such as Byron coffee drinkers, Rochester Starbucks customers, BES 4th graders, etc.
          • [4 pts] Choose one of the methods below to take a random sample of the population and properly execute it.  Your sample must include at least 30 people.
          • [2 pts] Create a survey with AT LEAST 7 questions.  Questions should relate to a common theme and not bias the responder one way.  You want enough questions to fill your poster with room to eliminate a question later, but be careful not to make it too long.  Each question must be true/false or multiple choice (where the person must select only one option).  Don't forget to offer a "none of these" option when applicable.  Get your questions teacher-reviewed before going out.
          • [3 pts] Write a sentence that summarizes the results of each question on your survey.  Each sentence should include the population you studied, the question you asked, and the resulting sample proportion (including the correct symbol).  For example: We estimate that 58% of Byron voters supported the primary school referendum (p-hat = 0.58).
          Mastery Quiz Prep
              The first video looks at populations vs. samples and parameters vs. statistics.  It introduces the idea that not all random samples are exact estimates of the population, yet this is not a bad thing.

              Review of concepts discussed in class

              SRS


              Do all people have an equal, random chance of being selected?  Answer yes or no:
              • 1. You walk through a large crowd and interview the first person who makes eye contact with you.
              • 2. You write 100 names on slips of paper, put them in a hat, and draw out 4 of them.
              • 3. You number people 1-24 and then roll 4 dice and add them together to see which person to choose.
              • 4. You number people 1-35 and use a random number generator to create a number between 1 and 35.
              • 5. You break the population into 3 strata: in K-12 school, in tech post-secondary school, and not in school.  You randomly select 40 people from each category.

              Stratified Random Sampling


              A school athletic director wants to know how student athletes feel about the programs the school offers and the coaching it provides.  He wants to be sure to hear balanced perspectives from all of the fall teams.  Imagine there are 67 boys football players, 26 girls volleyball players, 22 cross country runners, 19 girls soccer players, and 17 boys soccer players.
              • 6. What type of sampling ensures that each group is appropriately represented?
              • 7. To produce a sample of roughly 20 students, how many should be sampled from each group?
              • 8. What is the stats name for a group like this?
              • 9. How could you decide specifically who to sample in each group?

              Systematic Sampling


              A school of about 600 students wants to systematically sample 12 students as they enter the building.
              • 10. Every __th person gets sampled (fill in the blank)
              • 11. When sampling like this, you need to generate a random number.  Why?
              • 12. Let’s say you generated the number 34.  List which 12 people you will need to ask to be in your sample as they come through the door.

              Cluster Sampling


              Know your sampling technique: list the method used in each scenario.
              • 13. The radio station reads off a number for you to call in and give your opinion.
              • 14. The lottery machine generates 6 numbers between 1 and 30 to determine the winner.
              • 15. A national firm breaks the country into groups by race and gender and chooses a few people from each group.
              • 16. Every 10th student is selected when they exit the mall.
              • 17. Names are drawn from a hat.
              • 18. You give every person a number and generate a random integer on your calculator.
              • 19. You break the county into groups by their mail carrier and make everyone in each of 3 randomly selected groups your sample.

              The last video walks through sampling-gone-wrong.  A sample that is highly biased cannot be trusted and is not useful.

              Error and Bias


              You want to study the favorite juices of Olmsted County residents.  To do this, you go through the phone book and randomly select 70 people to call.  Of those you call, 32 pick up and answer your questions.
              • 20. What is the population?  What symbol represents the parameter (population proportion)?
              • 21. Is there any undercoverage?  If yes, who?
              • 22. Is there any non-response?  If yes, what is the non-response rate?
              • 23. What is the sample?  What symbol represents the statistic (sample proportion)?
              Bias is everywhere.  Find it below.
              • 24. Manny wants to study the buying patterns of Walmart shoppers.  To do so, he sets up a station from 1-4pm in the front of a Walmart location and interviews every 30th person entering the store to ask them what they intend to buy.
              • 25. A non-profit organization wants to get a picture of charity donation habits of the state.  They use random digit dialing to computer-generate random phone numbers in the state’s area codes (this is a method used by many pollsters to avoid the unlisted number/cell phone problem).  43% of the 300 people called answered and responded to the question.  The question asked people how much they gave to charity during the last tax year and how many different organizations they donated at least $40 to during that time.
              • 26. Fox News created an online poll to sample what proportion of Americans support ongoing military intervention in Afghanistan.  10,546 people answered the survey before it closed.
              • 27. Many fast food restaurants want feedback from their customers, so they offer a free cookie/Whopper/etc. on the back of their receipts if you call in and take a short phone survey.

              Free Response Prep
                  Describe at least four distinct types of error/bias.  Use both statistics vocabulary and "grandma-friendly" language for each type.

                  Use the graphic in the last video above and imagine walking through the survey design and delivery process.  You choose a population, choose a method to sample with, try to reach those chosen, ask a question, and receive an answer.  Each step has challenges that could lead to bias.


                  An SRS is the gold standard of sampling.  Yet, we show over and over that sample statistics are not equal to the population parameters.  Why aren't statisticians bothered by this?

                  Review the second video on the SRS.  Consider the difference between a "biased" and "unbiased" estimate of the population -- what is the difference?  Why does the difference between error from bias and error from randomness matter?  What can you do after taking an SRS?


                  Imagine that a research team called 2000 students to conduct a survey.  However, they were concerned about bias due to the high non-response rate.  To address this, they called another 2000 students to raise the size of the sample above its current value.  Decide if this will help the problem of non-response bias and explain why or why not.

                  Consider the challenges with the study the way it ran the first time.  How will the second run be different?  If it was improved, would that matter?  Why or why not?


                  Practice solutions
                      1. No -- some people don't make eye contact very often, and you may not have picked a truly random place to enter the crowd.
                      2. Yes (assuming all the pieces of paper are the same size, of course)
                      3. No -- person 1, 2, and 3 will never be chosen (1 + 1 + 1 + 1 gives a lowest number of 4) and some of the middle values are much more likely than the end values.
                      4. Yes
                      5. No, because there are not equal numbers of people in each category (stratum), so people in a less common stratum would have a higher chance of being selected.
                      6. Stratified random sample
                      7. 151 total players
                        Football: (67/151) * 20 = 8.87 ≈ 9
                        Volleyball: (26/151) * 20 = 3.44 ≈ 3
                        Cross Country: (22/151) * 20 = 2.91 ≈ 3
                        G Soccer: (19/151) * 20 = 2.52 ≈ 3
                        B Soccer: (17/151) * 20 = 2.25 ≈ 2
                        Note that because of rounding with such a small sample, volleyball and boys soccer get a little bit under-represented and the others are a bit over-represented.
                      8. strata!
                      9. Do an SRS of each team: number from 1 to ___ and randomly generate numbers
                      10. 600/12 = 50.  Every 50th person.
                      11. It's your random starting point -- with systematic samples, you can't just start at person #1 because that would be biased towards early people. Use a random number generator to generate your first number, start with that person, and then go every 50th person from there.
                      12. Since there are ~600 students and you want a sample of 12, do 600/12 = 50, so skip to every 50th person
                        Select person #34, 84, 134, 184, 234, 284, 334, 384, 434, 484, 534, and 584 as they come through the door.
                      13. Voluntary
                      14. SRS -- it will randomly generate a number
                      15. Stratified random sample -- because you sample a few people from EVERY group
                      16. Systematic sample
                      17. SRS
                      18. SRS
                      19. Cluster -- because you first pick the groups, then ask everyone in those groups
                      20. The people of Olmsted County, population proportion: p
                      21. Yes -- anyone not in the phone book
                      22. Yes -- 38 people did NOT answer, so 38/70 = 54% non-response
                      23. The 32 people who answered the phone (NOT all of the people that were called!), sample proportion: p-hat
                      24. People may not buy what they intended to
                        People may lie about what they want to buy
                        You may get a biased group of people -- only those who are not working / have unusual hours so that they could shop 1-4pm
                        Many people may refuse to participate in the face to face study
                      25. Some people (like me!) have out-of-state area codes but live here
                        Charity giving is seen as good by society, so some people may lie about their giving
                        People might forget about how much they gave.  They may tend to think they gave more than they did.
                        Many people never responded to the survey -- 57% of the people called
                      26. The survey was voluntary and only announced to people who watch Fox News.  There is a good chance that the group of people who both knew about the survey and took the time to go online and vote have political views that don't represent the entire country.
                      27. It is a voluntary survey and the most likely people to take it are between 16-25 (people who buy their own fast food but can't afford / would rather not pay for that extra cookie).  Pro tip: if you want to just get the free cookie and answer the fewest number of questions possible, just press "5" for every question (out of 5) so it doesn't ask follow-up questions.  Kids that want free cookies don't care much about giving quality feedback to the Subway corporate office.
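The arithmetic behind solutions 3, 7, and 12 can be checked with a short Python sketch. The function names (`dice_sum_counts`, `proportional_allocation`, `systematic_picks`) are made up for this illustration. Rounding each group separately is only one simple way to allocate a stratified sample; with other team sizes the rounded counts might not add up exactly to the target.

```python
from collections import Counter
from itertools import product
import math


def dice_sum_counts(num_dice=4, sides=6):
    """Count how many of the equally likely rolls give each total."""
    rolls = product(range(1, sides + 1), repeat=num_dice)
    return Counter(sum(roll) for roll in rolls)


def proportional_allocation(group_sizes, sample_size):
    """Split sample_size across groups in proportion to their size."""
    total = sum(group_sizes.values())
    return {
        # floor(x + 0.5) rounds to the nearest whole person
        name: math.floor(size / total * sample_size + 0.5)
        for name, size in group_sizes.items()
    }


def systematic_picks(population_size, sample_size, start):
    """Positions sampled by taking every k-th person after a starting point."""
    step = population_size // sample_size  # 600 // 12 = 50
    return [start + step * i for i in range(sample_size)]


# Solution 3: totals of 4 dice are not equally likely, and 1-3 never occur.
counts = dice_sum_counts()
print(min(counts), counts[4], counts[14])

# Solution 7: the fall teams from the athletic director example.
teams = {
    "Football": 67,
    "Volleyball": 26,
    "Cross Country": 22,
    "G Soccer": 19,
    "B Soccer": 17,
}
allocation = proportional_allocation(teams, 20)
print(allocation)

# Solution 12: random start of 34, then every 50th person.
picks = systematic_picks(600, 12, start=34)
print(picks)
```

Comparing `counts[4]` with `counts[14]` shows how lopsided the dice method is: only one roll out of 1,296 adds up to 4, while many different rolls add up to 14.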

                      Vocabulary
                          population- who or what is being studied (the entire group)
                          parameter- when a summary number (the proportion / mean / standard deviation) is describing the entire population, it is called a parameter
                          sample- a small portion from a population
                          statistic- when a summary number (the proportion / mean / standard deviation) is describing only the sample data, it is called a statistic
                          census- a survey given to all of a population 
                          simple random sample (SRS) - a random sample in which every individual has an equal chance of being picked
                          strata- the “groups” a population is divided into based on shared characteristics (ex: grade, gender, or hair color).  Each individual must belong to exactly one stratum (a single group).
                          stratified sample- a random sampling method where you first break the population into strata (groups), then take an SRS from each group.  You should sample from each group in proportion to its size in the population.
                          systematic sample- a random sampling method where you first estimate the population size, decide how many people you want to sample, pick a random starting point, and then ask every nth person to be part of the sample.
                          cluster sampling- a random sampling method where you choose an entire group (or a few) to all be in your sample.  If the groups are not randomly created ahead of time (ex: classroom), selecting a cluster will yield a very non-representative sample.

                          non-response - when individuals do not respond to a survey
                          undercoverage- when some members of the population have no chance of being selected for the survey, for example people who are not listed in the phone book when the sample is drawn from it
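The three sampling methods defined above can be sketched in Python with the standard `random` module. This is a minimal illustration, not a required procedure: the function names, the student labels, the grade groups, and the classroom groups are all hypothetical, and the seed value is arbitrary (it only makes the run repeatable).

```python
import random


def simple_random_sample(population, n, rng):
    """SRS: every individual has the same chance of being picked."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: take an SRS of the requested size from EVERY stratum."""
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }


def cluster_sample(clusters, n_clusters, rng):
    """Cluster: randomly pick whole groups, then keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


rng = random.Random(2014)  # arbitrary seed, for repeatable output

# Hypothetical population of 35 numbered students.
students = [f"student {i}" for i in range(1, 36)]
srs = simple_random_sample(students, 4, rng)

# Hypothetical strata: 10 ninth graders and 20 tenth graders, sampled 1 and 2.
strata = {
    "grade 9": [f"9th grader {i}" for i in range(1, 11)],
    "grade 10": [f"10th grader {i}" for i in range(1, 21)],
}
stratified = stratified_sample(strata, {"grade 9": 1, "grade 10": 2}, rng)

# Hypothetical clusters: 4 classrooms of 3 students each, 2 rooms chosen.
clusters = {
    f"room {r}": [f"room {r} student {i}" for i in range(1, 4)]
    for r in "ABCD"
}
cluster = cluster_sample(clusters, 2, rng)

print(srs)
print(stratified)
print(cluster)
```

Notice the difference in what gets randomized: the stratified sample includes someone from every group, while the cluster sample includes everyone from only a few groups.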

                          Notes

                          Ċ
                          Andy Pethan,
                          Sep 7, 2014, 6:54 PM