3. Experiments and Observational Studies

Learning objectives (and summaries)

Design experiments that isolate one explanatory variable and can be used to determine causation.
  • Understand how an experiment is used to determine causation.
    • Isolate one variable to test at a time.  Each option/level of the variable is a treatment.
    • Randomly assign subjects to treatments so that the treatment groups are as similar as possible (apart from the treatment).
    • Compare the results of two or more groups to see what differences were caused by the different treatments.
  • Understand how an observational study differs from an experiment in its design and results.
    • An observational study also looks at the relationship between two variables.  However, unlike in an experiment, the researcher only observes and does not assign treatments.  This means that the results are susceptible to lurking variables and CANNOT be used to determine causation.
  • Understand the vocabulary of experiments.
    • An experiment tests explanatory variables called factors.  Each treatment is a specific combination of factor levels, and each treatment is applied to one group of individuals/subjects (the experimental units).
  • Recognize the placebo effect and control for it using a blind or double-blind design.
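    • In a blind design, the subjects do not know which treatment they are receiving.  In a double-blind design, neither the subjects nor the people working with them (such as the physician) know; only an outside party holds the key.  That way, neither the patient nor the physician can bias the results based on treatment group.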
  • Distinguish between a completely randomized design and a block design, including matched pairs.  Decide when a block design should be used.
    • Completely randomized: each treatment group is formed by taking an SRS (without replacement) from the pool of volunteers, so every volunteer is equally likely to end up in any group.
    • A block design first sorts all individuals into blocks based on similarity.  In the designs in this unit, each block contains as many individuals as there are treatments.  Then the individuals in each block are randomly assigned, one to each treatment group.
    • A matched pairs design is similar to a block design, but there is a special relationship between the individuals in each pair (same person at different times, same person's different arms, twins, etc.).  The results are also analyzed differently: you look at the difference within each pair.
  • Draw a diagram to explain how subjects are assigned to treatments, and then compared, in an experiment.
    • Start with all volunteers, [show blocking step when it applies], randomly assign (branch) to each treatment group, and then converge to compare.
    • Test (15pts): 13 questions (8 MC, 3 short response, 2 written/drawn) and one of these free response questions about the following scenario (2pts):
      • In order to test people's willingness to obey an authority figure against their conscience, the psychologist Stanley Milgram used scientists in white lab coats to instruct subjects to give an electric shock to a "student" on the other side of a wall whenever the student answered a question incorrectly.  The subjects were told that the shocks would help the student learn and motivate correct answers.  In reality, the person on the other side was not actually being shocked; he was an actor helping with the experiment.  The result of the study was that most people would apply the highest level of shock, despite their reservations, because the scientist in the lab coat told them it was necessary.
      • Additional information on the scenario: YouTube, Wikipedia
        • Question 1: Discuss the benefits of this research.  Justify why the experiment had to be conducted the way it was.
        • Question 2: Discuss the ethical problems with this kind of research.
        • Question 3: Explain what an IRB is and why they exist today for human-subject research.
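
The completely randomized design described in the objectives above can be sketched in Python: shuffle the whole pool of volunteers, then deal them out into treatment groups, which is equivalent to taking successive SRSs without replacement. The function name, the volunteer labels, and the use of a seed are illustrative assumptions, not part of the course material.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Randomly assign subjects to treatment groups of (nearly) equal size.

    Shuffling the whole list and dealing it out is equivalent to taking
    successive simple random samples (without replacement) for each group.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not changed
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal subjects round-robin so group sizes differ by at most 1
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical example: 10 volunteers, 2 treatments
volunteers = [f"V{n}" for n in range(1, 11)]
groups = completely_randomized(volunteers, ["new pill", "placebo"], seed=1)
for treatment, group in groups.items():
    print(treatment, group)
```

A block design would run the same random assignment separately inside each block instead of once over the whole pool.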


            Not all hypotheses can be tested with a proper experiment.  Explain why each of the following would be very difficult / impossible to test.

            1. Jason wants to determine how gender is related to political party preference in his town.
            2. The military wants to study the long term effects of a new biological weapon on humans.
            3. A psychology group wants to study the effect of making poor people millionaires.
            4. The math department wants to study the differences in academic performance of students learning a course primarily through lecture and students learning primarily with a reverse (flipped) classroom.
            5. A student wants to understand how a year of complete social isolation affects people.

            For the problems below, design an experiment to test each of the statements.  Answer a-j for each:

            • a)  What/who are the experimental units/subjects? 
            • b) What is the explanatory variable? 
            • c) What should the treatments be (the options for the explanatory variable)? Is a placebo / control group needed? 
            • d) What is the response variable? Is it quantitative or categorical? 
            • e) Without proper experimental design, what lurking variables could be confounded with the explanatory variable? 
            • f) Explain how you would do a randomized comparative experiment with this situation. Is blinding necessary? 
            • g) Could you use a matched pairs design with this study? If so, what would the matched pairs be? How would you decide when each treatment is received / who receives each treatment from the pair? 
            • h) Besides what you used for matched pairs, is there a logical variable to block by? How many experimental units would fall in each block? 
            • i) Can you ethically ask volunteers to participate in this experiment? Is there any information you would need to hide from the volunteers? 
            • j) If you have human subjects, is it likely that your subjects could follow the directions of the study and carry it out properly to its completion?
            6. Premium gasoline (89 octane) gives cars better gas mileage than regular gasoline (87 octane).

            7. A server who suggests the most popular appetizer to customers at Restaurant A will make more appetizer sales than a server who asks “can I start you out with an appetizer?” 

            8. A new hip replacement procedure, when compared to the existing common procedure and to no procedure at all, will lead to more natural walking (as rated 1 to 10 by physical therapists) 2 months after surgery. 

            9. Taking a recently developed pill each day will reduce the number of headaches experienced over the next 3 months. 

            10. A person who smokes is more likely to get lung cancer than someone who does not smoke. (Hint: if your experiment is nearly impossible to do well with humans, use rats). 

            11. A 2” diameter tree (measured at the trunk) has a greater chance of surviving a transplant (being replanted in a new location) if you water every day for 20 minutes than if you water every other day for 40 minutes.

            Block the subjects in the experiment below by their math grades.

            12.    An education study wants to compare student performance in a problem-set-only course and a course infused with multiple large projects. The students in the table are available for the study:

             Name       Last math grade
             Bill       67
             Mary       74
             Suzy       83
             James      92
             Josh       89
             T.J.       93
             Missy      78
             Kirsten    98

            13-16: see the attached PDF.

            Each problem below describes an experiment or an observational study.  Answer the following for each:

            • a) Is this an observational study or an experiment? How can you tell? 
            • b) If it is an experiment, explain how randomization, replication, and a control are used. 
            • c) What population do the individuals/experimental units/subjects come from? 
            • d) The explanatory variable is the type of treatment. What are the treatments? How many are there? 
            • e) What is the response variable? Is it quantitative or categorical? 
            • f) In this study, imagine that your test had statistical significance. What conclusions CAN you make? List a few things you CANNOT conclude (based on whether you have an observational study or experiment, what population the subjects come from, and if randomization was used properly).

            17. A company wants to know how much cold temperatures affect the elasticity of their rubber bands. 100 random bands from a box of 200 are placed in a freezer while the remainder are kept at room temperature. The amount of stretch before breakage is measured on each rubber band, and the average stretch distances are compared.

            18. A dog trainer wanted to improve puppy school to be more effective. She randomly assigned 4 of her 9 PetSmart classes to learn one method of “loose leash” walking while the remaining 5 classes learned a different method.

            19. A set of volunteers is posed the following thought experiment: if there were 5 people on a railroad track about to get hit by a train, but at the last second you could pull a lever and divert the train to another track where 1 person was standing, would you do it? They were then asked a second question: if there was a single track with 5 people about to get hit by a train, and you could push a man off the bridge overhead at the last second to bring the train to a halt, would you do that? The results of the two responses were later compared.

            20. A pharmaceutical company wants to compare two varieties of a drug they have developed to alleviate allergies. The study director first described the two varieties and their differences to the pool of volunteers. Then each person decided which drug better fit their symptoms in order to find the most significant effect. Finally, the effectiveness of the two drugs (based on number of allergic reactions of the volunteers) was compared to decide which one to send to market.

            21. Milgram’s famous obedience experiment had a volunteer designated as the “teacher” and a lab helper as the “learner” (though the volunteer was told that the “learner” was just another volunteer in the study). The teacher would ask the learner a question, and if the answer was wrong, the experimenter ordered the “teacher” to give an increasingly painful electric shock (or so they thought: the “learner” was behind a wall and was only faking the pain of the shock, but the “teacher” did not know this). The experimenter recorded how far people would go on the shock scale (from 15V to 450V). The results were shocking (pun slightly intended): 65% of the subjects administered the highest level shock of 450V despite cries of pain from behind the wall.

            22. Birds eat fertilizer out of farmers’ fields, costing the farmers money and sometimes killing birds. A chemical company is designing a new coating for its fertilizer that will be less attractive to birds. They developed 4 varieties – red, purple, green with thin white stripes, and orange with thin purple stripes. A set of recently captured birds will first be held overnight with no food. Then, the birds will be randomly selected, one at a time, to go in a cage with a specific amount of food in one of the fertilizer colors. The observer will track how many granules of food are consumed by each bird. Finally, the researchers will average the number of granules of each color eaten and compare them to determine which color is eaten the least.

            Practice solutions
            1. Gender cannot be randomly assigned -- the best you can do is an observational study of gender, but not a true experiment.
            2. It is not ethical to have people, even volunteers, be used to test a biological weapon.
            3. If low-income people were randomly assigned to be millionaires, that would cost a lot of money.  If you just looked for lottery winners, that would not be random assignment according to the setup of the study.  If you wanted to do a study of low-income people who buy lottery tickets (as opposed to all low-income people), then you might be able to consider the winners as randomly assigned and carry out an experiment.
            4. Because of scheduling conflicts, it is not truly random who ends up in each period.  Also, since students would know which method they are using, they may be biased and sabotage the experiment.  Data can still be gathered on the different methods, but the conditions are not ideal for an experiment.
            5. It is not ethical to socially isolate someone for a year to satisfy your curiosity.
            6. Gasoline
            • a) Cars
            • b) Type of gasoline
            • c) (1) Regular 87 octane gas, (2) premium 89 octane gas [the regular gas would be considered the control -- no other control group needed]
            • d) Gas mileage (quantitative)
            • e) Type of car, type of driver, road conditions, length of commute, others
            • f) Randomly assign half of the volunteers to use regular and the other half to use premium gas.  Don't tell the drivers which type of gas is being used in their vehicle so it doesn't affect how they drive (blinding).
            • g) Yes -- the same car could be tested first with one type of gas, then the other, and compared.  You would randomly decide which type of gas is used first.
            • h) You could form blocks by type of car.  There would be 2 similar cars per block (each block would have one using regular and one using premium gas).
            • i) No ethical problems -- the only thing to hide is what type of gas they get.
            • j) The only rules would be to not fill up with more gas or siphon gas out of the car until the measurement is complete.  This could be an issue if someone needs to get to work and is low on gas before the study is done.
            7. Appetizers
            • a) Customers at the restaurant
            • b) What the server says
            • c) (1) suggesting a specific appetizer, (2) saying "can I start you out with an appetizer?" [the second option would be considered the control -- no other control group needed]
            • d) Depends on how you interpret the question.  If you say "will make more customers buy appetizers", then it is categorical (the customer either buys or doesn't).  If you say "will increase the average number of appetizers per person", then it is quantitative. 
            • e) Which server, type of customer, time of day, family size, others
            • f) The manager could randomly assign tables to receive different treatments.  Since the customer wouldn't need to know they are part of this study, the customer is blinded.  It would be nice to blind the servers too, but since they need to be the ones who ask the question, that is impossible.
            • g) Probably not -- you can only ask the same customer once if they want an appetizer.  It would be hard to know when the same customer comes back to be tested with the other treatment.
            • h) You could block by time of day, family size, or server (or by all of these).  Either way, there should be only 2 subjects (tables) per block -- one for each treatment.
            • i) No ethical problems -- there is nothing wrong with running this experiment and not telling the customer.
            • j) Yes -- the subjects just have to order an appetizer or not.  The harder part would be to make sure the servers ask the right questions.
            8. Hip replacement
            • a) People who are receiving a hip replacement
            • b) What type of hip procedure you get
            • c) (1) new hip replacement procedure, (2) existing common procedure, (3) no procedure [the baseline control group]
            • d) How naturally the subjects walk (as rated 1 to 10 by physical therapists) 2 months after surgery -- quantitative
            • e) Exercise / therapy (the no procedure group would likely do no extra work if they knew they were a control group)
            • f) Study director would randomly assign patients to the treatments.  The patient should be blinded -- this means that they would need a fake surgery for the control group.  The doctor should also be blinded until they need to actually perform any procedure.
            • g) If the patient had issues with both hips, you could compare two of the treatments in the same person.  This might not be such a great idea though and might confound two treatments when the therapist judges the walking.
            • h) If you blocked by prior walking ability, the treatment groups would be more likely to start out with similar walking ability.
            • i) Yes -- you would need to be up front about what the different possible treatments are; as long as patients understand that they might get a sham surgery with nothing done, it can be ethical.
            • j) Patients might not do their recommended exercise each day as part of the treatment process.
            9. Headaches
            • a) People who experience headaches and volunteer for the study
            • b) Whether you take the pill or not
            • c) (1) taking the new pill, (2) taking a placebo [this group is added as a control for comparison to the new pill]
            • d) Number of headaches experienced over the next 3 months -- quantitative
            • e) If you are taking the real pill, and know it, you might assume that headaches will go away, causing them to go away.  If anything can be influenced by the placebo effect ("it's all in your head!"), it would be literal pain in your head.
            • f) Half of the volunteers in the study would get randomly assigned to the real pill and the other half get the placebo.  For the placebo to be effective, the patient needs to be blinded.  The person talking to the patient should also be blinded (so, double blind) to reduce that as a confounding factor.
            • g) You could do the study at different times for every patient -- 3 months on one treatment, a break, and 3 months on the other, with the order decided at random.  If the patient can sense the difference between the pill and the placebo, this could be a problem.
            • h) You could block subjects by the number of headaches they normally experience so that both high and low headache folks are in each treatment group.
            • i) Yes -- it is a very standard clinical trial comparing a new drug with a placebo, as long as you are up front about what people could get.
            • j) Maybe -- some subjects might forget to take the pill for a day or two, messing up the results.
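
One way to carry out the double-blind design from (f), sketched in Python under our own assumptions: an outside party randomly assigns half of the volunteers to the pill and half to the placebo, and the patients (and the person handing out bottles) only ever see a meaningless bottle code. The function name, the "B###" code format, and the patient names are all hypothetical.

```python
import random

def double_blind_allocation(patients, seed=None):
    """Randomly assign half the patients to the pill and half to a placebo.

    Returns (labels, key):
      labels: patient -> bottle code (all the patient and the person handing
              out bottles ever see)
      key:    bottle code -> treatment (held by an outside party until the end)
    """
    rng = random.Random(seed)
    n = len(patients)
    if n > 1000:
        raise ValueError("this sketch only has 1000 distinct bottle codes")
    treatments = ["pill"] * (n // 2) + ["placebo"] * (n - n // 2)
    rng.shuffle(treatments)  # random assignment: which patient gets which
    # Unique, meaningless codes so the label reveals nothing about the contents
    codes = [f"B{k:03d}" for k in rng.sample(range(1000), n)]
    labels = dict(zip(patients, codes))
    key = dict(zip(codes, treatments))
    return labels, key

# Hypothetical volunteers
labels, key = double_blind_allocation(["Ana", "Ben", "Cal", "Dee", "Eli", "Fay"], seed=7)
print(labels)  # what the patients and their contact person see
```

The key stays sealed until the 3 months are over, so nobody who interacts with the patients can let their expectations leak into the results.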
            10. Smoking
            • a) You can't use people.  If you did, you would need to randomly assign half of a group of non-smokers to start smoking for the rest of their lives without quitting in order to gather data for the study.  People would never agree to this, and it would not be ethical given what we already know about smoking.  However, you could use rats and force half of them to start smoking (or breathing in something equivalent).  Though this is not ideal, it could at least establish causation in a living creature.  Many experiments that are too costly or unethical to do on humans are done first on animals.
            • b) Whether the subject smokes
            • c) (1) smokes, (2) does not smoke [this is the control/comparison group]
            • d) If you get lung cancer or not during your life -- categorical
            • e) If not randomly assigned to treatments, people who smoke might be more likely to live a different lifestyle with other risky behaviors (drinking, driving a motorcycle, etc).
            • f) Half of the rats would be assigned to "smoke" and the other half would not and their lung cancer risk would be compared.
            • g) Identical rat twins?  Probably not.
            • h) Perhaps size of rat or general health of rat.
            • i) It should be acceptable, unless you object to experimentation on animals.
            • j) n/a
            11. Trees
            • Let's be honest -- you didn't want to do #11 either.
            12. Blocking
            Since the study is comparing 2 treatments, students should be placed in blocks of 2.  The 2 students with the highest grades form the first block, the next 2 highest form the second block, and so forth.  So:
            [Kirsten, T.J.], [James, Josh], [Suzy, Missy], [Mary, Bill]
            Within each block, one student would then be randomly assigned to each course.  Blocking makes it less likely that one group starts with all the stronger students; it reduces the chance that prior math ability acts as a lurking variable (confounding factor).
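
The blocking in problem 12 can be checked with a short Python sketch: rank the students by last math grade, cut the ranking into blocks of 2 (one per treatment), then randomly give each member of a block a different course. The grades come from the table above; the function names and course labels are our own illustrative choices.

```python
import random

# Students and last math grades from the table in problem 12
grades = {"Bill": 67, "Mary": 74, "Suzy": 83, "James": 92,
          "Josh": 89, "T.J.": 93, "Missy": 78, "Kirsten": 98}
treatments = ["problem sets only", "large projects"]

def make_blocks(scores, block_size):
    """Sort from highest to lowest score and cut into consecutive blocks."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [ranked[i:i + block_size] for i in range(0, len(ranked), block_size)]

def assign_within_blocks(blocks, treatments, seed=None):
    """Randomly give each member of a block a different treatment.

    Assumes each block has exactly one student per treatment.
    """
    rng = random.Random(seed)
    assignment = {}
    for block in blocks:
        order = rng.sample(treatments, len(treatments))  # random permutation
        for student, treatment in zip(block, order):
            assignment[student] = treatment
    return assignment

blocks = make_blocks(grades, block_size=len(treatments))
print(blocks)
# [['Kirsten', 'T.J.'], ['James', 'Josh'], ['Suzy', 'Missy'], ['Mary', 'Bill']]
print(assign_within_blocks(blocks, treatments, seed=2013))
```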

            13-16: see attached PDF file

            17. Rubber bands
            • a) Experiment -- there are 2 treatments and the experimental units (rubber bands) are randomly assigned to them
            • b) Control: one set of bands is kept at room temperature, and everything else is identical between the treatments
                  Randomization: half of the bands are randomly assigned to the freezer group and the other half to the control
                  Replication: there are 200 bands (100 per treatment), plenty to average out the natural variation between bands
            • c) The company's rubber bands
            • d) (1) freezer, (2) room temperature, each for the same period of time -- there are 2 treatments
            • e) The amount the rubber band stretches before breaking -- quantitative
            • f) Since this is a valid experiment, you can claim causation (you can claim that the difference in the response was caused by the treatment).  Since the bands all came from one company (in fact, one box), you are limited to claiming things only about this company's rubber bands and not all rubber bands.  Thus if the freezer bands stretched significantly less before breaking than the room-temperature bands, you could claim that cold temperatures cause this company's rubber bands to stretch less before breaking.
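
The random assignment in problem 17 can be sketched the way one might do it with a random number table: label the bands 1 to 200 and take an SRS of 100 for the freezer, leaving the rest as the room-temperature control. The function name is hypothetical; the same idea (sampling 4 of 9 class labels) would cover the puppy classes in problem 18.

```python
import random

def split_bands(n_bands=200, n_freezer=100, seed=None):
    """Label bands 1..n_bands and pick an SRS of n_freezer for the freezer.

    The remaining bands form the room-temperature (control) group.
    """
    rng = random.Random(seed)
    bands = range(1, n_bands + 1)
    freezer = set(rng.sample(bands, n_freezer))  # SRS without replacement
    room = [b for b in bands if b not in freezer]
    return sorted(freezer), room

freezer, room = split_bands(seed=17)
print("first 10 freezer bands:", freezer[:10])
```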
            18. Puppy school
            • a) Experiment -- there are 2 treatments and the experimental units (classes) are randomly assigned to them.
            • b) Control: the problem does not specify which method is the current one, but that method would be the control.  Also note that the 2 treatments are identical except for the training method.
                  Randomization: the classes are randomly assigned to the two methods (4 to one, 5 to the other)
                  Replication: there are 9 classes, and each class has a few people/dogs (let's say 5-6)
            • c) People who take their puppy to PetSmart for training
            • d) The different methods of loose-leash walking -- there are 2
            • e) It does not specify, so you will need to create one.  You could look at the distance a dog walks with a loose leash or the amount of time the dog walks with a loose leash -- both of these would be quantitative.
            • f) This is a valid experiment on pet owners who go to PetSmart and use this trainer.  If one method was better than the other, you could claim that, with this trainer, one method works better than the other to train dogs to loose-leash walk.  This claim is useful to the trainer who ran the study: she now knows which method to use.  If she wanted to make a broader claim, other trainers at other locations would need to replicate the experiment and get similar results.
            19. The moral train
            • a) Experiment, though not a well-run one.  There are 2 treatments (questions) given to each subject in a matched pairs design.  There is no randomization of the order of the treatments.
            • b) Control: if you are not familiar with the experiment, either could be considered a control to the other.  Given the intent, the "pulling the lever" treatment is probably the control.
                  Randomization: the order in which subjects answer the questions should be randomized, but right now it is not
                  Replication: the size of the set of volunteers would determine this
            • c) It does not specify where the volunteers come from.  Many psychology studies use college students as subjects, so the volunteers may not represent all generations equally.
            • d) There are 2 treatments -- the 2 questions: "would you pull the lever?" and "would you push the man off the bridge?"
            • e) Whether the subject would do it (would say yes) -- this is categorical (yes/no)
            • f) You could conclude that more people would say yes to one question than the other.  Because it is an experiment, you could claim causation (the difference between the lever question and the pushing question caused people to change their answer), but only if the order of the questions had been randomized.  As run, the order itself (always answering the lever question first) could also explain the difference.
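
A sketch of how the missing randomization in problem 19 could be added: every volunteer still answers both questions (matched pairs), but the order is randomized, here balanced so that half of the volunteers get each order. The function name and the "lever"/"push" labels are hypothetical; flipping a coin separately for each volunteer would also be a valid, though unbalanced, way to randomize.

```python
import random

QUESTIONS = ("lever", "push")  # the two treatments in problem 19

def randomize_order(subjects, seed=None):
    """Give each subject both questions, in a randomly chosen order.

    Half of the subjects (or as close as possible) answer the lever question
    first, so any order effect is spread evenly over the two questions.
    """
    rng = random.Random(seed)
    n = len(subjects)
    orders = [QUESTIONS] * (n // 2) + [QUESTIONS[::-1]] * (n - n // 2)
    rng.shuffle(orders)
    return dict(zip(subjects, orders))

# Hypothetical volunteers S1..S8
plan = randomize_order([f"S{i}" for i in range(1, 9)], seed=19)
for subject, order in plan.items():
    print(subject, "answers", order[0], "first, then", order[1])
```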
            20. Pharm. company:
            • a) Observational study -- there are two treatments, but patients chose which one they wanted, so the best you can do is to observe how each group does.  Other variables (such as the type or severity of each person's symptoms) are now confounded with the drug, making it impossible to determine causation.
            • b) Control: either drug acts as a control / comparison to the other -- no clear control
                  Randomization: none -- this is the problem!
                  Replication: only within the single setting, with an unknown number of volunteers.
            • c) Unknown.
            • d) The 2 varieties of the drug
            • e) Number of allergic reactions in a given time period (quantitative)
            • f) Since this is only an observational study, you cannot claim that either drug caused the difference.  Assuming the volunteers came from all residents of the Rochester area, the study would only show that, when patients choose the best-fitting medicine, the group using one of the medications had fewer allergic reactions than the other.
            21-22 will be discussed in class.

                AP does not use the terms confounding / lurking; instead, DESCRIBE the issue.  ASSIGNING treatments is the difference between an experiment and an observational study.
                Replication is not repeating a treatment on one individual; it is using many different individuals from the population.

                Nov 24, 2013, 10:00 PM