Weaknesses of Student-Level Randomized Controlled Experiments

The ASSISTments TestBed was set up to help you run student-level randomized experiments, but this design has some potential weaknesses that we want to point out. Most of them will decrease your chance of detecting an effect, but some might inflate your effects. Since no study is perfect, we think that is acceptable. Still, because the TestBed relies on student-level randomization in authentic situations, you should be aware of its limitations.

One threat to internal validity, novelty effects (also called Hawthorne effects, item 5 below), is worth flagging first because it inflates effects rather than diluting them. A novelty effect occurs when a new or different condition improves learning simply because students pay more attention to it while it is new or different. A condition you submit might beat our BestSoFar condition for this reason alone, and that advantage won't generalize to many other problem sets. Novelty effects inflate the effect size. Ultimately, we will be able to detect a novelty effect when your idea fails to hold up as it is applied over and over again and loses its novelty.
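One way to watch for a novelty effect as a condition is replicated is to track its effect size across successive problem sets. The sketch below is our own illustration, not a TestBed feature: the helper name `effect_trend` and the use of a least-squares slope are assumptions. It fits a straight line to the effect sizes in the order the replications were run; a clearly negative slope is consistent with the advantage fading as the condition loses its novelty.

```python
from statistics import mean


def effect_trend(effect_sizes):
    """Least-squares slope of effect size against replication order.

    effect_sizes: effect sizes (e.g., Cohen's d) for the same condition,
    listed in the order the replications were run. effect_trend is a
    hypothetical helper, not part of the ASSISTments TestBed.
    """
    n = len(effect_sizes)
    if n < 2:
        raise ValueError("need at least two replications")
    xs = list(range(n))  # replication index 0, 1, 2, ...
    x_bar, y_bar = mean(xs), mean(effect_sizes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, effect_sizes))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx  # negative slope: the effect shrinks with repetition


# Made-up example: a condition whose advantage shrinks each time it is reused.
print(effect_trend([0.6, 0.45, 0.3, 0.2]))
```

With only a handful of replications the slope is noisy, so treat it as a prompt to look more closely rather than as a formal test.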

1. Lack of context

With this type of study, you will have no idea what the exact context was for the students whose anonymous data you are looking at. While all students practice the same content, the form of adaptive feedback depends on the experimental condition they are in. This lack of context prevents you from knowing whether the teacher did something that nullified the benefits of a treatment. For example, the students in the control condition might all have gotten terribly confused and kept asking the teacher for help, in which case they received extra support on the topic. While not knowing the context is a potential weakness, it will mostly dilute the effects of the treatment, not inflate them.

2. Contamination

A second, more troublesome issue is contamination effects. In their review paper, McMillan et al. (2007; http://pareonline.net/getvn.asp?v=12&n=15) say: “An important principle of good randomized studies is that the intervention and control groups are completely independent, without any effect on each other. This condition is often problematic in field research.” Our field research has this potential problem: a student in one condition may show their work to a student in the other condition. Once again, most of the time this will dilute effects, not inflate them. One solution is to add a self-report question, such as “Collaboration is a good thing in learning. Did you collaborate with anyone else on this assignment?” In Kelly et al. (2013), we found that some students in the control condition (which represented business as usual) self-reported that they texted their friends asking for help, so we realized that this had diluted the effect size we estimated. In that case, the effect was large enough that we still found reliable differences.

As McMillan and colleagues point out, “When control subjects realize that intervention subjects have received something ‘special’ they may react by initiating behavior to obtain the same outcomes as the intervention group (compensatory rivalry) or may be resentful and be less motivated (resentful demoralization).” Compensatory rivalry will dilute effects, but resentful demoralization could inflate them, and it is the most serious threat our design faces. To deal with this threat to internal validity, we might survey students and teachers to see whether they knew they were in two different groups.
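The collaboration self-report question suggests a simple sensitivity check: estimate the effect once with the whole control group and once without the control students who said they collaborated. The sketch below assumes scores are already paired with each student's yes/no answer; the helper names are hypothetical, and Cohen's d with a pooled standard deviation is our choice of effect size, not necessarily what the TestBed reports. Because dropping self-reported collaborators breaks the original randomization, the second number is a check on dilution, not a replacement for the main estimate.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled


def contamination_check(treatment, control, control_collaborated):
    """Return (d using all controls, d excluding self-reported collaborators).

    control_collaborated[i] is True if control student i answered "yes"
    to the collaboration question (hypothetical data layout).
    """
    clean_control = [score for score, collab
                     in zip(control, control_collaborated) if not collab]
    return cohens_d(treatment, control), cohens_d(treatment, clean_control)


# Made-up posttest scores for illustration only.
treatment = [0.80, 0.90, 0.70, 0.85]
control = [0.50, 0.60, 0.85, 0.55, 0.90]
collaborated = [False, False, True, False, True]
d_all, d_clean = contamination_check(treatment, control, collaborated)
print(f"d (all controls) = {d_all:.2f}, d (excluding collaborators) = {d_clean:.2f}")
```

If the second estimate is noticeably larger, as in this made-up example, contamination has probably diluted the main estimate in the way described above.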

3. Differential attrition

Differential attrition is another threat to internal validity. Since the posttest section always comes at the end, you may find that a difference in posttest results is due to students in the different conditions completing the posttest at different rates. It turns out that differential attrition is a threat you can fully control and even use in a positive way. If one condition causes students not to complete their work and to skip the problems in the posttest section, that, in and of itself, is a useful dependent measure. You may find a condition that causes some students not to complete their homework, but for those who do finish, the effect is large enough to show that finishers are better off in that condition. We call this a “tough-love” condition: it causes some students to quit, but for those who don't quit, it is significantly better.
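Because completion itself is a useful dependent measure, it helps to report it next to the posttest result for each condition. The sketch below is a minimal illustration under an assumed data layout: each condition is a list of posttest scores, with `None` for a student who did not finish the posttest section, and the function names are hypothetical. It reports each condition's completion rate and completer mean, and uses a standard two-proportion z-test to ask whether completion rates differ. A tough-love condition shows up as a lower completion rate together with a higher completer mean. Since completers are no longer a randomized subset, read the completer comparison alongside the completion rates, not on its own.

```python
from math import erf, sqrt
from statistics import mean


def summarize(scores):
    """Completion rate and mean posttest score among students who finished.

    scores: posttest scores, with None for students who did not finish.
    """
    completed = [s for s in scores if s is not None]
    rate = len(completed) / len(scores)
    completer_mean = mean(completed) if completed else float("nan")
    return rate, completer_mean


def two_proportion_z(k1, n1, k2, n2):
    """Two-sided z-test for a difference between two completion proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("completion rates are all 0% or all 100%")
    z = (p1 - p2) / se
    # Standard normal CDF via erf: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Made-up data: condition A finishes less often but scores higher when it does.
cond_a = [0.9, None, 0.85, None, 0.95, None, 0.9, 0.8]
cond_b = [0.7, 0.65, 0.75, None, 0.6, 0.7, 0.8, 0.7]
for name, scores in [("A", cond_a), ("B", cond_b)]:
    rate, m = summarize(scores)
    print(f"{name}: completion {rate:.0%}, completer mean {m:.2f}")

# Completion counts from a larger (made-up) study: 40/50 vs 30/50 finished.
print(two_proportion_z(40, 50, 30, 50))
```

The z-test relies on a normal approximation, so it is only reasonable when each condition has a decent number of both finishers and non-finishers.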

4. Sequencing

Another potential threat to validity comes from sequencing effects, since students will be exposed to a repeated series of experiments. For example, in study #2 you might detect an effect of condition that is really due to the fact that all (or most) of the students in the winning condition were also assigned to the same condition in some earlier study #1. Separate, independent randomization in study #1 and study #2 should prevent this, but we can also put automatic blocking in place: when randomizing for study #2, we block on study #1 assignment so that the students from each study #1 condition are split equally across the study #2 conditions. Of course, if study #1 produces an effect, we will be sure to tell the experimenter of study #2 so they can take it into account. But even without knowing about the effect of study #1, the only cost is increased variance, which makes differences harder to detect but does not threaten the validity of a finding that is detected.
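The automatic blocking described above can be sketched as follows. This is our own illustration, not the TestBed's implementation: the function name, the data layout (a dict mapping each student to their study #1 condition), and the choice to start each block's rotation at a random condition are assumptions. Within each study #1 condition, students are shuffled and then dealt round-robin into the study #2 conditions, so each study #1 group is split as evenly as possible (counts differ by at most one).

```python
import random
from collections import defaultdict


def blocked_assign(students, study1_condition, study2_conditions, seed=None):
    """Randomize students into study #2 conditions, blocking on study #1.

    students: list of student ids
    study1_condition: dict mapping student id -> study #1 condition (the block)
    study2_conditions: list of study #2 condition names
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in students:
        blocks[study1_condition[s]].append(s)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        # Random starting point so leftover students don't always land in
        # the first study #2 condition.
        offset = rng.randrange(len(study2_conditions))
        for i, s in enumerate(members):
            assignment[s] = study2_conditions[(i + offset) % len(study2_conditions)]
    return assignment


students = [f"s{i}" for i in range(10)]
study1 = {s: ("A" if i < 6 else "B") for i, s in enumerate(students)}
study2 = blocked_assign(students, study1, ["X", "Y"], seed=1)
for block in ("A", "B"):
    got = [study2[s] for s in students if study1[s] == block]
    print(block, got.count("X"), got.count("Y"))  # prints "A 3 3" then "B 2 2"
```

Dealing shuffled students round-robin keeps the split balanced by construction, while the shuffle keeps which students land in which study #2 condition random.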

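5. Novelty effects

Novelty effects (Hawthorne effects) are discussed at the start of this page, before the numbered list, because they inflate rather than dilute effects.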
