Modified student evaluations

Student evaluations of teaching are biased! This is a well-established finding in the literature. Recently, two studies have shed some light on possible ways to reduce gender bias. Either altering the language of the evaluation form or making students explicitly aware of past biases seems to have an effect: the bias in evaluations of women is reduced without any influence on the evaluations of male faculty. However, more research is needed to see whether these results hold on a larger scale.

... And to end on a lighter note, maybe universities should pay for cookies to be distributed by female faculty: if nothing else helps, distributing free cookies does. Although the effects are modest (but statistically significant), cookies have a positive effect on the evaluations.


Boring, A., & Philippe, A. (2021). Reducing discrimination in the field: Evidence from an awareness raising intervention targeting gender biases in student evaluations of teaching. Journal of Public Economics, 193, 104323.

Field experiment at a French university assessing the impact of two different interventions designed to reduce gender biases in student evaluations of teaching (n=4494). In the first intervention, students received a normative statement by email, essentially reminding them that they should not discriminate in the evaluation. In the second intervention, the normative statement was augmented with precise information on how other students in the exact same situation had discriminated against female teachers in the past.

To test the impact of these messages, the authors take advantage of the existence of seven different campuses in the university to create a difference-in-differences setting. The students of two campuses served as controls, three other campuses were treated with the normative message, and the two remaining campuses were treated with the informative message. The emails were sent after some students had already completed their evaluations, which gives the authors a pre-treatment period for all campuses. Finally, within each treatment campus, the emails were sent to a random half of the students.

While the pure normative statement had no significant impact on evaluations, the informative statement appears to have reduced gender biases against female teachers. This effect mainly comes from a change in male students’ evaluation of female teachers. The spillover effect within campus (onto students who did not themselves receive the email) is extremely high, especially in the information treatment campuses. Anecdotal evidence suggests that one reason the informative treatment worked is that it sparked discussions on gender discrimination among students.
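To make the difference-in-differences logic concrete, here is a minimal sketch in Python. The function name and all numbers are hypothetical and invented purely for illustration; they are not the study's data, and the authors' actual estimation is regression-based and more involved. The idea is simply to compare the change in the gender gap on a treated campus with the change on a control campus over the same period.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from four group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# HYPOTHETICAL gender gaps (mean rating of male instructors minus mean rating
# of female instructors), before and after the emails were sent.
# These values are made up for illustration only.
gap = {
    "control": {"pre": 0.20, "post": 0.19},
    "informative": {"pre": 0.21, "post": 0.10},
}

effect = diff_in_diff(
    gap["informative"]["pre"],
    gap["informative"]["post"],
    gap["control"]["pre"],
    gap["control"]["post"],
)

# A negative value means the gap shrank more on the treated campus
# than it did on the control campus.
print(f"Estimated change in the gender gap: {effect:+.2f}")
```

Subtracting the control campus's change removes anything that shifted evaluations everywhere at once (for example, end-of-semester mood), so that what remains can more plausibly be attributed to the email.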

The emails sent to students:

Treatment 1:

"Dear Student, This fall semester’s student evaluations of teaching are open since Monday November 23rd. These evaluations, which are mandatory for students to complete, are read by your instructors and closely analyzed by the Direction des études et de la scolarité in order to prepare the upcoming academic year. Your comments are extremely useful for the administration of Sciences Po in order to improve the quality of our programs, in close collaboration with our teaching staff.

Considering the importance of these evaluations, we would like to remind you that your evaluations must exclusively focus on the quality of the teaching and must not be influenced by criteria such as the instructor’s gender, age or ethnicity. We ask you to pay close attention to these discrimination issues when completing your student evaluations. The goal is to avoid a situation in which, for instance, gender-based biases or stereotypes would systematically generate lower evaluations for women instructors compared to their male colleagues. Best regards, "

Treatment 2:

"Dear Student, In this period of student evaluations of teaching (SET), we would like to bring your attention to the results of a recent study which suggests the existence of gender biases against female instructors of first year undergraduate seminars (i.e. the conférences de méthode) for all fundamental courses.

Indeed, the results of this study show that students tend to give lower ratings to their female instructors despite the fact that students perform equally well on final exams, whether their seminar instructor was a man or a woman. Male students in particular tend to rate male instructors higher in their student evaluations, although a slight bias by female students also exists. The differences in SET scores do not appear to be justified by other measures of teaching quality, such as an instructor’s ability to make their students succeed on their final exams.

Let’s take the example of students whose seminar average grade is 13.5 and the final exam grade is 12 (these grades correspond to the student averages observed during the period 2008-2013, pooling all fundamental courses together). Given these students, female seminar instructors have a 30% chance of obtaining an “excellent” overall satisfaction score, from both male and female students (and keeping constant course characteristics, such as the day and time of class).

Given these grades, however, male instructors have a 33% chance of obtaining an “excellent” overall satisfaction score when evaluated by a female student and even a 42% chance when evaluated by a male student. These results mean that given an equal performance on exams, female instructors are 19% less likely to obtain “excellent” overall satisfaction scores compared to male instructors (taking into account the proportion of male and female students). These differences are statistically significant.

Furthermore, male students systematically rate male instructors higher, no matter students’ results on final exams, as shown in the graph below.

Finally, the results of this study suggest that students apply gender stereotypes in the way they respond to more specific questions, such as an instructor’s class leadership/quality of animation skills or the ability to contribute to students’ intellectual development. Given these results, we would like to remind you that your evaluations must exclusively focus on the quality of the teaching and must not be influenced by criteria such as the instructor’s gender, age or ethnicity. We ask you to pay close attention to these discrimination issues when completing your student evaluations. The goal is to avoid a situation in which, for instance, gender-based biases or stereotypes would systematically generate lower evaluations for women instructors compared to their male colleagues. Best regards, "
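The 19% figure quoted in the Treatment 2 email is a relative difference between the 30% chance for female instructors and a weighted average of the 33% and 42% chances for male instructors. The email does not state the share of male students used for the weighting, so the sketch below treats it as an assumption: the value 0.45 is hypothetical and chosen only because it makes the arithmetic land close to 19%.

```python
# Probabilities of an "excellent" rating, as quoted in the Treatment 2 email.
P_FEMALE_INSTRUCTOR = 0.30        # same for male and female students
P_MALE_BY_FEMALE_STUDENT = 0.33   # male instructor rated by a female student
P_MALE_BY_MALE_STUDENT = 0.42     # male instructor rated by a male student


def relative_shortfall(share_male_students):
    """How much less likely female instructors are to be rated 'excellent'."""
    p_male_instructor = (
        share_male_students * P_MALE_BY_MALE_STUDENT
        + (1 - share_male_students) * P_MALE_BY_FEMALE_STUDENT
    )
    return 1 - P_FEMALE_INSTRUCTOR / p_male_instructor


# ASSUMPTION: 0.45 is a hypothetical share of male students, not a figure
# taken from the email or the study.
print(f"{relative_shortfall(0.45):.1%}")  # prints 19.0%
```

Under this reading, the shortfall would range from about 9% if all students were female to about 29% if all were male, which is why the email notes that the proportion of male and female students is taken into account.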

Peterson, D. A., Biederman, L. A., Andersen, D., Ditonto, T. M., & Roe, K. (2019). Mitigating gender bias in student evaluations of teaching. PLoS ONE, 14(5), e0216241.

Field experiment at a U.S. university testing, with about 250 student evaluations of teaching, whether adding language intended to reduce gender bias to the evaluation form actually reduces that bias. In four classes with large enrollments, two taught by male instructors and two taught by female instructors, students were randomly assigned to receive either the standard evaluation instrument or the same instrument with the added anti-bias language. Students in the anti-bias language condition rated female instructors significantly higher than students in the standard condition. There were no differences between conditions for male instructors. These results indicate that a relatively simple intervention in language can potentially mitigate gender bias in student evaluations of teaching.

The added language was:

“Student evaluations of teaching play an important role in the review of faculty. Your opinions influence the review of instructors that takes place every year. Iowa State University recognizes that student evaluations of teaching are often influenced by students’ unconscious and unintentional biases about the race and gender of the instructor. Women and instructors of color are systematically rated lower in their teaching evaluations than white men, even when there are no actual differences in the instruction or in what students have learned.

As you fill out the course evaluation please keep this in mind and make an effort to resist stereotypes about professors. Focus on your opinions about the content of the course (the assignments, the textbook, the in-class material) and not unrelated matters (the instructor’s appearance).”

Hessler, M., Pöpping, D. M., Hollstein, H., Ohlenburg, H., Arnemann, P. H., Massoth, C., Seidel, L. M., Zarbock, A., & Wenk, M. (2018). Availability of cookies during an academic course session affects evaluation of teaching. Medical Education, 52(10), 1064-1072.

Field experiment at a German university testing whether student evaluations of teaching (n=112) can be influenced by the free distribution of chocolate cookies. The randomized controlled trial took place in an emergency medicine course. Participants were randomly allocated into 20 groups, 10 of which had free access to 500 g of chocolate cookies (cookie group) and 10 of which did not (control group). All groups were taught by the same teachers. Educational content and course material were the same for both groups. After the course, all students were asked to complete a 38-question evaluation form.

The provision of chocolate cookies had a significant effect on course evaluation. These findings question the validity of evaluations and their use in making widespread decisions within a faculty.

The cookie group evaluated teachers significantly better than the control group (113.4 ± 4.9 versus 109.2 ± 7.3; p=0.001, effect size 0.68). Course material was considered better (10.1 ± 2.3 versus 8.4 ± 2.8; p=0.001, effect size 0.66) and summation scores evaluating the course overall were significantly higher (224.5 ± 12.5 versus 217.2 ± 16.1; p=0.008, effect size 0.51) in the cookie group.
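The reported effect sizes can be roughly reproduced from the means and standard deviations above. The sketch below assumes the effect size is Cohen's d computed with a simple pooled standard deviation for two equally sized groups; this summary does not state how the authors computed it, so treat the formula and the helper name cohens_d as assumptions.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled SD, ASSUMING equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# (cookie mean, cookie SD, control mean, control SD) as reported above.
reported = {
    "teachers": (113.4, 4.9, 109.2, 7.3),
    "course material": (10.1, 2.3, 8.4, 2.8),
    "overall": (224.5, 12.5, 217.2, 16.1),
}

for label, values in reported.items():
    print(f"{label}: d = {cohens_d(*values):.2f}")
# teachers: d = 0.68
# course material: d = 0.66
# overall: d = 0.51
```

Under that assumption the three values match the reported effect sizes of 0.68, 0.66 and 0.51, which by Cohen's conventional benchmarks fall in the medium range.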