Introduction
Assessment is indispensable in any pedagogical setting to gauge students' learning and pinpoint gaps in their knowledge. An effective assessment tool is one that aligns with the learning goals and affords an additional learning experience for students; otherwise, assessment is deemed a superfluous and burdensome task (1). Assessment tools can range from a simple one-minute paper to a sophisticated semester-long project presentation. The instructor should select a reliable, accessible, and reproducible assessment tool that fits the class context. Undoubtedly, before selecting the assessment tool, the instructor should first determine the educational outcomes and learning objectives. Without this crucial initial step, assessment becomes a merely perfunctory routine for both students and instructors, neither deepening knowledge nor honing students' skills.
Peer assessment has garnered attention since the 1990s (2, 3). Its underpinning is allowing students to evaluate their peers' performance on a given assignment, and its benefits are multifaceted (4). First, it strengthens students' cognitive domain by requiring them to revise and apply their knowledge and critical thinking skills to evaluate their peers' work effectively. Second, it engages students' metacognitive domain by prompting them to reflect on and evaluate their own performance while evaluating their peers. Peer assessment has shown a profound positive effect on students' understanding of class materials (5, 6). Importantly, this approach lets students practice, vicariously, the peer review process they will encounter throughout their careers; any STEM-related career hinges, to a variable degree, on giving and receiving feedback among colleagues. Moreover, peer assessment is a form of peer instruction, which students prefer over traditional teacher-centered instruction (7, 8), presumably because they feel that their peers are at the same level, sharing the same struggles and the same academic goals.
Problem-based learning (PBL) has been at the vanguard of student-centered teaching for years (3). It has been applied largely in medical education (9). A considerable body of literature has shown that PBL equips medical students with clinical reasoning, self-directed learning, reflective education, and critical thinking (5, 6). PBL in medical education can be applied in various ways, including case studies and role-playing as patient versus clinician.
To derive the utmost benefit from PBL in medical education, peer assessment instruments have been applied. Peer assessment has been shown to be favored by students and to promote heightened attention and self-reflection. A study conducted in 2007 demonstrated that the correlation between peer and tutor-based assessment improved over the 24 weeks of a course for first-year medical students (10). This correlation supported the accuracy of peer assessment as a tool to evaluate students' performance and comprehension. In addition, the study highlighted students' overall satisfaction with peer assessment as opposed to self-assessment. An earlier study investigated the effectiveness of peer assessment in a PBL activity for college students (11). Broadly speaking, students felt confident and responsible towards their peers and tutors. In addition, they endorsed the benefit they gained from having the opportunity to evaluate the performance of peers and, concomitantly, reflect on their own (11). As expected, this tool was not welcomed by all students. A major critique was the lack of objectivity and the superficiality of some evaluations (11). Notably, not all students were inclined to give negative feedback to their peers; rather, they contended that they lacked the standing to judge their peers' performance without ensuing discussion. This study spurred further research to improve the reliability of peer assessment in different contexts.
In fact, choosing adequate assessment tools to evaluate PBL remains elusive. A myriad of studies has tested the effectiveness of both self- and peer assessment using different approaches (1, 10-12). Nonetheless, low reliability and poor correlation with tutor-based assessment have hampered the success of peer assessment. Furthermore, PBL targets a range of discipline-related skills, such as defining and analyzing a problem, acquiring knowledge outside the group, integrating the acquired knowledge successfully, taking responsibility toward other group members, and testing hypotheses. Hence, the challenge traditional assessment faces is determining unequivocally whether students have acquired those skills. Traditionally, tutors have relied on asking open-ended questions, which do not give students a satisfactory opportunity to demonstrate their learning. Therefore, more research is required to refine peer assessment in PBL across different contexts of STEM teaching, and a more robust and reliable assessment approach is urgently needed to better evaluate PBL activities. Using a rubric represents a better path to assessing students' performance in tutorial sessions or at the end of the course in a more objective manner (13). Another appeal of creating a predefined rubric as an assessment guide is that it enables assessors to reflect carefully on students' performance in specific areas. Furthermore, rubrics are malleable and can be adapted to disparate assessment modalities.
In the current project, I examined whether peer assessment improves the performance of first-year medical students in a PBL activity. To this end, I worked closely with my doctoral mentor, Dr. Irving Vega, who is the coordinator of the specialized intersession course “Molecular Neuropathology of Neurodegenerative Diseases.” This course was offered to first-year medical students over four weeks. Since the course adopted a flipped-classroom approach, class time focused mainly on collaborative PBL activities wherein students applied the knowledge they had acquired prior to class. COVID-19 restrictions were still in place, so the course was delivered virtually. Students were distributed into nine groups. Every week, each group developed a case study; each group was then randomly assigned one of those nine cases to solve, present to the class, and defend by answering pertinent questions. Peer assessment was conducted on the day of the presentations, with peers evaluating each group across several areas. To enhance the objectivity and robustness of the assessment, I developed a rubric that was shared with students before the beginning of the course. Because the class was virtual, students submitted their assessments through Qualtrics. To enhance assessment fidelity, faculty and teaching assistants submitted their assessments as well. Each group received only their peer assessment at the end of each week and was then asked to respond to the feedback. Data were collected weekly, and after the end of the course, the performance of each group was analyzed quantitatively over the four weeks. In this short-term project, students were trained to receive feedback from their peers and encouraged to use that feedback as a guide to cultivate their skills.
Methods
Student cohort
Forty-eight first-year medical students enrolled in the intersession course “Molecular Neuropathology of Neurodegenerative Diseases.” Three class sessions were held every week for four weeks. The course was offered as a flipped-classroom module; hence, students had access to class materials and assignments before class time, which was devoted primarily to student-centered, collaborative PBL interspersed with a modicum of lecturing.
Students were divided into nine groups of 4-5 students each. Initially, I had planned to switch students between groups every week to minimize the chance of interpersonal conflict. However, after the first week, I realized that the data would be more conclusive if the groups did not change. Therefore, I let the students express their preference through a survey sent at the end of the first week. The survey results confirmed students' overwhelming preference to stay with their group members for the entire course.
Peer assessment
Peer assessment was implemented by developing a rubric as an assessment guide whereby students evaluated their peers in a PBL activity (a clinical simulation, described below). Rubric-based assessment has recently become a workhorse in the classroom, as it provides instructors with a yardstick to gauge students' progress. Rubrics connote different meanings according to their use (14). Herein, the rubric is the document that lists the assessment criteria against which students were evaluated; in other words, it set expectations by clarifying the areas in which students would be assessed. The rubric was then used to create the assessment survey. The document Survey 1 illustrates the rubric (and the survey) developed for the case simulation activity.
The rubric was adapted from previously published evaluative questions for medical students so that it aligned with the students' academic level and the course content (1, 15). Furthermore, the rubric was reviewed by the course coordinator, Dr. Irving Vega, to ensure that the assessment was in accordance with the course learning objectives. It is important to note that the rubric was shared with students before the course began to set expectations and prepare them for the activity.
Every week, each group of students was assigned a case simulation activity. In this activity, each group collaboratively developed a clinical case relevant to that week's theme of neurodegenerative disorders, working from a scaffolding patient profile sent to them at the beginning of the week. All groups were given three days to develop their cases. The cases were then randomly distributed to the groups (excluding the group that created each case). Each group had to solve its assigned case and deduce the diagnosis with all germane details. Lastly, at the end of the week, each group presented the case and their diagnosis and answered questions from the instructors. During the presentations, a link to the peer assessment survey in Qualtrics was shared through Zoom, and each group was evaluated by the rest of the class and the instructors. Assessors were given five minutes between group presentations to submit their feedback. The survey was available for responses only during class time and was closed afterwards; late responses were therefore excluded to avoid introducing potential inaccuracies.
Later, each group received a summary of their peers' feedback. Notably, instructors' assessments were not revealed to students; rather, they were used for data analysis on the reliability of peer assessment. Furthermore, each group had the chance to respond to the feedback they received through a survey (Survey 2), which was also developed in Qualtrics. Scores were collected anonymously to avoid introducing bias into the data analysis.
Qualtrics offered a convenient means to create the assessment surveys because it is available through MSU at no additional charge and suited the virtual nature of the class. Importantly, Qualtrics made data collection straightforward.
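To illustrate how the collected responses can be processed, the sketch below aggregates a weekly Qualtrics export by assessed group and assessor role. It is a minimal example only: the file name, the role and group columns, and the criterion column prefix are hypothetical placeholders, since the actual survey layout is defined in Survey 1.

```python
# Illustrative aggregation of a weekly Qualtrics export.
# File and column names are hypothetical placeholders, not the actual survey fields.
import pandas as pd

def summarize_week(csv_path: str) -> pd.DataFrame:
    """Return the mean total rubric score each group received, split by assessor role."""
    df = pd.read_csv(csv_path)

    # Assumed columns: 'role' ('student' or 'instructor'), 'assessed_group' (1-9),
    # and one column per rubric criterion named 'criterion_1', 'criterion_2', ...
    criteria = [c for c in df.columns if c.startswith("criterion_")]

    # Total rubric score for each submitted assessment.
    df["total_score"] = df[criteria].sum(axis=1)

    # Average score per assessed group, separated by assessor role.
    return (df.groupby(["assessed_group", "role"])["total_score"]
              .agg(["mean", "count"])
              .reset_index())

# Example usage for one week's export (file name is hypothetical):
# print(summarize_week("week2_responses.csv"))
```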
End-of-course evaluation
As part of the end-of-course assessment conducted every year, a survey was developed to gather students' feedback on the peer assessment activity they participated in. Tables 1 and 2 present the survey questions articulated to elicit students' impressions of conducting peer assessment and of receiving feedback, respectively.
Data analysis
Rubric-based peer and instructor assessment scores were analyzed to track the temporal change in student performance within each group over the four weeks. Comparisons among groups were not conducted, as they would not address the main question of the study: the impact of peer assessment on student performance during the course. GraphPad Prism v.9.5 was used for data analysis.
Since the number of students submitting assessment scores varied from week to week, a repeated measures ANOVA was infeasible. Hence, a one-way ANOVA with mixed-effects analysis was conducted on the peer assessment scores, as it is an adequate surrogate for repeated measures with missing values. Tukey's post hoc test was used for pairwise comparisons. The significance level was set at p < 0.05.
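The analysis itself was run in GraphPad Prism. For readers who prefer a scripted workflow, a simplified sketch of an analogous comparison of one group's weekly scores is shown below; it uses a plain one-way ANOVA rather than Prism's mixed-effects correction for missing values, and the score lists are hypothetical.

```python
# Minimal sketch of a week-by-week comparison for one group's peer assessment scores.
# Note: Prism's mixed-effects analysis additionally accounts for repeated assessors
# with missing values; this simplified version treats weeks as independent samples.
# All scores below are hypothetical.
from scipy import stats

week1 = [18, 20, 19, 17, 21, 20]
week2 = [22, 23, 21, 24, 22]
week3 = [19, 18, 20, 17, 19, 18, 20]
week4 = [18, 19, 17, 20]

# Omnibus test across the four weeks.
f_stat, p_value = stats.f_oneway(week1, week2, week3, week4)
print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Tukey's HSD for pairwise comparisons (e.g., each week vs. week one).
print(stats.tukey_hsd(week1, week2, week3, week4))
```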
Instructor assessment scores were computed to examine how closely they converged with or diverged from the peer assessments. Because each week was directed by a different group of faculty, a repeated measures ANOVA was not appropriate. Moreover, owing to the small and variable number of instructor scores, the data failed the normality test; therefore, the Kruskal-Wallis test was used for this analysis.
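The corresponding non-parametric comparison for one group's instructor scores can be sketched as follows; again, the values are hypothetical and the published analysis was performed in Prism.

```python
# Kruskal-Wallis test across weeks for one group's instructor scores
# (hypothetical values; the actual analysis was performed in GraphPad Prism).
from scipy import stats

inst_week1 = [19, 21, 20]
inst_week2 = [23, 22, 24, 23]
inst_week3 = [18, 17, 19]
inst_week4 = [19, 20]

h_stat, p_value = stats.kruskal(inst_week1, inst_week2, inst_week3, inst_week4)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.4f}")
```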
A direct comparison between instructor and peer assessment over time was not practical because the small number of instructor scores (and the missing values) violated the requirements of a two-way repeated measures ANOVA. Likewise, correlation analysis could not be conducted because the numbers of assessors in the instructor and student groups were unmatched. Hence, I opted for a visual comparison of instructor and peer assessments.
Students' weekly responses to their peers' assessments were summarized using descriptive statistics, as were the results of the end-of-course evaluation of the activity.
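As an example of these descriptive summaries, Likert responses can be tallied per week with a few lines of pandas; the small inline dataset and the response labels below are hypothetical stand-ins for the Qualtrics export.

```python
# Tally of Likert responses to the feedback-response survey, per week.
# The inline data and labels are hypothetical and stand in for the Qualtrics export.
import pandas as pd

responses = pd.DataFrame({
    "week":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
    "rating": ["Agree", "Disagree", "Agree", "Agree", "Agree",
               "Unsure", "Agree", "Agree", "Unsure", "Agree"],
})

# Count each rating category per week (rows: weeks, columns: rating categories).
summary = (responses.groupby("week")["rating"]
                    .value_counts()
                    .unstack(fill_value=0))
print(summary)
```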
Results
Students' performance changed over time
As delineated in the Methods section, peer assessment was tested in a case simulation activity, a PBL exercise developed by the course coordinators. Each group articulated a case study focused on one of the neurological disorders taught in the course. These cases were distributed randomly to other groups to solve and present to the class, and peer assessment was implemented during the case presentations. Scores collected for each group over the four-week course were analyzed and compared to week one, which served as the baseline of students' performance for this activity.
Fig. 1 illustrates the temporal change in students' performance (reflected as peer assessment scores) relative to week one in all groups. The figure shows the striking correspondence among eight of the groups and the noticeable deviation of group five. All groups except group five showed a jump in scores in the week two activity. Nevertheless, the scores dropped non-significantly in weeks three and four, nearly returning to week one values (Fig. 1, A-D, F, and G). Groups seven and nine showed a significant reduction in scores in week three compared to week one (Fig. 1, G and I).
Group five showed no significant overall change in peer assessment scores during the course; rather, it demonstrated a consistent decline from week one to week three with a slight increase in week four (Fig. 1E).
In summary, the results demonstrate that in the majority of groups, a prominent increase in peer assessment scores was observed in week two, followed by a reduction in weeks three and four. This subsequent reduction could be attributed to other factors. For example, as the course advanced, students may have been overwhelmed by the workload. In addition, the case studies varied during the course because different topics were covered, so the difficulty of the cases may have differed. Moreover, students' enthusiasm for peer assessment might have abated after the second week. Another plausible explanation is that students might have mastered using the rubric as a guide to assess their peers in subsequent weeks. Individually or collectively, these factors might explain the drop in assessment scores in the last two weeks.
Instructor assessment verified peer assessment results
Instructors (faculty and teaching assistants) used the same rubric to evaluate group performance in the case simulation activity. As with the peer assessment, instructors submitted their evaluations according to the same rubric in Qualtrics immediately after each group's presentation. It is noteworthy that the survey started with a question asking assessors to indicate whether they were an instructor or a student. A possible error therefore exists if an assessor mistakenly selected the incorrect role, which could obscure the data interpretation. Nonetheless, the only way to avoid this improbable error would have been to create two separate surveys for instructors and students, which would have been more cumbersome and would have carried a higher possibility of errors.
Instructor assessment scores were computed and compared to week one as described earlier. Broadly speaking, instructor assessment bears an intrinsic similarity to peer assessment (compare Fig. 1 and 2). All panels of Fig. 2 (except panel E) display increased scores in week two with a subsequent drop in weeks three and four compared to week one. In contrast, as observed in Fig. 2E, group five showed a consistent drop in scores after week one; its week three scores were significantly lower than those of week one, with a slight rise in week four.
The results signify that instructor assessment nearly coincided with peer assessment. As discussed earlier, the drop in students’ performance (reflected as scores) in weeks three and four could be due to other factors that might or might not relate to the case simulation activity.
Students showed a positive attitude towards the feedback
To examine students' responses to the feedback they received, each group was asked to submit their impression of their peers' critique using another survey (Methods section). This activity aimed to train students to reflect upon the feedback and objectively analyze the critique so they could use it to improve their performance. In that way, their metacognitive domain was challenged, enhancing their learning experience.
I analyzed not only students' responses on the Likert scale but also their written responses. Notably, students appeared more competent and expressive when writing their responses than when choosing agree versus disagree on a scale.
Table 3 summarizes students' responses to peer assessment across the four weeks. The table shows that the vast majority of students agreed that the peer assessment was constructive, raised valid issues, and was fair. The number of students who participated varied, peaking in week two. Interestingly, most of the groups showed their highest performance in week two (Fig. 1 and 2). That might suggest that receiving feedback in week one encouraged students to perform better in week two. Furthermore, the positive feedback at the end of week two may have incentivized more students to respond to the feedback (compare the number of students in weeks one and two).
In addition to the Likert scale, students had the opportunity to write their impressions of the peer assessment. In general, most of the students' responses expressed appreciation for the feedback, and most students showed evolving competence in responding to critique. I noticed that in week one, most of the students' responses were defensive. Later in the course, students developed a noticeably more open-minded and receptive attitude towards the feedback and a willingness to improve. Indeed, that was the main objective of this exercise: to train students to accept feedback and impartially seek ways to improve. A prevailing concern raised by students was the time constraint on the presentations, which might have precluded them from showing their full potential. In addition, a few responses mentioned “survey fatigue” as a downside of this activity, which is covered further in the Discussion section.
End-of-course feedback on peer assessment exercise
As part of the regular end-of-course survey, an evaluation of the peer assessment activity was included. Students evaluated two parts of the activity: the peer assessment of the case presentations and the response to feedback. It is important to note that 43 of the 48 students submitted their feedback on the course.
Table 4 summarizes students' evaluation of the peer assessment component. More than 50% of students agreed that the rubric was clear. Likewise, more than half agreed that a standardized rubric made the assessment more objective. The majority of students preferred assessing their peers using a survey rather than in writing. For the last two items, roughly half of the students agreed that peer assessment enhanced their skills in providing feedback and in reflecting on their own performance; however, 13 and 10 of the 43 students, respectively, gave neutral responses to those two items. That suggests that a subset of students did not see tangible benefits of this exercise in boosting their skills.
Table 5 illustrates students' impressions of receiving the feedback. What stands out is that more than half of the students (26 of 43) preferred to receive feedback from faculty rather than from their peers, and thirteen students responded “unsure” to this question. Furthermore, almost half of the students reported that they did not take the feedback personally. Importantly, 14 students did not perceive the peer assessment process as a valuable learning experience, although about half of the students saw otherwise. The reason could presumably be ascribed to the survey fatigue that many students alluded to in the written section of the survey: several students reported that the assessment was cumbersome and exhausting, especially with the limited time allotted for the presentations.
Fig. 1. Peer assessment scores changed over four weeks. Each panel represents one of the nine groups during the four weeks of the course. Peer assessment scores are shown on the Y-axis. One-way ANOVA with mixed-effects analysis was conducted to determine the change in peer assessment scores for each group over time compared to week one, the baseline. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
Fig. 2. Instructor assessment scores coincide with peer assessment scores. Each panel represents one of the nine groups during the four weeks of the course. Instructor assessment scores are shown on the Y-axis. The non-parametric Kruskal-Wallis test was conducted to determine the change in instructor assessment scores for each group over time compared to week one, the baseline. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
Discussion
PBL assessment has long been a major challenge in STEM education. Several studies have investigated the viability of using rubrics and questionnaires for tutor assessment along with self- and peer assessment (1, 10, 16). Evidently, rubric-based assessment has provided tutors and peers with a constructive tool to monitor improvement in students' achievement of desired learning goals over time. Rubric-based assessment in long-term courses has shown more consistent educational benefits: with semester-long projects and multiple evaluations of a single project, students have enough time to reflect on the feedback and benefit from it in subsequent activities. Short-term courses, on the other hand, have shown discrepant results, with some reporting improvement in students' performance and others reporting no significant change (3).
In this project, I investigated the impact of peer assessment on students' performance in a PBL activity for first-year medical students. Herein, peer assessment was implemented according to a rubric to facilitate the evaluation process. Collectively, the data show that students' performance (reflected in scores) in this PBL activity did not change in weeks three or four compared to week one. However, there was a clear rise in students' performance in week two in all groups except group five. Variable cases, increased workload, and waning student motivation and dedication to the activity might all have contributed to the observed fluctuation in assessment scores over the four weeks. Given these unavoidable factors, the results provide insufficient evidence for the benefit of peer assessment in improving students' performance in PBL activities. Furthermore, instructor assessment scores were collected concomitantly and analyzed separately. Although neither a direct comparison nor a statistical correlation was feasible, owing to the difference in numbers between instructors and students, the results showed that peer assessment closely tracked instructor assessment (Fig. 1 and 2).
Qualitative analysis of students' responses to their peers' assessment revealed a generally positive attitude towards the critiques they received. Indeed, students' responses showed their appreciation for the value and benefit of peer assessment. Likewise, the end-of-course survey indicated that the majority of students benefited from the activity despite the concerns they raised, which are discussed below.
It is noteworthy that group five, whose peer and instructor assessment scores showed a visible decline compared with the other eight groups, encountered internal dissension within the group. This observation plausibly indicates that the absence of a harmonious and collaborative dynamic among group members can impede a group's progress.
Limitations
As stated by students in the end-of-course evaluation survey, survey fatigue was a prevailing concern, which rendered scoring less reliable over time. Survey fatigue stemmed mostly from the fact that each student had to answer the survey questions nine times over roughly three hours every week. To rectify this issue, a few students suggested having each group evaluate only one other group each week; this suggestion should be considered in future modifications of the peer assessment process. Additionally, students' attention and motivation fluctuated throughout the three hours of case presentations. In other words, students were probably at their highest level of motivation in the morning during the first group's presentation, after which their attention and enthusiasm waned. To mitigate this problem, the order of group presentations was shuffled every week to ensure that each group was scored fairly over the four weeks. Moreover, I noticed that some students would submit their evaluation before a presentation had even started, which undermines the credibility of those assessments. Presumably, some students perceived this activity as a cumbersome task rather than a learning experience.
Another limitation is the time constraint of the course. The limited time necessitated conducting this assessment activity every week, which was undoubtedly tiresome and monotonous for students. Future semester-long courses would offer a better opportunity to test the effectiveness of peer assessment on long-term student performance. Likewise, the time constraint on each case presentation did not allow students to engage in thorough discussion or to ask questions that would have helped them better assess their peers. In the same vein, the faculty did not have time to question the presenters about their case; therefore, this area of the rubric could not be meaningfully evaluated.
We, as human beings, bear an entrenched instinct to seek approval from the people around us, whether colleagues, employers, or family members (17). Indeed, friendship and affability play a non-negligible role in scoring peers. Therefore, students might not lean towards giving poor feedback to their peers; even if the performance is subpar, students may balk at giving low scores. That could explain the apparent discrepancy I sometimes noticed between the scores and the written feedback: some assessors gave a group maximum scores even when their written critique suggested otherwise. Alternatively, students might not trust the rubric system or might have felt bemused by all the definitions and statements they needed to consider carefully before giving a score.
After all, PBL remains a reliable student-centered pedagogical tool that aims to enhance students' ability to internalize knowledge. PBL builds several skills, such as searching for information, working collaboratively, and analyzing and critiquing information. However, the assessment of PBL remains a challenge in higher education (18). Using impartial, objective tools to assess students' performance in PBL merits further testing. In this small-scale mentored project, I opted to use rubric-based peer assessment for PBL, and the results indicate that this method should be further investigated in larger classes. It has been shown that self-assessment succeeds in improving performance on short-term tasks (up to a week long) (5), which could be extrapolated to peer assessment as well. The results herein show a spike in students' performance (except for group five) in the second week, in accord with what has been published before (5). Notably, receiving feedback only from their peers makes students feel that they are the center of the educational process with respect to this specific activity (19). In line with the recent Universal Design for Learning (UDL) guidelines, students gain some degree of autonomy when given the context to assess their own skills. Hence, future projects should test the capacity of peer assessment to cultivate students' performance in short- and long-term assignments while addressing issues such as survey fatigue.
References
1. Sim, S. M., Azila, N. M., Lian, L. H., Tan, C. P., and Tan, N. H. (2006) A simple instrument for the assessment of student performance in problem-based learning tutorials Ann Acad Med Singap 35, 634-641
2. Pond, K., Ul-Haq, R., and Wade, W. (1995) Peer Review: a Precursor to Peer Assessment Innovations in Education & Training International 32, 314-323
3. Norman, G. R., and Schmidt, H. G. (1992) The psychological basis of problem-based learning: a review of the evidence Acad Med 67, 557-565
4. Eva, K. W., Cunnington, J. P., Reiter, H. I., Keane, D. R., and Norman, G. R. (2004) How can I know what I don't know? Poor self assessment in a well-defined domain Adv Health Sci Educ Theory Pract 9, 211-224
5. Lycke, K. H., Grøttum, P., and Strømsø, H. I. (2006) Student learning strategies, mental models and learning outcomes in problem-based and traditional curricula in medicine Med Teach 28, 717-722
6. Margetson, D. (1994) Current educational reform and the significance of problem-based learning Studies in Higher Education 19, 5-19
7. Tullis, J. G., and Goldstone, R. L. (2020) Why does peer instruction benefit student learning? Cognitive Research: Principles and Implications 5, 15
8. Crouch, C. H., and Mazur, E. (2001) Peer instruction: Ten years of experience and results American Journal of Physics 69, 970-977
9. Trullàs, J. C., Blay, C., Sarri, E., and Pujol, R. (2022) Effectiveness of problem-based learning methodology in undergraduate medical education: a scoping review BMC Medical Education 22, 104
10. Papinczak, T., Young, L., Groves, M., and Haynes, M. (2007) An analysis of peer, self, and tutor assessment in problem-based learning tutorials Med Teach 29, e122-132
11. Sluijsmans, D. M., Moerkerke, G., Van Merrienboer, J. J., and Dochy, F. J. (2001) Peer assessment in problem based learning Studies in Educational Evaluation 27, 153-173
12. Reiter, H. I., Eva, K. W., Hatala, R. M., and Norman, G. R. (2002) Self and peer assessment in tutorials: application of a relative-ranking model Acad Med 77, 1134-1139
13. Valle, R., Petra, L., Martinez-Gonzalez, A., Rojas-Ramirez, J. A., Morales-Lopez, S., and Pina-Garza, B. (1999) Assessment of student performance in problem-based learning tutorial sessions Med Educ 33, 818-822
14. Andrade, H., and Du, Y. (2005) Student perspectives on rubric-referenced assessment Practical Assessment, Research and Evaluation 10
15. O'Brien, C. E., Franks, A. M., and Stowe, C. D. (2008) Multiple rubric-based assessments of student case presentations Am J Pharm Educ 72, 58
16. Suryanti, N., and Nurhuda, N. (2021) The effect of problem-based learning with an analytical rubric on the development of students' critical thinking skills International Journal of Instruction 14, 665-684
17. Shrauger, J. S., and Schoeneman, T. J. (1979) Symbolic interactionist view of self-concept: Through the looking glass darkly Psychological Bulletin 86, 549
18. Eva, K. W. (2001) Assessing tutorial-based assessment Adv Health Sci Educ Theory Pract 6, 243-257
19. Panadero, E., Jonsson, A., and Strijbos, J. W. (2016) Scaffolding Self-Regulated Learning Through Self-Assessment and Peer Assessment: Guidelines for Classroom Implementation The Enabling Power of Assessment 4, 311-326