Methods: Papers describing the rubrics of OSCE checklist forms were identified from PubMed, Embase, PsycINFO, and the ProQuest Education databases up to 2013. Studies were included if they reported empirical validity or reliability values for the communication skills assessment checklists used; papers that reported neither reliability nor validity were excluded.
Results: Papers were identified that focused on generic communication skills, history taking, physician-patient communication, interviewing, negotiating treatment, information giving, empathy, and 18 other domains (ICC range: -0.12 to 1.0). Agreement between reviewers on the validity and reliability of the communication skills checklists was 0.45.
Objective structured clinical examinations (OSCEs) are used worldwide for summative examinations but often lack acceptable reliability. Research has shown that the reliability of scores increases if OSCE checklists for medical students include only clinically relevant items. Checklists also often omit evidence-based items that high-achieving learners are more likely to use. The purpose of this study was to determine whether limiting checklist items to clinically discriminating items and/or adding missing evidence-based items improved score reliability in an Internal Medicine residency OSCE. Six internists reviewed the traditional checklists of four OSCE stations, classifying items as clinically discriminating or non-discriminating. Two independent reviewers augmented the checklists with missing evidence-based items. We used generalizability theory to calculate the overall reliability of faculty observer checklist scores from 45 first- and second-year residents and to predict how many 10-item stations would be required to reach a Phi coefficient of 0.8. Removing clinically non-discriminating items from the traditional checklist did not affect the number of stations (15) required to reach a Phi of 0.8 with 10 items. Restricting the checklist to evidence-based, clinically discriminating items increased score reliability, reducing the number of stations needed to reach 0.8 from 15 to 11; adding missing evidence-based, clinically discriminating items to the traditional checklist modestly improved reliability (14 stations instead of 15). Checklists composed of evidence-based, clinically discriminating items improved the reliability of checklist scores and reduced the number of stations needed for acceptable reliability. Educators should give preference to evidence-based items over non-evidence-based items when developing OSCE checklists.
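To illustrate the projection described above, the following is a minimal decision-study (D-study) sketch under generalizability theory for a fully crossed person x station design. The variance components here are hypothetical placeholders, not values from the study; in practice they would be estimated from the observed score data (e.g., via ANOVA mean squares).

```python
# A minimal D-study sketch for a fully crossed person x station (p x s)
# design. All variance components below are hypothetical placeholders.

def phi_coefficient(var_p, var_s, var_ps_e, n_stations):
    """Phi (absolute) coefficient for a p x s design with n_stations stations.

    Absolute error variance includes the station main effect (var_s) and the
    person-by-station interaction confounded with residual error (var_ps_e),
    both divided by the number of stations.
    """
    abs_error = (var_s + var_ps_e) / n_stations
    return var_p / (var_p + abs_error)

# Hypothetical variance components: person, station, person-x-station/error.
var_p, var_s, var_ps_e = 0.30, 0.15, 1.00

# Project reliability across station counts and find the minimum number of
# stations needed to reach Phi >= 0.8, mirroring the study's D-study logic.
for n in range(5, 31):
    if phi_coefficient(var_p, var_s, var_ps_e, n) >= 0.8:
        print(f"Phi >= 0.8 first reached with {n} stations")
        break
```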
Methods: The study analyzed medical students' grades in 11 OSCE exams during the 2022-2023 academic year at Alfaisal University, Riyadh, Saudi Arabia. Students completed family medicine clerkship rotations, and after each rotation they took an OSCE exam consisting of three stations graded by family medicine consultants. The exam included a checklist of 30 tasks and a five-level global rating scale. All checklist marks and global rating grades were collected and analyzed using IBM SPSS Statistics software. The statistical tests used were descriptive statistics, the t-test, the chi-square test, Fisher's exact test, and Pearson correlation.
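As a hedged illustration of the tests named above, here is a short Python/SciPy sketch standing in for the SPSS workflow. All data below are fabricated placeholders purely to show the calls; the study's actual grades are not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
checklist = rng.uniform(50, 95, size=60)       # checklist marks (% of 30 tasks)
global_rating = rng.integers(1, 6, size=60)    # five-level global rating scale

# Pearson correlation between checklist marks and global ratings.
r, p_corr = stats.pearsonr(checklist, global_rating)

# Chi-square test comparing pass/fail counts under the two scoring systems
# (2x2 contingency table: rows = method, columns = pass/fail).
table = np.array([[48, 12],    # checklist: pass, fail
                  [55,  5]])   # global rating: pass, fail
chi2, p_chi, dof, _ = stats.chi2_contingency(table)

# Fisher's exact test on the same table (appropriate for small cell counts).
odds, p_fisher = stats.fisher_exact(table)

# Independent-samples t-test, e.g., checklist marks of passers vs. failers.
passed = checklist >= 70
t, p_t = stats.ttest_ind(checklist[passed], checklist[~passed])

print(f"Pearson r={r:.2f} (p={p_corr:.3f}), chi2 p={p_chi:.3f}, "
      f"Fisher p={p_fisher:.3f}, t-test p={p_t:.3f}")
```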
Results: The study showed that students were more likely to pass under the global rating system than under the checklist scoring system. Additionally, students had a significantly lower passing rate when the higher cut-off score estimated by the borderline regression method was applied than with the university's pre-set passing score of 70% (p = 0.00).
Two primary methods, checklist evaluation and global rating evaluation, have been used to assess performance in OSCE exams [2,3]. While both have proven effective, each has drawbacks and can be unreliable when used independently [4-6].
Borderline regression is a standard-setting method widely used for large-scale OSCE exams. For each station, it plots the checklist mark each candidate received against the global score that candidate was given. A best-fit line is then drawn, and the point where it crosses the "borderline" global grade gives the cut-off checklist mark for that station. While the borderline regression method offers better face validity, its efficacy is questioned when applied to small-scale OSCE exams [4,8-9].
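The mechanics of the method can be sketched in a few lines. The station data below are invented for illustration, and the assumption that grade 2 on the five-level global scale marks the "borderline" candidate is ours, not the source's.

```python
import numpy as np

# Checklist marks (%) and global grades (1 = clear fail ... 5 = excellent)
# for every candidate at one station (fabricated data).
marks  = np.array([42, 55, 58, 61, 64, 68, 72, 75, 80, 85, 88, 93], float)
grades = np.array([ 1,  1,  2,  2,  3,  3,  3,  4,  4,  4,  5,  5], float)

# Regress checklist mark on global grade (the best-fit line) ...
slope, intercept = np.polyfit(grades, marks, deg=1)

# ... and read off the mark where the line crosses the borderline grade:
BORDERLINE_GRADE = 2
cut_off = slope * BORDERLINE_GRADE + intercept
print(f"Station cut-off mark: {cut_off:.1f}%")
```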
In our family medicine clerkship program at King Faisal Specialist Hospital in Riyadh, we have always used a pre-determined passing checklist score of 70% for each OSCE station, without global rating scores. Examiners rate students against the checklist according to their expected performance and knowledge. In recent years, however, some students received high checklist scores they did not deserve, while others performed well but did not receive high scores. Qualitative assessment through a global rating scale was therefore introduced to evaluate students' performance.
The variations we observed in pass-fail rates across the different cut-off marks, including the university's current 70% checklist mark, the global rating score, and the pass score estimated by regression analysis, can be attributed to several factors [3]. First, the varying difficulty of each station may yield different passing grades under different scoring methods. Second, while the global scoring system can provide valuable insight into a student's performance, it is also subjective: examiners may hold different professional opinions about the mark a student's overall performance deserves, producing variable global rating scores. Conversely, although the checklist rating system is more objective, the examiner's professional judgment of the student's performance may differ completely from the checklist score.
Additionally, using two examiners per station could yield more accurate assessments, with one examiner scoring the student's performance against the checklist while the other rates overall performance on a global scale [9]. The second examiner should assign the global rating without seeing the student's checklist score, to avoid being influenced by it.
PURPOSE: To evaluate the effectiveness of binary content checklists in measuring increasing levels of clinical competence. METHOD: Fourteen clinical clerks, 14 family practice residents, and 14 family physicians participated in two 15-minute standardized patient interviews. An examiner rated each participant's performance using a binary content checklist and a global process rating. The participants provided a diagnosis two minutes into and at the end of the interview. RESULTS: On global scales, the experienced clinicians scored significantly better than did the residents and clerks, but on checklists, the experienced clinicians scored significantly worse than did the residents and clerks. Diagnostic accuracy increased for all groups between the two-minute and 15-minute marks without significant differences between the groups. CONCLUSION: These findings are consistent with the hypothesis that binary checklists may not be valid measures of increasing clinical competence.
Obesity is a major public health problem, yet residents receive little formal training and assessment in obesity-related care. Given the recent growth of telehealth, physicians must also learn to apply these skills on a virtual platform. We therefore aimed to develop an objective structured clinical examination (OSCE), feasible over telehealth, with reliable checklists for assessing residents' ability to take a patient-centered, obesity-focused history, based on published obesity competencies for medical education.
We developed a 15-minute telehealth OSCE simulating an obesity-related encounter for residents, modified from a script used to assess medical student obesity competencies. We designed three checklists to assess resident skills in history taking, communication, and professionalism during the encounter. Resident performance was assessed as the percentage of obesity-related history-taking questions asked during the encounter, and as mean communication and professionalism scores on a scale of 1 to 5, with 1 representing unacceptable/offensive behavior and 5 representing excellent skills. Encounters and assessments were completed by two commissioned actors (standardized patients, SPs) and 26 internal medicine residents over a secure online platform. We assessed the reliability of each checklist by calculating the percent agreement between standardized patients and the kappa (κ) statistic for each checklist overall and for each checklist item.
Results from this pilot study suggest that our telehealth obesity OSCE and checklists are moderately reliable for assessing key obesity competencies among residents on a virtual platform. Integrating obesity OSCEs and other educational interventions into residency curricula is needed to improve residents' ability to take an obesity-focused history.
To assess inter-rater reliability between standardized patients, the percent agreement and kappa statistic (κ) were calculated for each checklist overall and for each checklist item after the post-OSCE debrief. To better understand the factors contributing to discrepancies between SP ratings on several history-taking checklist items, we conducted a post-hoc analysis of history-taking items with less than 75% agreement, comparing agreement according to which SP performed the encounter. For the communication and professionalism checklist items, agreement was defined as SP assessments falling within one Likert scale point of each other.
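As a self-contained sketch of these agreement statistics, the following Python snippet computes raw percent agreement and Cohen's kappa for a binary history-taking item, plus the "within one Likert point" agreement rule for the communication items. All ratings below are fabricated placeholders, not the study's data.

```python
from collections import Counter

def percent_agreement(a, b):
    """Raw share of paired ratings that match exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(a)
    po = percent_agreement(a, b)                  # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / n**2    # expected chance agreement
    return (po - pe) / (1 - pe)

def within_one_agreement(a, b):
    """Share of paired Likert ratings differing by at most one point."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)

# Hypothetical paired SP ratings for one binary item (1 = asked, 0 = not asked).
sp1_item = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
sp2_item = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(f"item agreement={percent_agreement(sp1_item, sp2_item):.0%}, "
      f"kappa={cohens_kappa(sp1_item, sp2_item):.2f}")

# Hypothetical paired 1-5 communication scores.
sp1_comm = [4, 5, 3, 4, 4, 5, 3, 4]
sp2_comm = [4, 4, 4, 5, 3, 5, 2, 4]
print(f"within-one agreement={within_one_agreement(sp1_comm, sp2_comm):.0%}")
```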