05. OSCE reliability

Not everything that can be counted counts,

and not everything that counts can be counted.    

- Albert Einstein



Anshu: How does one ensure reliable marking in a GOSPE/GOSCE? Does anyone have any idea?

I think it dilutes the very purpose with which OSCE was started.

Another issue is that as the number of observers/raters increases, so does the variation in marking, which makes it less reliable. Training is a solution. I also read that when observers 'know' the candidates (quite possible, as you have to select so many of them), they tend to be biased in marking. True or false? I haven't used OSCE except in experimental sessions.

Is anyone using OSCE regularly... Barathi? Ciraj?

Dr Tejinder Singh: We have been using OSCE for the last 19 years. Your apprehension that there could be some examiner bias is valid. However, remember that this bias operates in the traditional practical as well. In a practical there is only one examiner, maybe two, so the damage is greater. In an OSCE, each observer has a very small number of marks at his/her disposal and hence is unlikely to make an impact on the overall result. Examiners sometimes have their own idiosyncrasies, but since each student goes through each examiner (unlike the conventional practical, where chance decides which examiner you get), even those are neutralized.
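The dilution argument above can be illustrated with a little arithmetic. The following is only a sketch under stated assumptions: the function names, the 10-mark stations, the 2-mark bias and the three student scores are all hypothetical and do not come from the discussion.

```python
# Hypothetical illustration of how far one biased examiner can move a result.

def total_shift(bias_marks: float, marks_per_station: float, n_stations: int) -> float:
    """Percentage-point shift in the overall score caused by one examiner
    who is off by `bias_marks` on a single station (equal-weight stations)."""
    total_marks = marks_per_station * n_stations
    return 100.0 * bias_marks / total_marks


def rank_order(scores: dict) -> list:
    """Return student names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# One examiner who marks 2 marks too harshly on a 10-mark station:
print(total_shift(2, 10, 1))    # single-examiner practical
print(total_shift(2, 10, 20))   # one station out of 20 in an OSCE

# If every student passes through the same harsh examiner, everyone loses
# the same amount, so the rank order of the students does not change.
true_scores = {"A": 72.0, "B": 65.0, "C": 58.0}       # assumed scores
with_harsh_examiner = {name: s - 2.0 for name, s in true_scores.items()}
print(rank_order(true_scores) == rank_order(with_harsh_examiner))
```

Under these assumptions the same 2-mark bias moves the result by 20 percentage points when it is the whole examination, but by 1 point when it is one of 20 stations.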

Examiner training should play a role in improving the reliability of OSCE. However, the major source of unreliability in OSCE is not the examiner or the quality of the checklists. It is poor sampling and inadequate time allotted, which do not allow all abilities to be tested. Before commenting any further on the reliability of OSCE, I would like some of you to comment on the reliability table in the attached slide.

Reliability table (reliability by testing time)

Instrument   1 hour   2 hours   4 hours   8 hours
MCQ          0.62     0.76      0.93      0.93
Orals        0.5      0.69      0.82      0.9
Long case    0.6      0.75      0.86      0.9
OSCE         0.54     0.69      0.82      0.9
Mini CEX     0.73     0.84      0.92      0.96
PMPs         0.36     0.53      0.69      0.82

Barathi Subramaniam: I do understand your concern, but don't you think this bias is considerably lower than in traditional exams, since a lot of objectivity is involved? As TS sir aptly points out, the damage an examiner can cause and the quantum of marks at his/her disposal have to be borne in mind. I am not very convinced by GOSCE and, as you lament, the very basis of OSCE/OSPE will be shattered.

TS sir the Reliability table has taken my cortex off. I am

Dr Tejinder Singh: Anyone else willing to offer some comments on the table?

There has been a lot of change in our understanding of OSCE in the last 20 years. The voluminous research that has accumulated shows that the reliability of OSCE is not related to its structure or checklists. Rather, it is the effect of wider sampling and the inclusion of multiple competencies in the assessment process. There is a definite shift in the way OSCE is planned. From an emphasis on checklists, people are now moving towards global ratings, which have been shown to be as reliable.

Often we face this dilemma because we have not clearly answered two questions before planning the assessment: valid for what, and reliable for what? If you are looking at OSCE as a means to assess whether the student can record BP correctly, the dynamics are very different from looking at OSCE as a means to certify clinical competence.

Also, we have to clearly distinguish between objective and reliable. Are they the same or do they represent something different?

 Any thoughts?

Anurag: I think the observer's bias can compromise the reliability.

Suman Singh: Reliability means the ability of two or more observers to examine the same student and arrive at a similar judgment within predefined bounds concerning the quality of knowledge, skill or any other domain to be evaluated.

Objective means the ability to perceive or describe something without being influenced by personal emotions or prejudices.

This means objective and reliable are two different concepts altogether.
If we compare the two when evaluating our students, a test can be objective but not necessarily reliable.

E.g., an MCQ is an objective way of assessing a student's knowledge of a particular subject, but it is not reliable when it comes to gauging the student's understanding of that subject.

Please guide me if I am thinking wrong.
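The definition of reliability given above (two observers arriving at a similar judgment on the same student) is usually called inter-rater agreement, and one common way to quantify it is Cohen's kappa, which corrects the raw agreement for agreement expected by chance. The sketch below is illustrative only: the pass/fail ratings are invented, and nothing in the discussion says kappa is the index the participants had in mind.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who judged the same students.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of students")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own base rates.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented pass (P) / fail (F) judgments by two examiners on ten students.
examiner_1 = ["P", "P", "P", "F", "F", "P", "F", "P", "P", "F"]
examiner_2 = ["P", "P", "F", "F", "F", "P", "P", "P", "P", "F"]
print(round(cohen_kappa(examiner_1, examiner_2), 2))  # 0.58
```

Here the examiners agree on 8 of 10 students, but because about half of that agreement would be expected by chance, kappa is noticeably lower than the raw 80% figure.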

Barathi Subramaniam: I have just read the chapter "Performance Assessment" by M. Marks and S. Humphrey-Murto in the book A Practical Guide for Medical Teachers, edited by John A. Dent and Ronald M. Harden.
In it the authors say: "There is little disagreement that an OSCE style examination provides a more valid assessment of clinical skills than a written or oral examination. With training of those involved in conducting and presenting the examination, the reliability of OSCEs in terms of standardized patient portrayal, inter-rater agreement, exam reliability and standardization across multiple testing sites, has been shown to be acceptable."

I hope this will clarify Anshu's concern.

Slowly regaining my orientation after seeing TS sir's PPT table on reliability:
Test-retest reliability measures the consistency of an examination over time. One problem is deciding on the appropriate time period between the two administrations.
Reliability can be increased with a large number of stations and examinees (Petrusa 2002).

20 stations are needed to obtain the minimum reliability (Shumway & Harden 2003).
12 stations are used in Canada without significant reduction in test reliability (D Blackmore, personal communication 2004).

A 25-station OSCE over 8 hours may provide excellent reliability and validity, but is not realistic. 10-12 stations is reasonable, with a station length of 8 min, 2 min for feedback and 1 minute to move to the next station (Marks and Humphrey-Murto).

In Dundee (25-35), 4 or 5 min stations are the norm.

Stewart: There is a 3rd edition of the book by Dent and Harden in preparation, should be out soon.

Dr Tejinder Singh: BTW, the book by Dent and Harden is highly recommended reading. If you do not have a copy, order it today from Amazon!


Sita: I do not know what PMP stands for in that table. Can you please explain?

Dr Tejinder Singh: Patient management problem.

Suman Singh: PMP stands for patient management problem. It is an important component of clinical evaluation in which a case is given, followed by a series of questions.
Maybe this small abstract will help you. I am also attaching an article on PMP.

The Patient Management Problem as an Evaluative Instrument
Victor C. Vaughan III, MD
Senior Fellow in Medical Evaluation, National Board of Medical Examiners, and Professor of Pediatrics, Temple University

The patient management problem (PMP), a device increasingly used for assessment of medical competence, has been under active development for a number of years and responds to the concern that more traditional techniques for objective evaluation, such as use of the multiple-choice question (MCQ), are often restricted in their content or scope. Though they can reliably test what is known about various aspects of health and illness, they commonly fail to evaluate realistically the process of health care.

The PMP attempts to put the student or physician (the 'test-taker') figuratively into a setting recognizable as belonging to real life, and within that setting (where specified resources are available) presents a clinical problem for solution or management. Given a clearly stated problem in a defined setting, the test-taker is asked to choose among a variety of alternatives for action, some of which may be appropriate, others either not appropriate or even contraindicated. In contrast to the MCQ, which would simply have the choices scored as correct or incorrect, the PMP not only records such scores but, in addition, gives the test-taker the results of the actions selected, usually through development of a latent image embodied in invisible ink. For example, if a blood test is selected as appropriate, the test-taker who follows instructions to develop a latent image placed opposite the choice, will see the results of the test appear.

Praveen: What does this 1hr, 2hr, 4hr, 8hr in the upper row indicate?

Anshu: Time taken for the test. If the test is 8 hours long, the reliability improves.

Praveen: Thanks for making it clear. In that case, as you mentioned, reliability improves as the test duration increases. We can also see that among the instruments used for evaluation, Mini CEX is the best, followed by MCQ and then the other methods.

Dr Tejinder Singh: Thank you for the comments. This table presents some very useful information regarding the practice of assessment. Look at some of the issues:
1. The reliability of a one-hour OSCE and a one-hour case presentation is about the same (0.54 and 0.6 in the table). In other words, the reliability of OSCE does not depend on its checklists and structure. If it did, OSCE would have been more reliable than a case presentation.

2. The more time you give to the student, the better the reliability. The figures for an 8-hour viva, case presentation and OSCE are almost the same.

3. More time allotted means more areas and competencies are included in the assessment process. The major source of unreliability in assessment (even more than erratic examiners) is content specificity. A student who does well on a CNS case is not necessarily equally good on a heart case.

4. There is no such thing as 'the reliable' method. Almost any method can be reliable if you put enough time and effort into it.
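The relationship between testing time and reliability described above is what the Spearman-Brown prediction formula expresses: if a test with reliability r is lengthened k times with comparable material, the predicted reliability is k*r / (1 + (k - 1)*r). The sketch below applies it to the 1-hour column of the table. The thread does not say how the table was produced, so treating it as a Spearman-Brown projection is an assumption, and the function names are illustrative.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a test of reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


def hours_needed(r_one_hour: float, target: float) -> float:
    """Testing time (in hours) predicted to reach `target` reliability,
    obtained by solving the Spearman-Brown formula for k."""
    return target * (1 - r_one_hour) / (r_one_hour * (1 - target))


# 1-hour reliabilities as given in the table.
one_hour = {
    "MCQ": 0.62, "Orals": 0.5, "Long case": 0.6,
    "OSCE": 0.54, "Mini CEX": 0.73, "PMPs": 0.36,
}

for name, r in one_hour.items():
    projected = [round(spearman_brown(r, k), 2) for k in (2, 4, 8)]
    print(f"{name:10s} 2h/4h/8h -> {projected}")

# How long would an OSCE need to be to reach a reliability of 0.8?
print(round(hours_needed(one_hour["OSCE"], 0.8), 1))
```

For most rows the projected values land within a few hundredths of the tabulated ones, which fits the point that time, and hence sampling, rather than the instrument itself drives reliability.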




