General considerations
During or after a course in leprosy or TB, the trainees will have developed different levels of competence. It is the responsibility of the training centre to define the minimum acceptable level of competence that the health worker must attain. The definition of this standard is important because it acts as a first guarantee that the trainees will perform in an acceptable way later in practice. Unfortunately, the course leaders can never be certain that good results at the end of a course mean that the health worker will also perform well in the community. It is nevertheless the obligation of the training institution to show the community that the minimum acceptable skills and knowledge have been mastered by the end of the course. These standards are checked by different types of examinations, and the results of these examinations are used to make a decision about each candidate: pass or fail, that is, whether or not the candidate has reached the minimum standard. The procedure described here is called assessment. In this chapter we discuss this process in more detail. What is competence? What is performance? And what are the most appropriate examination techniques for assessing them?
Competence and performance
The first question is "What is competence?" This may seem a trivial question, as one may argue that competence is simply the ability to perform the most important tasks in the community, of which the most important is the ability to diagnose and treat an individual patient who is ill. But from the point of view of training it is important to analyse this simple question very thoroughly. First of all we have to distinguish between competence and performance. These words describe the distinction between the capability of doing something (competence) and the actual doing of it (performance). This distinction is quite similar to one made about the properties of a drug: competence corresponds to what a drug does under ideal circumstances with carefully chosen patients, while performance corresponds to what a drug actually does in less perfect circumstances and in a more heterogeneous population of patients. In leprosy, for example, one could say that an important competence is the ability to examine, elicit and describe hypopigmented patches and thickened nerves. Competence is the way the health worker has mastered this skill under ideal circumstances during the course; performance is the way he actually does this in an examination or in practice.
The most important way of defining competencies is to use task analysis. This is a process whereby the activities that doctors and health-workers engage in are documented and described in such a way as to make explicit the purpose of the activity, the procedures that must be used to perform this activity and the outcomes or expected products of these activities. Decisions about these activities are based on a consensus of the opinions of experts and on the use of activity logs, which document what activities the health workers are engaged in.
After a thorough analysis of the tasks of health workers, the necessary competencies can be divided among the following categories. See tables 12 and 13.
Table 12. A categorisation of competence for Health personnel working in Leprosy and TB
Table 13. A Categorisation of medical competence for Health Personnel
Remark
From these lists of competencies the course objectives are formulated, which can be used to construct the different problems for self-study in a problem-based learning (PBL) course. Notice that problem solving is an important skill that must be assessed in PBL courses.
After the competencies have been defined and the course objectives formulated, it is important to know to what degree the different competencies are mastered. With the help of examinations, information can be collected about the level of competence of each health worker. The question now is what kinds of examinations are appropriate to assess the health workers, and what exactly the role of the examiner is.
The role of the examiner
The ultimate task of an examiner is to assess the competence of a student during or at the end of a course. Sometimes the examiner makes all the decisions about the procedure himself: the way he examines, the kind of questions he puts to the student, and the decision whether the student passes or fails. In other situations the examiner shares this responsibility with other examiners.
These two ways of assessing can have a different impact on the result of an examination. An example will help. In a traditional oral examination the examiner has a lot of freedom to raise all kinds of questions and to observe not only whether the candidate is able to answer them, but also how the candidate talks, acts, behaves and thinks. In an MCQ examination, however, all the questions and their correct answers are formulated in advance. This should be done by all the examiners, who submit their own questions and agree together whether each question and its component parts are fair and useful. So in the oral the examiner has freedom to raise questions and to judge, but in the MCQ this freedom is confined to the consensus reached with his colleagues before the examination. The MCQ is called more objective since there is a greater guarantee that each question will focus on the content of the subject: there is therefore no place for the subjective views and particular likes and dislikes of any examiner.
Is the difference between the role of an examiner in an oral examination and in an MCQ examination important? The answer must be yes. Many studies have shown that traditional orals are not very objective, which means that different examiners who see the same candidate can reach completely different judgements. Passing or failing an oral examination is therefore a kind of lottery, or the throw of a die! You are lucky or not, and the result does not depend primarily on your competence but rather on the subjective judgement of your examiner.
But is it always possible to have objective examinations? Sometimes examiners have to observe the clinical competence of students during the interview and examination of a patient on the ward. Is it possible to make these observations objective? The answer is again yes. We can make these observations as objective as possible if we discuss and define the different competencies in advance with several examiners, draw up observation categories (rating categories), and decide in advance how to judge the different performances of the student according to these categories (rating system and marking scale). We must therefore look for techniques that minimise the subjectivity of the examiner and bring his judgement into agreement with the judgement of other examiners. In this way the more subjective examinations can be made as objective as possible.
Assessment methods
The most common examination methods can be divided into three categories: oral, written and direct observation.
Oral
The most important oral examinations are the traditional oral and chart stimulated recall (CSR).
The traditional oral examination, or viva voce examination, has for centuries been the predominant method, and sometimes the only method, used to assess students. But we have already stated that many studies show the unreliability of the examiners; indeed, what one examiner assesses as positive might well be assessed by another as negative! Many teachers nevertheless like orals because they give a lot of flexibility to raise questions and to test some skills. The disadvantages are numerous, however, unless stringent precautions are taken. If an oral is used, it is very important to standardise the content where possible, which means that what is to be tested has to be defined in advance and a set of questions has to be prepared. When these questions are written on cards, together with the correct answers and the marks, the oral becomes more reliable. Where possible, the examiners should discuss the relevant questions and correct answers together to reach more agreement. But very few examiners like to submit to this necessary discipline, and so all over the world orals in clinical subjects are unstructured and depend far too much on the mood and interest of the examiner.
A very interesting oral, in particular for assessing problem-solving skills, is chart stimulated recall (CSR), which is more reliable. This kind of examination is a discussion between an examiner and a health worker about a case (a patient) that the health worker has previously managed in the clinic or in the field. Information about the patient is supplied to the examiner, for example a chart on which the health worker has listed the results of his investigations. Once this information is available to the examiner, the discussion starts with a brief presentation of the case by the health worker. The discussion is subsequently directed by the examiner, who may ask questions about data acquisition, the way the health worker solved the problem, how he managed the patient, and so on. With the help of rating scales the examiner can then rate all the answers of the health worker. This method resembles the traditional bedside examination, but with the use of rating categories and rating scales the subjectivity of the examiner can be reduced.
For rating categories (checklist), see table 14.
For a rating scale, see table 15.
Table 14. Rating categories: what to observe?
History and examination of the patient.
Was an adequate comprehensive history taken and presented?
Was the physical examination carefully and thoroughly performed?
Were abnormal signs elicited?
Were these abnormal findings commented on and understood?
Were the student's approach and the communication with the patient acceptable?
Has the student assessed the results of relevant tests?
Problem solving
Has the student made a reasonable diagnosis?
Is there evidence of hypothesis testing?
Does the diagnosis fit the available data?
Is there evidence that plausible alternative diagnoses were considered?
Do the results of lab tests help to establish the diagnosis?
Management of the patient
Given the presenting symptoms and diagnosis was the management safe?
Given the presenting symptoms and diagnosis was the management consistent with currently accepted treatment of such cases?
Was the management appropriate for the severity of the problems presented?
Was the sequence of the actions appropriate?
Were needed actions omitted?
Were there indications for each action?
Were observations/tests chosen to monitor progress?
Were other members of the health team consulted as contributors to the management?
Was a plan suggested for any complication(s)?
Understanding of the Pathophysiology
Does the student understand:
The underlying disease process?
Plausible complications and the reasons for their appearance?
Progression, timing, and prognosis in the natural history of the disease/disorder?
Mechanisms of action and possible side effects of relevant drugs?
Conditions affecting the success and the probable effects of relevant procedures?
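As a minimal sketch of how the rating categories of table 14 might be combined with a rating scale, the Python example below records the ratings of two examiners for a few checklist items and reports the mean score per category. The scale of 0 to 2, the number of items and the ratings themselves are hypothetical and serve only as an illustration; they are not taken from table 15.

```python
from statistics import mean

# Hypothetical rating scale: 0 = not done, 1 = partly adequate, 2 = adequate.
SCALE = (0, 1, 2)

# Hypothetical ratings by two examiners for one candidate,
# grouped by rating category (one mark per checklist item).
ratings = {
    "History and examination": {"examiner_1": [2, 1, 2], "examiner_2": [2, 2, 2]},
    "Problem solving": {"examiner_1": [1, 1, 0], "examiner_2": [1, 2, 0]},
}

def category_scores(ratings):
    """Return the mean rating per category, averaged over the examiners."""
    scores = {}
    for category, by_examiner in ratings.items():
        per_examiner = []
        for marks in by_examiner.values():
            if any(m not in SCALE for m in marks):
                raise ValueError("rating outside the agreed scale")
            per_examiner.append(mean(marks))
        scores[category] = mean(per_examiner)
    return scores

scores = category_scores(ratings)
for category, score in scores.items():
    print(f"{category}: {score:.2f} out of {max(SCALE)}")
```

Averaging over examiners is only one possible choice; a team of examiners could equally agree to discuss any item on which their ratings differ.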
Written
The most important written examination techniques are
Essay
Short answers
Simulation of initial problem solving (SIMP)
Modified essay questions
Written objective tests
Essay
For a long time the writing of an essay was very popular in schools and training courses. Teachers were convinced that they could assess knowledge and understanding of medical subjects effectively in this way. But in recent years serious objections have been raised against essay questions for assessment, because the marking of the answers is unreliable: teachers disagree about the answers given by students. It may be traditional to use this kind of examination, but before it is used, the purposes for which it is to be used must be very carefully considered, because it has severe limitations. While it is certainly desirable for a health worker to be trained to write a clear narrative report, training for this skill of writing, and the testing of it, is best not done at a professional final examination. If for some reason teachers decide to use this format, they have to take the following into account.
Note that when clear, direct instruction words such as compare or contrast are used, it is highly probable that other forms of examination can cover the subject more efficiently and reliably. One of these is the short answer.
Short answers
Short answers, just like essays, demand a written response from the student, which must be read by the examiner. But this method is very powerful, since the very short answers that are required can be marked much more easily than long and extended responses.
For example:
An elderly patient presents with several hypopigmented skin patches on the body. The patches are anaesthetic but not itchy.
What is the likely diagnosis?
ANSWER: Leprosy (1 mark)
Or:
An elderly patient presents with several hypopigmented skin patches on the body. The patches are anaesthetic but not itchy. Physical examination reveals multiple enlarged and tender nerves.
What do these signs indicate?
ANSWER: They are signs of a reactive state.
In the same way as with the essay questions, the formulation of the questions is important; they must be clear and direct so that a straightforward answer is possible.
Simulation of initial problem solving (SIMP)
This kind of test is a very simple but effective way to assess the competence of the student when he is confronted with a patient. He is invited to indicate what he would do. What are the most important questions in the history and in the physical examination? What laboratory tests would he request?
It starts, for example, with the following case.
The pregnant woman with fever.
A woman in the thirty-third week of a normal pregnancy presents with low-grade fever and productive cough.
What would you do?
The student must list, directed by his initial impression, his planned actions.
For this case the following rating categories (checklist) could be used
History
Physical examination
Diagnoses (preliminary hypotheses)
Further tests/investigations
Modified essay Questions (MEQ)
In an MEQ the trainee is provided with a short description of a patient with a limited amount of data and is then asked to write a brief answer to a question. After this first answer more questions are presented, so this format resembles a series of short answer questions. This assessment method allows the examiner to see the way the trainee deals with a patient over time, and is a most valuable method.
Example MEQ. We take the example of the woman with fever again.
A pregnant woman in the thirty-third week of a normal pregnancy presents with low-grade fever and productive cough. The direct sputum examination was positive for AFB.
In an MEQ it is possible to raise specific questions to see how a student deals with this patient over time. For example:
1. What would you do next? Describe what you would recommend.
2. What treatment would you recommend given that she had not taken any drug since the illness started?
After the student has given the answers, the following information is supplied to the student.
The husband of the woman has travelled to a nearby town and is not expected back for 4 weeks. In addition, she has two young children at home and no household help.
Question 3. Bearing in mind that the policy of the control programme is that all diagnosed patients must be hospitalised during the initial phase of treatment, what would you do?
Good MEQs and SIMPs are not so difficult to prepare: they take some time, but preparing them is very useful training for teachers in discussing the most appropriate approach to patients and problems. To make sure that the answers are marked reliably, the examiners must all agree which answers are acceptable in the examination.
Written objective tests.
This term is used for tests such as multiple choice questions (MCQ) and True/False questions, in which the marking of the answers is objective. A typical MCQ has a stem and four or five possible answers. A True/False question presents a statement, and the student has to decide whether the statement is true or false.
An example of an MCQ is:
stem: Active immunisation is available against all of the following diseases except
five possible answers (one correct)
or
The leprosy bacillus was discovered by
An example of a True/false question
statement: A TB patient on treatment who becomes positive at the 5th month should be registered as a failure case.
True/False.
Or
A leprosy patient with 5 skin lesions and an enlargement of the ulnar and radial cutaneous nerves should be classified as paucibacillary.
True/false
Or
Clinical diagnosis of leprosy by supervisors in the field can be made according to the following criteria:
T/F Number of skin lesions
T/F Number of enlarged nerve trunks
T/F Clinical features of the skin lesions
T/F The morphological index
T/F The slit skin smear result
MCQs can be used to test a wide base of knowledge, the interpretation of data, and reasoning in a clinical problem. Visual aids can be presented with these questions. Especially in leprosy and related dermatological problems, a picture can be used as the basis for the stem, and by means of the questions the student can be tested on the recognition of the picture and its significance. Because the marking is objective, it can be done by machines, computers or administrative staff. No examiner's time is needed for marking, but the questions and the correct answers must have been agreed before the examination. With MCQs this can be a lengthy process, because the stems of many MCQs are ambiguous, so that they have to be rejected, while other questions can have mutually exclusive answers that cannot be used.
True/False is a type of question that is meant to investigate whether the student knows or does not know. With this sort of question the student can be tested across a wide range of knowledge.
In the same way as with essays and short answers, the examiner has to make sure that the statements are short and unambiguous, and that each statement is unequivocally true or false. Again, this must be agreed beforehand by the team of examiners.
Machines or administrators can also do the marking of these questions.
In the literature one can read elaborate discussions about which format is best, True/False or MCQ; about how many alternatives an MCQ should have, four or five; and about how many answers may be correct: only one, at least one, or possibly none. To avoid difficult discussions the examiner can use a rule of thumb. Many questions are always better than only a few, because the content of the subject is covered more comprehensively. True/False questions are easier to construct than MCQs, and simple MCQs, with only four alternatives and only one correct answer, are easier to construct than the more complicated MCQs. Make sure to have enough questions to test the knowledge of the student, and that the questions fairly represent the knowledge that must be known. But if MCQs are to be used, we prefer those that have a variable number of correct answers. This discourages guessing.
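The effect of the question format on guessing can be illustrated with a small calculation. The sketch below assumes a candidate who guesses completely at random, and a question with a variable number of correct answers that only scores when every alternative is judged correctly. Both are simplifying assumptions made for the illustration, not a marking rule prescribed in this chapter.

```python
from fractions import Fraction

def p_true_false():
    """Chance of guessing one True/False statement correctly."""
    return Fraction(1, 2)

def p_single_best(options):
    """Chance of guessing an MCQ with exactly one correct answer."""
    return Fraction(1, options)

def p_variable_correct(options):
    """Chance of guessing an MCQ in which each alternative may be true
    or false and all alternatives must be judged correctly to score."""
    return Fraction(1, 2) ** options

print("True/False:", p_true_false())                        # 1/2
print("MCQ, 4 alternatives, one correct:", p_single_best(4))  # 1/4
print("MCQ, 5 alternatives, variable:", p_variable_correct(5))  # 1/32
```

Under these assumptions a blind guess succeeds once in two attempts for a True/False statement, but only once in thirty-two for a five-alternative question with a variable number of correct answers.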
Direct observation
Examinations must also focus on all kinds of practical skills. This can be done by observing the performance of the student in practice or in a simulated (role-play) situation. All kinds of skills can be observed: practical skills, for example how nerves are examined, or communication skills, which determine how a candidate talks to a patient, whether to obtain information in a history or to explain what is wrong. When an examiner observes a candidate, no questions are asked, so that the candidate can carry out the examination without hindrance. But the observations made by examiners in this kind of practical examination can also be unreliable, because different examiners see different things and judge these, or even the same things, in a different way. Thus again there is the problem of the examiner: what does he see, and how does he interpret what he sees?
Let us spend a few words on the problems that arise when somebody observes in his own personal way. When you observe something, you put something of yourself into that observation and into your description of it. We would like to illustrate this with an example from art. A famous example of the different ways in which one can look at things is the comparison of two great artists, Velazquez and Goya. Both were famous Spanish realist painters, yet they painted members of the Royal Family in completely different ways. With Velazquez they all became noblemen, because Velazquez himself was a nobleman. But when Goya painted the Royal Family, he made them look like a butcher's family in their Sunday best clothes.
In the same way we may expect different views in medicine, since different doctors and health workers have different experience and different opinions about what is important and correct.
Various ways have been suggested of dealing with these different ways of looking and interpreting, all of which seek to reduce the large variations between observers. The most important aids are rating categories (checklists) and rating forms, in the same way as with written and oral examinations.
In constructing a checklist to observe the performance of a student, examiners discuss in advance what performances can be expected in a test situation, and which of these performances are correct and which are not.
In effect, the examiner checks the performance against the checklist, and when he has to judge the quality of the performance he can also use rating scales to mark it.
For an example of observing practical skills, see the checklist below.
Examination of an ulcer of the leg. Rating categories (Checklist) and rating scale.
Objective Structured Practical Examination (OSPE)
The Objective Structured Practical Examination (OSPE), or Objective Structured Clinical Examination (OSCE), is a way of examining communication skills, manual skills, decision-making skills and knowledge at the end of a course. A well-designed OSCE tests the student's ability in different areas. The distinctive characteristic of the OSPE is that it consists of at least 10 "stations". Each station focuses on a particular skill that the student must have at the end of the course.
Each student starts the examination at a different station. At each station the student answers a question or carries out a task, which may be practical or written. At the end of a fixed time period (usually 5 minutes) a bell rings and the student moves to the next station. At the end of the examination every student has visited every station. At the practical stations the students may be asked to take a patient's history, examine some part of the patient (a full examination is not possible in 5 minutes), examine data, photographs or the results of laboratory tests, or use a piece of equipment. At a written station that follows a practical station there is usually a short answer question (or possibly an MCQ) based on the task performed at the practical station. The practical stations have to be observed by an examiner, who uses a checklist or rating scale to assess the student's performance.
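The rotation of students along the stations can be sketched as a simple timetable. The Python example below assumes that there are as many students as stations and numbers both from 0; the function name and the choice of 10 stations in the usage are illustrative only.

```python
def rotation_schedule(n_stations):
    """Return schedule[round][student] = station visited in that round.

    Student s starts at station s and moves on one station after each bell.
    """
    return [
        [(student + rnd) % n_stations for student in range(n_stations)]
        for rnd in range(n_stations)
    ]

schedule = rotation_schedule(10)

# In every round each station is occupied by exactly one student.
for row in schedule:
    assert sorted(row) == list(range(10))

# Over the whole examination every student visits every station once.
for student in range(10):
    visited = [row[student] for row in schedule]
    assert sorted(visited) == list(range(10))

minutes_per_station = 5  # the usual period mentioned in the text
print("Total duration:", len(schedule) * minutes_per_station, "minutes")
```

With 10 stations of 5 minutes each, one complete rotation takes 50 minutes, not counting the time needed to move between stations.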
One of the great advantages of this kind of examination is that students are tested on a wide range of abilities. A well-designed OSCE requires the students to do things that they will normally have to do in the field as qualified health workers. The test is valid for many intellectual and practical skills. Because the OSCE has at least 10 stations, quite a lot of space is needed. An ideal space would have several different rooms, or one large room that can be divided by screens to give privacy to patients and to separate the other stations. Because the OSCE is different from more traditional examinations, it is vital that both teachers and students prepare for the examination. The students must have practised an OSCE before they are assessed in their final examinations. This is not a waste of time. It is fair, and students can learn from it.
It is essential to prepare all materials thoroughly in advance: checklists, marking systems, instructions for students and examiners, and the technical equipment, which must be in working order. And, as discussed above, make sure that the examiners understand the items on the checklists, agree upon them, and know how to use them.
Prepare a master mark sheet to record all the marks of the students at every station.
Example of 14 stations in an OSCE
Stations
How to choose the most appropriate examination method
Sometimes it is difficult to decide on the most appropriate examination method. For example, what are the best formats to assess the 14 different skills listed in the OSPE described above? And how can we make sure that these formats are used in the most objective way, in order to reduce the subjectivity of different examiners?
To make a good decision about the most appropriate examination one should know something about educational measurement.
To decide what kind of examination is the most appropriate to test whether a desired competence is mastered, the examiner has to consider 3 important requirements: reliability, validity and practicability.
To illustrate these requirements in practice we give several examples. The first concerns the measurement of blood pressure. If two doctors were to measure blood pressure in patients and one instructed the patient to lie down while the other asked the patient to sit, we might expect two different readings of blood pressure (not reliable, not precise). Similarly, if an adult blood pressure cuff were used on a child, the reading would not be an accurate representation of the true blood pressure. Consider another concrete example: suppose a manufacturer produced blood pressure instruments that were not well calibrated. They all have the same fault, so when they are used by different physicians a reading 15 mm Hg too low is recorded for every blood pressure. The blood pressures are recorded reliably, but they are all incorrect; they are inaccurate, and so they are not valid.
The second example is the marking by 2 or more examiners of the answers given by the student in the case of the pregnant woman (see SIMP), without rating categories and a rating scale. If some examiners give high marks while others give low marks, then this procedure is not very reliable.
The third example concerns the question whether X-rays indicate that a patient has tuberculosis of the lungs. If 20 X-rays are taken of the same patient and shown to 5 observers trained to recognise tuberculous lesions, and all 5 independently report that all the X-rays show a lesion, then we may say that the method is reliable. If the 5 observers agree that 10 of the X-rays show a lesion but 10 do not, then we must conclude that the method is not reliable. Or, in a different situation, we show 20 X-rays of several patients to these 5 observers. If there is good agreement among them, we may equally conclude that the method is reliable.
With the example of the X-rays we can now introduce the concept of validity. Is a lesion a valid indicator of TB? The answer is no. In fact, the chest X-ray is a reliable but not very valid method for diagnosing TB, even if it does reveal a lesion. A lesion is often, but not always, proof of TB. To prove TB we need evidence of the tubercle bacillus in the sputum. This method is reliable and valid.
Similarly, if an examiner wants to know whether a student can elicit enlargement of a peripheral nerve in a leprosy patient, written examinations are not valid. With a written examination you can test whether a student knows how to do this, but to check whether the student is really able to do so we need a practical test with a real patient, and the use of rating categories.
Much research has been done on the problems of reliability and validity, using sophisticated designs and statistical analysis. But it is not necessary to study this in detail: it is enough to use the conclusions. The difference between reliability and validity can easily be seen and understood visually if we consider a gunman firing at a target. The results can show 3 different patterns. See figure 2.
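For the situation in which 20 X-rays of several patients are shown to 5 observers, the agreement among the observers can be expressed as a simple proportion. The Python sketch below counts the X-rays on which all observers give the same reading. The readings are invented for the illustration, and this crude proportion is only one possible measure; more refined agreement statistics exist.

```python
def unanimous_agreement(reports):
    """Fraction of X-rays on which all observers give the same reading.

    reports[o][x] is True if observer o reports a lesion on X-ray x.
    """
    n_xrays = len(reports[0])
    agreed = sum(
        1 for x in range(n_xrays)
        if len({observer[x] for observer in reports}) == 1
    )
    return agreed / n_xrays

# Hypothetical data: 5 observers, 20 X-rays of several patients.
majority_reading = [True] * 12 + [False] * 8
reports = [list(majority_reading) for _ in range(5)]
reports[4][0] = False   # one observer sees no lesion on the first film
reports[2][19] = True   # another reports a lesion on the last film

print(unanimous_agreement(reports))  # 0.9
```

Here the observers are unanimous on 18 of the 20 films. Note that this says something about reliability only: it does not show whether the lesions they agree upon are in fact caused by tuberculosis.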
Figure 2.
Neither Reliable nor Valid
The first pattern (A) shows a gunman who hits the target as if he were throwing dice. Every shot is random. He never hits the bull's eye! They are bad shots. Suppose different teachers used the same test (compare with the same gun) to test the competence of a candidate and got such results, each widely different from the others. Obviously such a test is unreliable. If the results were reliable (had precision), different examiners would obtain consistent results for the candidate, not just once, but repeatedly.
Reliable but not valid
Now look at pattern B. This is different. In this case the gunman's shots are precise and consistent, but he cannot hit the bull's eye. The shots are reliable, but they do not hit the target in the place he wants to hit it. His shots are precise but not valid or accurate. Validity answers the question "To what extent does a test actually measure what it is intended to measure?" One has to hit the bull's eye: one has to read the correct, real blood pressure. One has to make sure that the tubercle bacillus is present in the sputum. X-rays of the chest are reliable for seeing lesions but not always valid as proof of TB.
Validity and Reliability
When the gunman hits the target (validity) with all his shots consistently (reliability), pattern C is the result. Or, to use the blood pressure example: different physicians record the same blood pressure consistently (reliability) and, with a well-calibrated instrument, also correctly (validity). When you use an examination for the assessment of a specific skill, for example the examination of an ulcer of the leg, and the examiners have agreed upon the rating categories and the level of performance that is demanded, then this examination can be considered reliable and valid.
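The three patterns can also be illustrated with numbers from the blood pressure example. In the sketch below the scatter of repeated readings stands for reliability, and the distance between their average and the true value stands for validity. The true value and all readings are hypothetical, and using the standard deviation as the measure of scatter is an assumption made for the illustration.

```python
from statistics import mean, pstdev

def describe(readings, true_value):
    """Return (bias, spread) of repeated readings of one true value."""
    bias = mean(readings) - true_value   # systematic error: validity
    spread = pstdev(readings)            # scatter: reliability
    return bias, spread

true_systolic = 120  # hypothetical true blood pressure in mm Hg

# Hypothetical readings of the same patient by different physicians.
patterns = {
    "A: neither reliable nor valid": [130, 160, 115, 150, 145],
    "B: reliable but not valid": [105, 104, 106, 105, 105],
    "C: reliable and valid": [120, 119, 121, 120, 120],
}

for name, readings in patterns.items():
    bias, spread = describe(readings, true_systolic)
    print(f"{name}: bias {bias:+.1f} mm Hg, spread {spread:.1f} mm Hg")
```

In pattern B the readings lie close together but are on average 15 mm Hg too low, as with the badly calibrated instruments; in pattern C both the scatter and the systematic error are small.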
To get valid and reliable results in education a logical sequence has to be followed.
In this special case, since examiners are the measuring instruments, reliability is often called objectivity. Teachers examining skills are reliable or objective, when they do not have personal preferences.
Practicability
Finally, is the examination practicable? Many factors have to be considered if a fair test of agreed goals is to be designed.
We all know how difficult it may be to achieve something new. Good ideas fail because of the financial cost, the demands on time, or the number of people available. The teacher must also take these factors into account. On the one hand he is responsible for the test criteria (validity and reliability); on the other hand his decisions are also influenced by very practical criteria such as budget, manpower, time and colleagues.
OSPE Reconsidered
While we have suggested the important and distinguishing features of a good examination, the best way for examiners to grasp these features is to design their own examination. As an example we have analysed the 3 requirements, reliability (R), validity (V) and practicability (P), for the OSPE we described before.
For each competence that is assessed at one of the 14 stations, we suggest an examination procedure and estimate the reliability, the validity and the practicability (++, +, +-, -, --).
Note. The stations with observation must have an examiner present to observe the rating categories and to score the performance of the student.