Test Development Quick Start
Introduction
Teachers and examiners (in higher education, these two roles are usually combined) often need information, and above all practical advice, to help them design tests. This article seeks to meet that need.
It is a checklist of key points that anyone designing a test should have at hand. The theory behind testing is omitted as far as possible; you can find it in various other resources.
It should be underlined that this article is concerned exclusively with summative written exams (tentamens). The article may well be relevant for other kinds of assessment too, but that was not the authors' aim.
The article is arranged chronologically. The sections correspond to the different steps in the process of constructing a test.
This article is based on: Van Berkel, H. J. M., & Draaijer, S. (2011, March). Gids voor toetsontwikkeling [Practical guide for test development]. EXAMENS, Tijdschrift voor de Toetspraktijk, 8(1), 8.
1. Create a test blueprint, choose a question type and set the duration of the test
A test blueprint is made up of, first, the main topics of the course you want to ask questions about and, second, the cognitive level you wish to target with the questions.
1.1 Subject and number of questions
Indicate the main topics from the course; usually around fifteen will be enough.
The number of questions is important for the content validity of the test. An appropriate distribution of questions across the cells of the specification table (the blueprint) will ensure that the test is valid in terms of content.
The number of questions is also important for the reliability of the test (the more questions, the better).
The more important you consider a topic, the more questions about it you should include in the test.
Indicate in the cells the number of questions that you want to ask about the subject at the relevant level.
Make sure that at least one question is asked on every topic.
The number of questions that can be asked depends on the form and duration of the test. For a discussion on this topic we refer to Combining closed and open questions in a test and Rule of thumb: 40 questions in a 4-choice multiple-choice test – a short follow-up…
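The relationship between test length and reliability noted above can be quantified with the Spearman-Brown prophecy formula, a standard result from classical test theory (the formula is not mentioned in this article; it is added here purely as an illustration, and the function name is ours):

```python
def spearman_brown(reliability, lengthening_factor):
    """Predict the reliability of a test lengthened by the given factor,
    assuming the added questions are comparable to the existing ones
    (Spearman-Brown prophecy formula)."""
    n, r = lengthening_factor, reliability
    return n * r / (1 + (n - 1) * r)

# A test with reliability 0.60 that is doubled in length (factor 2)
# is predicted to reach 2 * 0.60 / (1 + 0.60) = 0.75
```

Doubling a test never doubles its reliability, but the formula shows why adding questions is the most direct way to raise a reliability that is too low.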
1.2 Open Questions
1.3 Closed Questions
Extra points to consider - Some misconceptions and tips
It is a common misconception that closed questions can only be used to test factual knowledge.
It is a common misconception that questions about factual knowledge automatically result in higher scores and questions about application or understanding result in lower scores.
It is advisable, though not strictly necessary, to use the same kind of question throughout a test, so as not to induce unnecessary cognitive load.
Long open questions and essay questions take much longer to mark. Research also shows that with open questions there is a greater chance that the reliability (and hence the fairness) of the assessment will be lower.
In terms of marking effort, closed questions become more efficient than open questions once more than about 50 students take the test.
Long open questions are time-consuming to answer, which means that fewer topics can be included in the test. This can quickly compromise the validity of the test.
2. Construct questions
2.1 Finding inspiration
Sources of inspiration for questions:
The objectives of the problems / exercises in text books, syllabuses, etc.
Main points from the literature and lectures.
The future profession.
Skills acquired and practical exercises during the course.
Discussions during lectures, seminars etc.
Questions from tests that have been used before (particularly questions that are not too difficult or easy and make a clear distinction between the students who have mastered the material and those who have not).
Graphs, tables, diagrams and other images of relevant features or processes can be used to develop some questions.
Verbs
If you want to follow the usual sequence of knowledge/reproduction - application - insight, you can use a particular set of verbs to ask your question.
Knowledge - Reproduction
Name, Describe, Quote, Define, Identify, Distinguish, Sum up, Paraphrase, Summarize, Estimate, Select, Explain, Translate, Explain in your own words
Application
Calculate, Demonstrate, Use, Make, Develop, Solve, Organize, Produce, Relate, Transfer, Change, Prepare, Extrapolate, Interpret
Insight
Criticize, Categorize, Compose, Conclude, Contrast, Deduce, Formulate, Rewrite, Illustrate, Interpret, Create, Differentiate, Support, Design, Justify, Relate, Summarize, Outline, Explain, Validate, Defend, Compare, Value
Question Shells
The following question shells can also give you a good starting point when designing questions, especially those aimed at assessing higher cognitive levels.
Knowledge: Determine the student's knowledge of subjects
What is the best definition of ....?
What is (not) characteristic of ....?
What are the different elements of the problem?
What is the history of the problem?
What categories are there in the problem?
Evaluating critical thinking: Determine whether the student can use the characteristics of facts, procedures, principles or theories.
What is the most effective (or appropriate) way of ....?
What is better (or worse) ....?
What is most effective for ....?
What is the most critical step in a procedure?
If you know that X is true, what must also be true about Y?
What is (not) necessary in a procedure?
What is the significance of the problem?
Critical thinking (predicting): Determine whether the student can deduce implications, consequences etc. on the basis of facts, procedures, principles or theories.
What would happen if ....?
If this happens, what would you do?
On the basis of ...., what would you do?
Given ...., what is the main reason for ....?
Solve a problem (given scenario): Determine whether the student can provide solutions or evaluate solutions on the basis of a given problem.
What is the nature of the problem?
What do you need to solve this problem?
What is a possible solution?
What is the most effective (efficient) solution?
Why is .... the most effective (efficient) solution?
Additional tips
Devise at least 25% more questions than you will ultimately need to include in the test. After the first round of development, there will always be questions that you consider not to be good enough.
Think about setting up an item bank. To get started, read the SURF handbook In 5 stappen naar een itembank met toetsvragen [In 5 steps to an item bank with test questions].
Think ahead. If you will also have to develop a resit test or tests for subsequent years, develop several variants of the same type of question at the same time. Have a look at systems to support you by incorporating variables in your question set-up.
With e-assessment systems such as TestVision, you can use even more question types.
2.2 After the construction phase
Discuss the draft questions with others at least once. Discuss the content, the model answer sheet / the correct alternative, and the form.
2.3 Model answer sheets
A model answer sheet is a tool for the person marking the open questions of an exam, and serves to increase the reliability of the test. It contains the following:
A summary of the ideal answer in key words and the marks available for each.
The procedure to be followed if the answer given is not included in the list of key words.
What to do with grammatical mistakes: ignore them, deduct points, or set a minimum quality requirement for a mark to be awarded?
For long-answer questions: score content, technical design and argumentation separately.
2.4 Checklists
Use the following checklist after you have devised the questions.
Contents
Is the selected form of testing (open or closed) the most appropriate?
Are there sufficient questions in the test?
Does the question include only one clear problem?
Is the question free of subjective statements?
Are quotes in the question given a context?
Is the question free of unnecessary/superfluous information?
Have you made sure that the question does not depend unnecessarily on one detail?
Have you made sure that the question is not a trick question?
Does the question include enough information to give an answer?
Is the question grammatically correct?
Is the question free of complex sentence structures?
Can the wording of the question be interpreted in only one way?
Are all negative words, such as not, underlined or written in italics?
Is the question free of double negatives?
Is the question free of words like always, never, usually, certainly?
Have conventions regarding spelling, the use of symbols, punctuation, etc. been respected?
Is the question subdivided into a data section and a question section?
Form
Is the question free of vague terms?
Is the question phrased positively where possible?
Process
Has a blueprint been made?
Has there been a discussion with colleagues from the same field?
Specific for Open Questions
Does the question give enough information about the form that the answer should take? And how long it should be?
Is it clear whether the student needs to explain his/her answer?
Has a model answer sheet been made?
Specific for Closed Questions
For questions based on a statement, is the statement 100% correct or 100% incorrect?
For questions based on a statement, is there only one concept?
Is there any overlap between the alternatives given? (answers which are subsets of one another cause a great deal of confusion)
Are all the alternatives of about equal length?
Are the alternatives in ascending / alphabetical order?
Are all the alternatives plausible?
Is the option "none of the above" or "all of the above" really necessary?
3. Assembling and administering the test
3.1 Assembling the test
A test is more than just a collection of questions.
Assemble the test from the approved questions using the specification table.
Cluster the questions by subject in the same order in which they were taught during the course.
To prevent cheating, make at least two versions of the test, changing the order of questions. In e-assessment systems such as VU's TestVision, consider using randomisation.
3.2 Administering the test
The test instructions for students for pen-and-paper tests must include the following information:
The duration of the test
Closed questions: instructions for completing the questions (e.g. draw a dash)
The weighting of the questions (for open questions, the weighting may be different for each question; for closed questions, each question has the same weighting in principle)
Pass/fail mark
Publishing the results: time and place
Rules concerning the use of additional resources: what is allowed / not allowed?
A ban on the use of mobile phones
The number of pages and the number of questions in the test; students are responsible for checking for any missing pages or printing errors
A reference to the regulations on academic misconduct
What the students should do if they have questions, for example raise hand
Regulations on the use of the toilet
Instructions for handing in the test
Opportunity to make comments
4. Analysing and reviewing the test
Open questions
Use a model answer sheet written in advance.
Mark each question in all the tests, rather than marking each student's test in one go.
Shuffle the order of the tests occasionally.
In practice, it will often be impossible to have all the questions marked by two different assessors, but this should be done at least for students around the pass / fail mark. Take an average of both assessments, if necessary.
Only round off the marks at the end.
You can easily make a test analysis tool and an item analysis tool in Excel, even for open questions.
Collect the students' comments, put them next to the relevant test analysis and discuss this with the question constructor.
Make a decision about the model answers: maintain, amend or supplement?
Where circumstances warrant, look at the questions again using a new model answer sheet.
Closed questions
Give the forms to your test and item analysis service, who will check them for you. For digital tests, the question analysis and test analysis can be carried out directly by a service bureau.
This service carries out the initial analysis.
Collect the students' comments and put them alongside the test and item analysis and discuss the questions with the person who constructed the questions.
Take a decision on each question: keep it, delete it, or change the answer key? Bear in mind that, in general, such changes should only lead to small adjustments.
Have the analysis carried out again.
5. Assigning grades
There are many methods of assigning grades which can be put into three categories: absolute, relative and compromise.
Absolute
Under the absolute method, you determine the pass mark yourself in advance. The point of the pass mark is to determine whether the students have met the requirements of the course which, in turn, are derived from the learning objectives of the course.
Relative
The relative method is based on the idea that the test should be geared to the level of the students who are entitled to attend classes, the majority of whom should in principle be capable of passing the test. Because it is not known in advance what these students are capable of on the basis of the classes that have been given and the test that has been set, the standard cannot be determined in advance. The results of the test must be known before the pass mark can be set.
Compromise
The compromise method tries to bridge the fundamental differences between absolute and relative standards and is preferred in educational practice. The compromise method is generally based on an absolute standard, but it also identifies circumstances under which exceptions can be made to the absolute standard. One compromise method is described below.
Open questions
1. Begin by defining the pass mark. It is not unusual for this to be set at 55% or 60% of the maximum achievable score.
2. If you find that too many students fail your test, it is possible to make the following adjustments to the procedure (after consultation with the Examination Board).
a. Set the limit at 55% of the maximum score instead of 60%, and/or
b. Take the average of the five highest scores, rather than the maximum score, as the starting point.
3. You now have the raw score that corresponds to a 0 and the raw score that corresponds to a 10; determine the remaining grades by dividing this raw-score range into 10 equal units, each corresponding to a particular grade.
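The grading procedure above can be sketched as follows (a minimal illustration; the function name is ours, and it assumes the simple linear mapping of step 3, with 0 points corresponding to grade 0 and the top score to grade 10):

```python
def grade_open(raw_score, top_score):
    """Linear 0-10 grading for open questions: divide the raw-score
    range into 10 equal units (step 3). top_score is the maximum
    achievable score or, after adjustment 2b, the average of the
    five highest scores."""
    grade = 10 * raw_score / top_score
    return min(10.0, max(0.0, grade))

# With a maximum of 80 points, a pass mark at 55% (44 points)
# corresponds to grade 5.5
```

Note that when adjustment 2b is applied, students who score above the average of the five highest scores are simply capped at grade 10.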
Closed questions
1. Begin by defining the pass mark. This is usually set at 60% of the highest mark, taking into account the random answer score.
- The random answer score (blind-guess score) is the number of valid questions in the test divided by the number of alternatives per question. For example, the random answer score for 105 true / false questions is 105 / 2 = 52.5.
- In this example, if the highest score is 96 points, the pass mark will be 60% of the score between 52.5 and 96 = 78.6.
2. Students with the highest score (96) will be awarded a 10; those who score no more than the random answer score will be awarded a 0. The other grades can be calculated by linear interpolation between these two points.
3. If you find that too many students fail your test, the following adjustments to the procedure are possible (after consultation with the Examination Board).
a. You set the limit at 55% of the maximum score instead of 60%, and/or
b. Take the average of the five highest scores, rather than the maximum score, as the starting point.
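The guess-corrected grading described above can be sketched as follows (a minimal illustration; the function name is ours, and the mapping is the linear interpolation of step 2, with the blind-guess score at grade 0 and the top score at grade 10):

```python
def grade_closed(raw_score, top_score, n_questions, n_alternatives):
    """Linear 0-10 grading for closed questions: the random blind-guess
    score maps to grade 0 and the top score to grade 10 (steps 1-2)."""
    guess_score = n_questions / n_alternatives
    grade = 10 * (raw_score - guess_score) / (top_score - guess_score)
    return min(10.0, max(0.0, grade))

# Worked example from the text: 105 true/false questions (guess score
# 52.5) with a top score of 96; the pass mark of 78.6 points then
# maps to grade 6.0
```

Scores at or below the blind-guess score are floored at grade 0, since they carry no evidence of mastery.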
6. Evaluating your test
Additional analysis can be carried out at your request. You can consult the test and item analysis service about this. The report includes quality indicators that relate to the test as a whole and to individual questions. Explanations are provided.
Studying the report carefully will yield information that will enable you to make improvements to next year's test:
Look at the reliability of the test. If this is lower than 0.70, it is advisable to add more questions to next year's test.
Study the very difficult and very easy questions. Look for the reasons for this so that next time you can avoid such questions.
Examine the psychometrically strong questions, i.e. those that clearly distinguish between students who have mastered the material and those who have not. This can be seen in the question's Rit value (item-test correlation). Such questions are worth retaining for the next test.
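The quantities discussed above can be computed directly from a score matrix. The sketch below (plain Python, standard library only; the function names are ours) computes each question's difficulty (p value) and Rit value, plus the KR-20 coefficient, a common reliability estimate for tests with dichotomously scored questions:

```python
from statistics import mean, pvariance

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def item_analysis(scores):
    """scores: one row per student, one 0/1 entry per question.
    Returns (p_values, rit_values, kr20)."""
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    items = [[row[i] for row in scores] for i in range(n_items)]

    p_values = [mean(col) for col in items]               # difficulty
    rit_values = [pearson(col, totals) for col in items]  # item-test corr.

    # KR-20 = k/(k-1) * (1 - sum(p*q) / variance of the total scores)
    sum_pq = sum(p * (1 - p) for p in p_values)
    kr20 = n_items / (n_items - 1) * (1 - sum_pq / pvariance(totals))
    return p_values, rit_values, kr20

# Toy matrix: six students, four questions
scores = [[1, 1, 1, 1],
          [1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0],
          [1, 1, 1, 1]]
p, rit, kr20 = item_analysis(scores)
```

For this toy matrix KR-20 comes out at about 0.83; in real tests, values below 0.70 suggest adding questions, as noted above.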
Additional Resources
Van Berkel, H. J. M., Bax, A., & Joosten-ten Brinke, D. (2014). Toetsen in het Hoger Onderwijs. (3rd ed.). Bohn Stafleu Van Loghum.
Shrock, S. A., & Coscarelli, W. C. (2008). Criterion-referenced test development: Technical and legal guidelines for corporate training. John Wiley & Sons.
Downing, S. M., & Haladyna, T. M. (Eds.). (2006). Handbook of test development. Lawrence Erlbaum Associates.
Download this article
A PDF version of this article can be viewed and downloaded below.