How Do You Create Authentic Assessments?
By Jon Mueller [Used with permission from the author]
Mueller, J. (2018) Authentic assessment toolbox. Retrieved from http://jfmueller.faculty.noctrl.edu/toolbox/index.htm
Authentic Assessment: Students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills (Mueller, 2018).
Mueller (2018) shared that many times, you don't have to "develop an authentic assessment from scratch" (n.p.). Because of the subjects we teach, you may already be using authentic tasks and assessments in your classroom (and probably are), or you may have experienced them in your own education. Mueller shared that one way to approach creating authentic assessments is to think through the following four questions:
1) What should students know and be able to do?
This list of knowledge and skills becomes your...
STANDARDS
2) What indicates students have met these standards?
To determine if students have met these standards, you will design or select relevant...
AUTHENTIC TASKS
3) What does good performance on this task look like?
To determine if students have performed well on the task, you will identify and look for characteristics of good performance called ...
CRITERIA
4) How well did the students perform?
To discriminate among student performance across criteria, you will create a ...
RUBRIC
(The rubric raises a couple of questions to think about, such as "How well should most students perform?" and "What do students need to improve upon?" These will be addressed in another chapter.)
Identify your standards for your students.
For a particular standard or set of standards, develop a task your students could perform that would indicate that they have met these standards.
Identify the characteristics of good performance on that task, the criteria, that, if present in your students’ work, will indicate that they have performed well on the task, i.e., they have met the standards.
For each criterion, identify two or more levels of performance along which students can perform which will sufficiently discriminate among student performance for that criterion. The combination of the criteria and the levels of performance for each criterion will be your rubric for that task (assessment).
Below you will find explanations of each step.
For any type of assessment, you first must know where you want to end up. What are your goals for your students? An assessment cannot produce valid inferences unless it measures what it is intended to measure. And it cannot measure what it is intended to measure unless the goal(s) has been clearly identified. So, completing the rest of the following steps will be unproductive without clear goals for student learning.
Standards, like goals, are statements of what students should know and be able to do. However, standards are typically more narrow in scope and more amenable to assessment than goals. (Before going further, I would recommend that you read the section on Standards for a fuller description of standards and how they are different from goals and objectives.)
What Do Standards Look Like?
Standards are typically one-sentence statements of what students should know and be able to do at a certain point. Often a standard will begin with a phrase such as "Students will be able to ..." (SWBAT).
For example,
Students will be able to add two-digit numbers.
Or, it might be phrased
Students will add two-digit numbers.
A student will add two-digit numbers.
Or just
Identify the causes and consequences of the Revolutionary War.
Explain the process of photosynthesis.
More examples can be found in the Standards examples collection.
Also, read the section on types of standards to see how standards can address course content, process skills, or attitudes toward learning.
How Do You Get Started?
I recommend a three-step process for writing standards:
1. REFLECT
2. REVIEW
3. WRITE
1. REFLECT
As I will discuss below, there are many sources you can turn to for examples of goals and standards that might be appropriate for your students. There are national and state standards, as well as numerous websites with many good choices. It is unnecessary to start from scratch. However, before you look at the work of others, which can confine your thinking, I would highly recommend that you, as a teacher, school or district, take some time to examine (or REFLECT upon) what you value. What do you really want your students to know and be able to do when they leave your grade or school?
Here is a sample of questions you might ask yourself:
What do you want students to come away with from an education at _______?
What should citizens know and be able to do?
If you are writing standards for a particular discipline, what should citizens know and be able to do related to your discipline?
What goals and standards do you share with other disciplines?
What college preparation should you provide?
Think of a graduate or current student that particularly exemplifies the set of knowledge and skills that will make/has made that student successful in the real world. What knowledge and skills (related and unrelated to your discipline) does that person possess?
Ask yourself, "Above all else, we want to graduate students who can/will ........"
When you find yourself complaining about what students can't or don't do, what do you most often identify?
As a result of this reflection, you might reach consensus on a few things you most value and agree should be included in the standards. You might actually write a few standards. Or, you might produce a long list of possible candidates for standards. I do not believe there is a particular product you need to generate as a result of the reflection phase. Rather, you should move on to Step 2 (Review) when you are clear about what is most important for your students to learn.
For example, reflection and conversation with many of the stakeholders for education led the Maryland State Department of Education to identify the Skills for Success it believes are essential for today's citizens. Along with content standards, the high school assessment program in Maryland will evaluate how well students have acquired the ability to learn, think, communicate, use technology and work with others.
2. REVIEW
Did you wake up this morning thinking, "Hey, I'm going to reinvent the wheel today"? No need. There are many, many good models of learning goals and standards available to you. So, before you start putting yours down on paper, REVIEW what others have developed.
For example, you can look at
your state goals and standards
relevant national goals and standards
other state and local standards already created
your existing goals and standards if you have any
other sources that may be relevant (e.g., what employers want, what colleges want)
Look for
descriptions and language that capture what you said you value in Step 1 (REFLECT)
knowledge and skills not captured in the first step -- should they be included?
ways to organize and connect the important knowledge and skills
Look to
develop a good sense of the whole picture of what you want your students to know and to do
identify for which checkpoints (grades) you want to write standards
3. WRITE
The biggest problem I have observed in standards writing among the schools and districts I have worked with is missing the forest for the trees. As with many tasks, too often we get bogged down in the details and lose track of the big picture. I cannot emphasize enough how important it is to periodically step back and reflect upon the process. As you write your standards, ask yourself and your colleagues guiding questions such as
So, tell me again, why do we think this is important?
Realistically, are they ever going to have to know this/do this/use this?
How does this knowledge/skill relate to this standard over here?
We don't have a standard about X; is this really more important than X?
Can we really assess this? Should we assess it?
Is this knowledge or skill essential for becoming a productive citizen? How? Why?
Is this knowledge or skill essential for college preparation?
Yes, you may annoy your colleagues with these questions (particularly if you ask them repeatedly as I would advocate), but you will end up with a better set of standards that will last longer and provide a stronger foundation for the steps that follow in the creation of performance assessments.
Having said that, let's get down to the details. I will offer suggestions for writing specific standards by
listing some common guidelines for good standards and
modeling the development of a couple of standards much as I would if I were working one-on-one with an educator.
GUIDELINE #1:
For a standard to be amenable to assessment, it must be observable and measurable. For example, a standard such as
"Students will correctly add two-digit numbers"
is observable and measurable. However, a standard such as
"Students will understand how to add two-digit numbers"
is not observable and measurable. You cannot observe understanding directly, but you can observe performance. Thus, standards should include a verb phrase that captures the direct demonstration of what students know and are able to do.
Some bad examples:
Students will develop their persuasive writing skills.
Students will gain an understanding of pinhole cameras.
Rewritten as good examples:
Students will write an effective persuasive essay.
Students will use pinhole cameras to create paper positives and negatives.
GUIDELINE #2:
A standard is typically more narrow than a goal and broader than an objective. (See the section on Standards for a fuller discussion of this distinction.)
Too Broad
Of course, the line between goals and standards and objectives will be fuzzy. There is no easy way to tell where one begins and another one ends. Similarly, some standards will be broader than others. But, generally, a standard is written too broadly if
it cannot be reasonably assessed with just one or two assessments
(for content standards) it covers at least half the subject matter of a course or a semester
For example, the old Illinois Learning Standards for social science (since updated) listed "Understand political systems, with an emphasis on the United States" as a goal. That is a goal addressed throughout an entire course, semester or multiple courses. The goal is broken down into six standards including "Understand election processes and responsibilities of citizens." That standard describes what might typically be taught in one section of a course or one unit. Furthermore, I feel I could adequately capture a student's understanding and application of that standard in one or two assessments. However, I do not believe I could get a full and rich sense of a student's grasp of the entire goal without a greater number and variety of classroom measures. On the other hand, the standard, "understand election processes and responsibilities of citizens," would not typically be taught in just one or two lessons, so it is broader than an objective. Hence, it best fits the category of a standard as that term is commonly used.
Another tendency to avoid that can inflate the breadth of a standard and make it more difficult to assess is the coupling of two or more standards in a single statement. This most commonly occurs with the simple use of the conjunction "and." For example, a statement might read
Students will compare and contrast world political systems and analyze the relationships and tensions between different countries.
Although these two competencies are related, each one stands alone as a distinct standard.
Additionally, a standard should be assessable by one or two measures. Do I always want to assess these abilities together? I could, but it restricts my options and may not always be appropriate. It would be better to create two standards.
Students will compare and contrast world political systems.
Students will analyze the relationships and tensions between different countries.
In contrast, the use of "and" might be more appropriate in the following standard:
Students will find and evaluate information relevant to the topic.
In this case, the two skills are closely related, often intertwined and often assessed together.
Too Narrow
A possible objective falling under the social science standard mentioned above that a lesson or two might be built around would be "students will be able to describe the evolution of the voter registration process in this country." This statement would typically be too narrow for a standard because, again, it addresses a relatively small portion of the content of election processes and citizen responsibilities, and because it could be meaningfully assessed in one essay question on a test.
Of course, you might give the topic more attention in your government course, so what becomes an objective versus a standard can vary. Also, it is important to note that standards written for larger entities such as states or districts tend to be broader in nature than standards written by individual teachers for their classrooms. A U.S. government teacher might identify 5-15 essential ideas and skills for his/her course and voter registration might be one of them.
As you can see, each of these distinctions and labels is a judgment call. It is more important that you apply the labels consistently than that you use a specific label.
Note: You may have noticed that the Illinois Learning Standard that I have been using as an example violates Guideline #1 above -- it uses the verb understand instead of something observable. The Illinois Standards avoid this "problem" in most cases. However, the State addresses it more directly by writing its "benchmark standards" in more observable language. For example, under the general standard "understand election processes and responsibilities of citizens," it states that by early high school (a benchmark) students will be able to "describe the meaning of participatory citizenship (e.g., volunteerism, voting) at all levels of government and society in the United States."
GUIDELINE #3:
A standard should not include mention of the specific task by which students will demonstrate what they know or are able to do.
For example, in a foreign language course students might be asked to
Identify cultural differences and similarities between the student's own culture and the target culture using a Venn diagram.
The statement should have left off the last phrase "using a Venn diagram." Completing a Venn diagram is the task the teacher will use to identify if students meet the standard. How the student demonstrates understanding or application should not be included with what is to be understood or applied. By including the task description in the standard, the educator is restricted to only using that task to measure the standard because that is what the standard requires. But there are obviously other means of assessing the student's ability to compare and contrast cultural features. So, separate the description of the task from the statement of what the student should know or be able to do; do not include a task in a standard.
GUIDELINE #4:
Standards should be written clearly.
GUIDELINE #5:
Standards should be written in language that students and parents can understand.
Share your expectations with all constituencies. Students, parents and the community will feel more involved in the process of education. Standards are not always written in language that early elementary students can understand, but the standards (your expectations) can be explained to them.
If you completed Step 1 (identify your standards) successfully, then the remaining three steps, particularly this one, will be much easier. With each step it is helpful to return to your goals and standards for direction. For example, imagine that one of your standards is:
Students will describe the geographic, economic, social and political consequences of the Revolutionary War.
In Step 2, you want to find a way students can demonstrate that they are fully capable of meeting the standard. The language of a well-written standard can spell out what a task should ask students to do to demonstrate their mastery of it. For the above standard it is as simple as saying the task should ask students to describe the geographic, economic, social and political consequences of the Revolutionary War. That might take the form of an analytic paper you assign, a multimedia presentation students develop (individually or collaboratively), a debate they participate in or even an essay question on a test.
"Are those all authentic tasks?"
Yes, because each one a) asks students to construct their own responses and b) replicates meaningful tasks found in the real world.
"Even an essay question on a test? I thought the idea of Authentic Assessment was to get away from tests."
First, authentic assessment does not compete with traditional assessments like tests. Rather, they complement each other. Each typically serves different assessment needs, so a combination of the two is often appropriate. Second, essay questions are constructed-response items and fall on the border of authentic and traditional assessments. (Read more about Authentic Tasks.) That is, in response to a prompt, students construct an answer out of old and new knowledge. Since there is no one exact answer to these prompts, students are constructing new knowledge that likely differs slightly or significantly from that constructed by other students. Typically, constructed-response prompts are narrowly conceived, delivered at or near the same time a response is expected, and limited in length. However, the fact that students must construct new knowledge means that at least some of their thinking must be revealed. As opposed to selected-response items, the teacher gets to look inside the head a little with constructed-response answers. Furthermore, explaining or analyzing as one might do in an essay answer replicates a real-world skill one frequently uses. On the other hand, answering a question such as
Which of the following is a geographical consequence of the Revolutionary War?
a.
b.
c.
d.
requires students to select a response, not construct one. And, circling a correct answer is not a significant challenge that workers or citizens commonly face in the real world.
So, yes, it can be that easy to construct an authentic assessment. In fact, you probably recognize that some of your current assessments are authentic or performance-based ones. Moreover, I am guessing that you feel you get a better sense of your students' ability to apply what they have learned through your authentic assessments than from your traditional assessments.
Starting from Scratch?: Look at your Standards
What if you do not currently have an authentic assessment for a particular standard? How do you create one from scratch? Again, start with your standard. What does it ask your students to do? A good authentic task would ask them to demonstrate what the standard expects of students. For example, the standard might state that students will
solve problems involving fractions using addition, subtraction, multiplication and division.
Teachers commonly ask students to do just that -- solve problems involving fractions. That is an authentic task.
Starting from Scratch?: Look at the Real World
But what if you want a more engaging task for your students? A second method of developing an authentic task from scratch is to ask yourself "Where would they use these skills in the real world?" For computing with fractions, teachers have asked students to follow recipes, order or prepare pizzas, measure and plan the painting or carpeting of a room, etc. Each of these tasks is not just an instructional activity; each can also be an authentic assessment.
See more examples of authentic tasks.
Criteria: Indicators of good performance on a task
In Step 1, you identified what you want your students to know and be able to do. In Step 2, you selected a task (or tasks) students would perform or produce to demonstrate that they have met the standard from Step 1. For Step 3, you want to ask "What does good performance on this task look like?" or "How will I know they have done a good job on this task?" In answering those questions you will be identifying the criteria for good performance on that task. You will use those criteria to evaluate how well students completed the task and, thus, how well they have met the standard or standards.
Examples
Example 1: Here is a standard from the Special Education collection of examples:
The student will conduct banking transactions.
The authentic task this teacher assigned to students to assess the standard was to
make deposits, withdrawals or cash checks at a bank.
To identify the criteria for good performance on this task, the teacher asked herself "what would good performance on this task look like?" She came up with seven essential characteristics for successful completion of the task:
Selects needed form (deposit, withdrawal)
Fills in form with necessary information
Endorses check
Locates open teller
States type of transaction
Counts money to be deposited to teller
Puts money received in wallet
If students meet these criteria then they have performed well on the task and, thus, have met the standard or, at least, provided some evidence of meeting the standard.
Example 2: This comes from the Mathematics collection. There were six standards addressed to some degree by this authentic task. The standards are: Students will be able to
measure quantities using appropriate units, instruments, and methods;
set up and solve proportions;
develop scale models;
estimate amounts and determine levels of accuracy needed;
organize materials;
explain their thought process.
The authentic task used to assess these standards in a geometry class was the following:
Rearrange the Room
You want to rearrange the furniture in some room in your house, but your parents do not think it would be a good idea. To help persuade your parents to rearrange the furniture, you are going to make a two-dimensional scale model of what the room would ultimately look like.
Procedure:
You first need to measure the dimensions of the floor space in the room you want to rearrange, including the location and dimensions of all doors and windows. You also need to measure the amount of floor space occupied by each item of furniture in the room. These dimensions should all be explicitly listed.
Then use the given proportion to find the scale dimensions of the room and all the items.
Next you will make a scale blueprint of the room labeling where all windows and doors are on poster paper.
You will also make scale drawings of each piece of furniture on a sheet of cardboard, and these models need to be cut out.
Then you will arrange the model furniture where you want it on your blueprint and tape the pieces down.
You will finally write a brief explanation of why you believe the furniture should be arranged the way it is in your model.
Your models and explanations will be posted in the room and the class will vote on which setup is the best.
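To make the proportion step in this task concrete, here is a worked example. (The 1 inch : 1 foot scale and the room dimensions are assumed for illustration; the task itself refers only to "the given proportion.") For a room measuring 14 feet by 12 feet:

```latex
\frac{1\,\text{in}}{1\,\text{ft}} = \frac{x}{14\,\text{ft}} \;\Rightarrow\; x = 14\,\text{in},
\qquad
\frac{1\,\text{in}}{1\,\text{ft}} = \frac{y}{12\,\text{ft}} \;\Rightarrow\; y = 12\,\text{in}
```

So the scale blueprint of the room would be drawn 14 inches by 12 inches, and the floor space of each piece of furniture would be converted the same way.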
Finally, the criteria which the teacher identified as indicators of good performance on the Rearrange the Room task were:
accuracy of calculations;
accuracy of measurements on the scale model;
labels on the scale model;
organization of calculations;
neatness of drawings;
clear explanations.
(But how well does a student have to perform on each of these criteria to do well on the task? We will address that question in Step 4: Create the Rubric.)
You may have noticed in the second example that some of the standards and some of the criteria sounded quite similar. For example, one standard said students will be able to develop scale models, and two of the criteria were accuracy of measurements on the scale model and labels on the scale model. Is this redundant? No, it means that your criteria are aligned with your standards. You are actually measuring on the task what you said you valued in your standards.
Characteristics of a Good Criterion
So, what does a good criterion (singular of criteria) look like? It should be a clearly stated, brief, observable statement of behavior, written in language students understand.
Additionally, make sure each criterion is distinct. Although the criteria for a single task will understandably be related to one another, there should not be too much overlap between them. Are you really looking for different aspects of performance on the task with the different criteria, or does one criterion simply rephrase another one? For example, the following criteria might be describing the same behavior depending on what you are looking for:
interpret the data
draw a conclusion from the data
Another overlap occurs when one criterion is actually a subset of another criterion. For example, the first criterion below probably subsumes the second:
presenter keeps the audience's attention
presenter makes eye contact with the audience
Like standards, criteria should be shared with students before they begin a task so they know the teacher's expectations and have a clearer sense of what good performance should look like. Some teachers go further and involve the students in identifying appropriate criteria for a task. The teacher might ask the students "What characteristics does a good paper have?" or "What should I see in a good scale model?" or "How will I (or anyone) know you have done a good job on this task?"
How Many Criteria do you Need for a Task?
Of course, I am not going to give you an easy answer to that question because there is not one. But, I can recommend some guidelines.
Limit the number of criteria; keep it to the essential elements of the task. This is a guideline, not a rule. On a major, complex task you might choose to have 50 different attributes you are looking for in a good performance. That's fine. But, generally, assessment will be more feasible and meaningful if you focus on the important characteristics of the task. Typically, you will have fewer than 10 criteria for a task, and many times it might be as few as three or four.
You do not have to assess everything on every task. For example, you might value correct grammar and spelling in all writing assignments, but you do not have to look for those criteria in every assignment. You have made it clear to your students that you expect good grammar and spelling in every piece of writing, but you only check for it in some of them. That way, you are assessing those characteristics in the students' writing and you are sending the message that you value those elements, but you do not take the time to grade them on every assignment.
Smaller, less significant tasks typically require fewer criteria. For short homework or in-class assignments you might only need a quick check on the students' work. Two or three criteria might be sufficient to judge the understanding or application you were after in that task. Less significant tasks require less precision in your assessment than larger, more comprehensive tasks that are designed to assess significant progress toward multiple standards.
Ask.
Ask yourself; you are the one who has to apply the criteria. Do they make sense to you? Can you distinguish one from another? Can you envision examples of each? Are they all worth assessing?
Ask your students. Do they make sense to them? Do they understand their relationship to the task? Do they know how they would use the criteria to begin their work? To check their work?
Ask your colleagues. Ask those who give similar assignments. Ask others who are unfamiliar with the subject matter to get a different perspective if you like.
If you have assigned a certain task before, review previous student work. Do these criteria capture the elements of what you considered good work? Are you missing anything essential?
Time for a Quiz!
Do you think you could write a good criterion now? Do you think you would know a good one when you saw one? Let's give you a couple small tasks:
Task 1: Write three criteria for a good employee at a fast-food restaurant. (There would likely be more than three, but as a simple check I do not need to ask for more than three. Assessments should be meaningful and manageable!)
Task 2: I have written three criteria for a good employee below. I intentionally wrote two clear criteria (I hope) and one vague one. Can you find the vague one among the three? Are the other two good criteria? (Yes, I wrote them, so of course I think they are good criteria. But I will let you challenge my authority just this once :-)
the employee is courteous
the employee arrives on time
the employee follows the sanitary guidelines
What do you think? In my opinion, the first criterion is vague and the latter two are good criteria. Of course, evaluating criteria is a subjective process, particularly for those you wrote yourself. So, before I explain my rationale I would reiterate the advice above of checking your criteria with others to get another opinion.
To me, the statement "the employee is courteous" is too vague. Courteous could mean a lot of different things and could mean very different things to different people. I would think the employer would want to define the behavior more specifically and with more clearly observable language. For example, an employer might prefer:
the employee greets customers in a friendly manner
That is a more observable statement, but is that all there is to being courteous? It depends on what you want. If that is what the employer means by courteous then that is sufficient. Or, the employer might prefer:
the employee greets customers in a friendly manner and promptly and pleasantly responds to their requests
"Is that one or two criteria?" It depends on how detailed you want to be. If the employer wants a more detailed set of criteria he/she can spell out each behavior as a separate criterion. Or, he/she might want to keep "courteous" as a single characteristic to look for but define it as two behaviors in the criterion. There is a great deal of flexibility in the number and specificity of criteria. There are few hard and fast rules in any aspect of assessment development. You need to make sure the assessment fits your needs. An employer who wants a quick and dirty check on behavior will create a much different set of criteria than one who wants a detailed record.
The second criterion above, the employee arrives on time, is sufficiently clear. It obviously cannot name a specific time for arriving because that will change. But if the employer has identified the specific time an employee should arrive, then "arrive on time" is very clear. Similarly, if the employer has made the sanitary guidelines clear, then it should be clear to the employees what it means to "follow the guidelines."
"Could I include some of that additional detail in my criteria or would it be too wordy?" That is up to you. However, criteria are more communicable and manageable if they are brief. The employer could include some of the definition of courteous in the criterion statement such as
the employee is courteous (i.e., the employee greets customers in a friendly manner and promptly and pleasantly responds to their requests)
However, it is easier to state the criterion as "the employee is courteous" while explaining to the employees exactly what behaviors that entails. Whenever the employer wants to talk about this criterion with his/her employees, he/she can do it more simply with this brief statement. We will also see how rubrics are more manageable (coming up in Step 4) if the criteria are brief.
"Can I have sub-criteria in which I break a criterion into several parts and assess each part separately?" Yes, although that might be a matter of semantics. Each "sub-criterion" could be called a separate criterion. But I will talk about how to handle that in the next section "Step 4: Create the Rubric."
In Step 1 of creating an authentic assessment, you identified what you wanted your students to know and be able to do -- your standards.
In Step 2, you asked how students could demonstrate that they had met your standards. As a result, you developed authentic tasks they could perform.
In Step 3, you identified the characteristics of good performance on the authentic task -- the criteria.
Now, in Step 4, you will finish creating the authentic assessment by constructing a rubric to measure student performance on the task. To build the rubric, you will begin with the set of criteria you identified in Step 3. As mentioned before, keep the number of criteria manageable. You do not have to look for everything on every assessment.
Once you have identified the criteria you want to look for as indicators of good performance, you next decide whether to consider the criteria analytically or holistically.
Creating an Analytic Rubric
In an analytic rubric, performance is judged separately for each criterion. Teachers assess how well students meet a criterion on a task, distinguishing between work that effectively meets the criterion and work that does not meet it.
The next step in creating a rubric, then, is deciding how fine a distinction should be made for each criterion. For example, if you are judging the amount of eye contact a presenter made with his/her audience, that judgment could be as simple as did or did not make eye contact (two levels of performance); never, sometimes or always made eye contact (three levels); or never, rarely, sometimes, usually, or always made eye contact (five levels).
Generally, it is better to start small with fewer levels because it is usually harder to make finer distinctions. For eye contact, I might begin with three levels such as never, sometimes and usually. Then if, in applying the rubric, I found that some students seemed to fall in between never and sometimes, and never or sometimes did not adequately describe the students' performance, I could add a fourth level (e.g., rarely) and, possibly, a fifth level to the rubric.
In other words, there is some trial and error that must go on to arrive at the most appropriate number of levels for a criterion. (See the Rubric Workshop below to see more detailed decision-making involved in selecting levels of performance for a sample rubric.)
Do I need to have the same number of levels of performance for each criterion within a rubric?
No. You could have five levels of performance for three criteria in a rubric, three levels for two other criteria, and four levels for another criterion, all within the same rubric. Rubrics are very flexible tools. There is no need to force an unnatural judgment of performance just to maintain standardization within the rubric. If one criterion is a simple either/or judgment and another criterion requires finer distinctions, then the rubric can reflect that variation.
Here are some examples of rubrics with varying levels of performance...
Do I need to add descriptors to each level of performance?
No. Descriptors are recommended but not required in a rubric. [D]escriptors are the characteristics of behavior associated with specific levels of performance for specific criteria. For example, in the following portion of an elementary science rubric, the criteria are,
observations are thorough,
predictions are reasonable, and
conclusions are based on observations.
Labels (limited, acceptable, proficient) for the different levels of performance are also included. Under each label, for each criterion, a descriptor (in brown) is included to further explain what performance at that level looks like.
As you can imagine, students will be more certain of what is expected to reach each level of performance on the rubric if descriptors are provided. Furthermore, the more detail a teacher provides about what good performance looks like on a task, the better a student can approach the task.
Teachers benefit as well when descriptors are included. A teacher is likely to be more objective and consistent when applying a descriptor such as "most observations are clear and detailed" than when applying a simple label such as "acceptable."
Similarly, if more than one teacher is using the same rubric, the specificity of the descriptors increases the chances that multiple teachers will apply the rubric in a similar manner. When a rubric is applied more consistently and objectively it will lead to greater reliability and validity in the results.
Assigning point values to performance on each criterion
As mentioned above, rubrics are very flexible tools. Just as the number of levels of performance can vary from criterion to criterion in an analytic rubric, points or value can be assigned to the rubric in a myriad of ways.
For example, a teacher who creates a rubric might decide that certain criteria are more important to the overall performance on the task than other criteria. So, one or more criteria can be weighted more heavily when scoring the performance. For example, in a rubric for solo auditions, a teacher might consider five criteria: (how well students demonstrate) vocal tone, vocal technique, rhythm, diction and musicality.
For this teacher, musicality might be the most important quality that she has stressed and is looking for in the audition. She might consider vocal technique to be less important than musicality but more important than the other criteria. So, she might give musicality and vocal technique more weight in her rubric. She can assign weights in different ways. Here is one common format:
In this case, placement in the 4-point level for vocal tone would earn the student four points for that criterion. But placement in the 4-point box for vocal technique would earn the student 8 points, and placement in the 4-point box for musicality would earn the student 12 points. The same weighting could also be displayed as follows:
In both examples, musicality is worth three times as many points as vocal tone, rhythm and diction, and vocal technique is worth twice as much as each of those criteria. Pick a format that works for you and/or your students. There is no "correct" format in the layout of rubrics. So, choose one or design one that meets your needs.
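Weighted scoring like this is simple multiplication: the student's placement level on each criterion times that criterion's weight. Here is a minimal sketch in Python; the weights mirror the solo audition example above, while the function name and data layout are just illustrative, not part of the original:

```python
# Illustrative sketch of weighted analytic-rubric scoring.
# Weights follow the solo audition example: musicality counts triple,
# vocal technique double, and the remaining criteria count once.
WEIGHTS = {
    "vocal tone": 1,
    "vocal technique": 2,
    "rhythm": 1,
    "diction": 1,
    "musicality": 3,
}

def weighted_score(placements):
    """placements maps each criterion to its placement level (e.g., 1-4)."""
    return sum(WEIGHTS[criterion] * level
               for criterion, level in placements.items())

# A student placed at the 4-point level on every criterion:
top_marks = {criterion: 4 for criterion in WEIGHTS}
print(weighted_score(top_marks))  # 4 + 8 + 4 + 4 + 12 = 32
```

Note how the same 4-point placement earns 4, 8, or 12 points depending on the criterion's weight, exactly as in the two table formats described above.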
Yes, but do I need equal intervals between the point values in a rubric?
No. Say it with me one more time -- rubrics are flexible tools. Shape them to fit your needs, not the other way around. In other words, points should be distributed across the levels of a rubric to best capture the value you assign to each level of performance. For example, points might be awarded on an oral presentation as follows:
In other words, you might decide that at this point in the year you would be pleased if a presenter makes eye contact "sometimes," so you award that level of performance most of the points available. However, "sometimes" would not be as acceptable for level of volume or enthusiasm.
Here are some more examples of rubrics illustrating the flexibility of number of levels and value you assign each level:
In the above rubric, you have decided to measure volume and enthusiasm at two levels -- never or usually -- whereas, you are considering eye contact and accuracy of summary across three levels. That is acceptable if that fits the type of judgments you want to make. Even though there are only two levels for volume and three levels for eye contact, you are awarding the same number of points for a judgment of "usually" for both criteria. However, you could vary that as well:
In this case, you have decided to give less weight to volume and enthusiasm as well as to judge those criteria across fewer levels.
So, do not feel bound by any format constraints when constructing a rubric. The rubric should best capture what you value in performance on the authentic task. The more accurately your rubric captures what you want your students to know and be able to do the more valid the scores will be.
Creating a Holistic Rubric
In a holistic rubric, a judgment of how well someone has performed on a task considers all the criteria together, or holistically, instead of separately as in an analytic rubric. Thus, each level of performance in a holistic rubric reflects behavior across all the criteria. For example, here is a holistic version of the oral presentation rubric above:
An obvious, potential problem with applying the above rubric is that performance often does not fall neatly into categories such as mastery or proficiency. A student might always make eye contact, use appropriate volume regularly, occasionally show enthusiasm and include many errors in the summary. Where would you put that student in the holistic rubric? Thus, it is recommended that the use of holistic rubrics be limited to situations when the teacher wants to:
make a quick, holistic judgment that carries little weight in evaluation, or
evaluate performance in which the criteria cannot be easily separated.
Quick, holistic judgments are often made for homework problems or journal assignments. To allow the judgment to be quick and to reduce the problem illustrated in the above rubric of fitting the best category to the performance, the number of criteria should be limited. For example, here is a possible holistic rubric for grading homework problems:
Although this homework problem rubric only has two criteria and three levels of performance, it is not easy to write such a holistic rubric to accurately capture what an evaluator values and to cover all the possible combinations of student performance.
For example, what if a student got all the answers correct on a problem assignment but did not show any work? The rubric covers that: the student would receive a (-) because "little or no work was shown." What if a student showed all the work but only got some of the answers correct? That student would receive a (+) according to the rubric. All such combinations are covered. But does giving a (+) for such work reflect what the teacher values?
The above rubric is designed to give equal weight to correct answers and work shown. If that is not the teacher's intent then the rubric needs to be changed to fit the goals of the teacher.
All of this complexity with just two criteria -- imagine if a third criterion were added to the rubric. So, with holistic rubrics, limit the number of criteria considered, or consider using an analytic rubric.
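Because the homework rubric itself appears only as a table that is not reproduced here, the following Python sketch is a hypothetical reconstruction, consistent only with the two cases just discussed; the specific thresholds are invented for illustration:

```python
def holistic_grade(work_shown, answers_correct):
    """Hypothetical holistic homework rubric (thresholds invented).

    work_shown and answers_correct are fractions of problems, 0.0 to 1.0.
    """
    if work_shown < 0.25:
        # "Little or no work was shown" earns a (-), even if every answer
        # is correct.
        return "-"
    if work_shown == 1.0 and answers_correct >= 0.5:
        # All work shown and at least some answers correct earns a (+).
        return "+"
    return "check"  # everything in between earns the middle mark

print(holistic_grade(0.0, 1.0))  # all answers correct, no work shown: "-"
print(holistic_grade(1.0, 0.5))  # all work shown, half correct: "+"
```

The second call illustrates the very tension raised above: a student who shows every step but misses half the answers still earns the top mark, which may or may not reflect what the teacher values.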
Final Step: Checking Your Rubric
As a final check on your rubric, you can do any or all of the following before applying it.
Let a colleague review it.
Let your students review it -- is it clear to them?
Check if it aligns or matches up with your standards.
Check if it is manageable.
Consider imaginary student performance on the rubric.
By the last suggestion, I mean imagine that a student has met specific levels of performance on each criterion (for an analytic rubric). Then ask yourself if that performance translates into the score that you think is appropriate. For example, on Rubric 3 above, imagine a student scores
"sometimes" for eye contact (3 pts.)
"always" for volume (4 pts.)
"always" for enthusiasm (4 pts.)
"sometimes" for summary is accurate (4 pts.)
That student would receive a score of 15 points out of a possible 20 points. Does 75% (15 out of 20) capture that performance for you? Perhaps you think a student should not receive that high of a score with only "sometimes" for the summary. You can adjust for that by increasing the weight you assign that criterion. Or, imagine a student apparently put a lot of work into the homework problems but got few of them correct. Do you think that student should receive some credit? Then you would need to adjust the holistic homework problem rubric above. In other words, it can be very helpful to play out a variety of performance combinations before you actually administer the rubric. It helps you see the forest, not just the trees.
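The imaginary-performance check is easy to run mechanically. A small sketch follows; the earned points match the example above, and the summary criterion is assumed to carry a weighted maximum of 8 points so that the totals come to 15 of a possible 20:

```python
# Imaginary-student check on an analytic rubric (the Rubric 3 example).
# The summary criterion's 8-point maximum is an assumption made here so
# the totals match the 15-of-20 figure in the text.
earned = {"eye contact": 3, "volume": 4, "enthusiasm": 4, "summary": 4}
maximum = {"eye contact": 4, "volume": 4, "enthusiasm": 4, "summary": 8}

total = sum(earned.values())
out_of = sum(maximum.values())
print(f"{total}/{out_of} = {total / out_of:.0%}")  # 15/20 = 75%
```

If 75% feels too generous for a presentation with only a "sometimes" summary, raising that criterion's weight (say, to a 12-point maximum) lowers the percentage the same performance earns.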
Of course, you will never know if you really have a good rubric until you apply it. So, do not work to perfect the rubric before you administer it. Get it in good shape and then try it. Find out what needs to be modified and make the appropriate changes.
Okay, does that make sense? Are you ready to create a rubric of your own? Well, then come into my workshop and we will build one together. I just need you to wear these safety goggles. Regulations. Thanks.
(For those who might be "tabularly challenged" (i.e., you have trouble making tables in your word processor) or would just like someone else to make the rubric into a tabular format for you, there are websites where you enter the criteria and levels of performance and the site will produce the rubric for you.)
References
Mueller, J. (2018) Authentic assessment toolbox. Retrieved from http://jfmueller.faculty.noctrl.edu/toolbox/index.htm