TEACHING ENGLISH IN GLOBAL CONTEXTS - Chapter 46 Introduction to Language Assessment

Natalie A. Kuhlman

ABSTRACT

You assess things every day. Each time you ask a question, you are actually assessing (collecting information). As a teacher, you assess language to see whether your students are learning and whether you are teaching well. Large-scale standardized assessments are used to move students from one level to the next (e.g., from Level 1 to Level 2) and to compare students, say from Paraguay, with students in other countries. These tests, however, don’t necessarily help you, the classroom teacher, know whether your students have learned and can use what you are teaching. In this chapter, you will learn about assessment and how it fits into your curriculum. You will also learn why you need to know where you are going (e.g., objectives, goals, standards) in order to know when you get there (assessment, evaluation).

Keywords: language assessment, curriculum, objectives, goals, standards, evaluation

How to cite this chapter:

Kuhlman, N. (2023). Introduction to Language Assessment. In V. Canese & S. Spezzini (Eds.), Teaching English in Global Contexts, Language, Learners and Learning (pp. 564-572). Editorial Facultad de Filosofía, UNA. https://doi.org/10.47133/tegc_ch46

INTRODUCTION

As a teacher, you need to be able to determine the progress of your students as they learn English. As part of your instructional design, you need to know about your students’ formative growth (short term) and summative growth (long term). You also need to know the difference between traditional assessments (e.g., multiple-choice) and more authentic performance-based assessments (e.g., writing samples, projects, communicative activities). Together with instruction, these assessment concepts form the foundation of classroom teaching.

BACKGROUND

Language assessment has been around for a very long time. The Chinese assessed Mandarin as early as the 1500s. These assessments were based on Confucian texts and were meant to exclude people from learning Mandarin, not to include them (O’Sullivan, 2012). By contrast, the first Cambridge Proficiency exam for English was offered in 1913 to include British colonists who wished to enter the British educational system. Both purposes, excluding and including, have persisted throughout history. Many of these early tests took a prescriptive approach, counting, for example, the number of grammar errors made rather than judging whether a person could write a cohesive paper or discuss a topic in the language.

Since the late 20th century, assessment has moved from an objective model (right or wrong answers on a single test) to a more progressive, performance-based model. This model may include several different items that show how students are progressing, and these items are often combined into a portfolio. Rather than grading each item individually, the teacher may review the whole collection both as individual pieces of the puzzle and as the completed puzzle.

MAJOR DIMENSIONS

Purposes of Assessment

The main purpose of assessment is to determine how much your students have learned at one point in time or over time. Another purpose of assessment is to inform your teaching. However, what does it mean to have learned English or to be proficient?

Proficiency is having as much of a language as you need for your particular purposes. You can be proficient in oral language (listening/speaking) or written language (reading/writing), or in social language or academic language. You can also be competent at integrating all four skills. Because languages serve many purposes, many varieties have evolved, such as geographic, ethnic, national, academic, career, and sports varieties. No single assessment can provide information about language use in all of these contexts. Consequently, to determine your students’ progress toward language proficiency, consider using multiple forms of assessment.

The Common European Framework of Reference (CEFR; Council of Europe, 2022), which is used around the world, provides a descriptive way to determine proficiency. It describes six levels grouped into three broad bands: Basic User (A1, A2), Independent User (B1, B2), and Proficient User (C1, C2). These levels form the basis for many proficiency tests used internationally.

When teaching English, assess your students’ language development:

for students to be placed in appropriate levels or to see their progress;
for others (e.g., parents, school administrators, and even countries) to see your students’ achievement; and
for you, as a language teacher, to know what you have or haven’t taught well and, consequently, identify what else needs to be taught.

Every time you ask a question, you are assessing your students. However, when an assessment is formal and meant to compare your students with students in other schools or countries, what you teach may become limited to what is on the test. In other words, you might end up teaching to the test at the expense of letting your students learn other forms of the language. Consequently, the reason for assessing might limit what you teach.

What Do You Need to Know About Assessment?

Regarding assessment, you need to know the issues that affect students when they are being assessed, especially formally. Such issues include psychological, political, and affective factors, as well as practical aspects such as timed testing. To use assessments successfully with your students, become knowledgeable about different types of classroom assessment, such as performance-based and traditional assessment. Develop a conscious knowledge of English language structure and of first and second language acquisition so that you will not expect more from your students in English than they know in their primary language (Kuhlman, 2006; Valdez Pierce & Tu, 2022).

Find out what your students think about why they are being assessed. For many, assessment might not be an educational experience but rather a matter of guessing what the teacher wants. Students might see assessment as something done to them rather than for them. For these reasons, many assessments might not actually tell you what your students really know.

Finally, establish an understanding about basic assessment concepts. Among the most important concepts are accountability, norm-referenced and criterion-referenced testing, validity and reliability, and formative and summative assessment.

Accountability

Accountability means being answerable. In education, it is usually seen as a way “to guarantee that students attain expected educational goals or standards” (O’Malley & Valdez Pierce, 1996, p. 3). Accountability can apply to a single classroom or to a broader level, such as a city or a country. It is generally established through standardized tests. Here, standardized means consistent (the same test and conditions for everyone, with the same outcome reporting); it does not necessarily mean based on standards. Standardized tests can be norm-referenced or criterion-referenced.

Norm-Referenced Testing

Norm-referenced testing is primarily used for local and national achievement tests and large-scale language proficiency tests (Gottlieb, 2006). The primary reason for using norm-referenced tests is to make statistically valid comparisons for accountability. Scores on these tests are interpreted against a bell curve (a normal distribution), in which half of the students score at or above the average (mean) score and the other half score below it, as shown in Figure 1.

Figure 1

Representation of a Bell Curve
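To make the norm-referenced idea concrete, here is a minimal Python sketch that interprets each raw score relative to the group rather than to a fixed goal. It reports each student's z-score (distance from the mean in standard deviations) and percentile rank (the percentage of the group scoring below that student). The student names and scores are hypothetical, and an actual norm-referenced test compares students against a large norming sample, not a single class.

```python
from statistics import mean, pstdev


def norm_referenced_report(scores):
    """Interpret each raw score relative to the other test-takers.

    Returns {name: (z_score, percentile_rank)}, where the percentile rank
    is the percentage of the group that scored strictly below the student.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of this group
    report = {}
    for name, score in scores.items():
        z = (score - mu) / sigma if sigma else 0.0
        below = sum(1 for v in values if v < score)
        report[name] = (round(z, 2), round(100 * below / len(values)))
    return report


# Hypothetical class results on a 100-point test
scores = {"Ana": 55, "Luis": 70, "Sofía": 70, "Diego": 85, "Marta": 95}
for name, (z, pct) in norm_referenced_report(scores).items():
    print(f"{name}: z = {z:+.2f}, percentile rank = {pct}")
```

Notice that Luis and Sofía, with the same raw score, receive the same standing, and that any student's standing would change if the rest of the group scored differently, even if that student's own score did not.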

Criterion-Referenced Testing

Criterion-referenced testing is “an approach to testing in which a given score is interpreted relative to a pre-set goal or objective (the criterion), rather than to the performance of other test-takers” (Bailey, 1998, p. 243). In other words, on a criterion-referenced test, all of your students can be successful if they all meet the criterion. The TOEFL is an example of a criterion-referenced test.
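By contrast with a norm-referenced interpretation, a criterion-referenced interpretation compares each score with a fixed cut score set in advance. The minimal sketch below assumes a hypothetical criterion of 70 points and hypothetical student scores; because no one is ranked against anyone else, every student can meet the criterion.

```python
def criterion_referenced_report(scores, cut_score):
    """Interpret each raw score against a pre-set criterion (cut score).

    A student's result depends only on that student's own score, never on
    how the other test-takers performed.
    """
    return {name: score >= cut_score for name, score in scores.items()}


# Hypothetical class results and a hypothetical cut score of 70 points
scores = {"Ana": 55, "Luis": 70, "Sofía": 70, "Diego": 85, "Marta": 95}
results = criterion_referenced_report(scores, cut_score=70)
for name, met in results.items():
    print(f"{name}: {'meets' if met else 'does not yet meet'} the criterion")

# If every student reaches the criterion, every student succeeds
everyone_passes = criterion_referenced_report({"A": 80, "B": 90}, cut_score=70)
print(all(everyone_passes.values()))  # True
```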

Validity and Reliability

Validity means that a test measures what it is supposed to measure. This applies to any type of assessment, whether informal or formal, traditional or performance based. Two common kinds of validity are content validity and consequential validity. In content validity, the content of the test is aligned with the objectives of the curriculum (or standards). Consequential validity concerns the effect an assessment has on instruction; in other words, the assessment leads to a consequence such as follow-up instruction.

Reliability means that a test (or other assessment) produces consistent results. A common way to measure reliability is the test/retest model: the same test or assignment is given to the same students twice, approximately two weeks apart. If the test is reliable, it will produce approximately the same results each time it is administered. Reliability is a central concern in the development of standardized tests such as the TOEFL.
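Test/retest reliability is often summarized as a correlation coefficient between the two administrations: a value near 1 means students kept roughly the same relative standing from the first sitting to the second. This minimal sketch computes Pearson's r by hand; the scores are hypothetical, and testing programs typically report other reliability estimates as well.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two lists of paired scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean1 = sum(first) / n
    mean2 = sum(second) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(first, second))
    var1 = sum((a - mean1) ** 2 for a in first)
    var2 = sum((b - mean2) ** 2 for b in second)
    return cov / sqrt(var1 * var2)


# Hypothetical scores for the same five students, about two weeks apart
first_sitting = [62, 75, 80, 88, 91]
second_sitting = [65, 73, 82, 85, 93]
r = pearson_r(first_sitting, second_sitting)
print(f"test/retest r = {r:.2f}")
```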

Formative Assessment and Summative Assessment

Formative assessment is used to monitor, on an ongoing basis, how well you and your students are doing by gathering information and interpreting progress (Elturki, 2020). As an analogy, formative assessment is like a flower that is being planted and nurtured, with ongoing tracking of its growth and corresponding adjustments as needed. Summative assessment can take place yearly for accountability or at the end of a given cycle (e.g., end of a textbook chapter or end of a grading period). To finish the analogy, summative assessment is like a flower that is full grown and is being assessed for height and number of blossoms.

PEDAGOGICAL APPLICATIONS

Now let’s examine two kinds of classroom assessment: traditional assessment and performance-based assessment. Either kind of information gathering can be formal or informal.

Traditional (Objective) Assessments

As a student, you almost certainly took traditional assessments in the form of quizzes or tests, which have been used for a long time. Such assessments are usually called objective because they focus on answers that are either right or wrong (i.e., correct or incorrect). Sometimes there are degrees of correctness, as in cloze items and fill-in-the-blank items. Although these assessments are considered “objective,” they actually aren’t. The test designer (either an individual or a group) decides which items to put on the assessment and which student responses count as right or wrong. As the teacher, you usually do not know why your students respond in a certain way. Consider having students write down why they chose a certain answer; their explanations might change your interpretation of a response as right or wrong. Following are several common types of traditional assessments (Coombe et al., 2012; Kuhlman, 2006).

Multiple-Choice. When conducting a multiple-choice assessment, have students respond to questions by selecting from among several options (usually from three to five per question). Although multiple-choice questions are seen as objective, someone selected the questions and determined the correct answers. In that sense, this type of assessment is somewhat subjective (i.e., based on someone’s opinion). As an example of this subjectivity, think about a time when you took a test as a student and disagreed with the answer marked correct.

True-False. When you conduct a true-false assessment, students respond by selecting “true” or “false” and, therefore, have a 50% chance of selecting the right answer on each item. Absolute words such as “never” and “always” can cue students toward “false,” because almost nothing is always right or always wrong. As a teacher, you usually do not know why students selected either “true” or “false.” To use true-false questions to better identify what students know, consider asking students to explain why they chose “true” or “false.”
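The 50% figure applies to a single item. Across a whole test, the binomial distribution shows how likely a student who guesses on every item is to reach a given score. This minimal sketch assumes that each item is answered independently and that each guess is a coin flip; the quiz lengths and pass marks are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of getting at least k of n items right by pure guessing.

    Uses the binomial distribution: each item is answered independently and
    correctly with probability p (0.5 for a true-false item).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical 10-item true-false quiz with a pass mark of 7 correct
print(round(prob_at_least(7, 10), 3))  # 0.172
# A longer test makes a passing score by chance much less likely
print(round(prob_at_least(14, 20), 3))  # 0.058
```

In other words, a student guessing blindly would pass the hypothetical 10-item quiz about one time in six, which is one more reason to ask students to explain their choices.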

Dictation. When conducting a dictation assessment, read something aloud and have students write exactly what they hear. This is more complicated than it appears. In advance, decide how many times you will read the passage. Consider whether you will read it slowly, at normal speed, or both. Determine whether to read a passage that is unknown to students or one that they have already seen or heard. Decide whether to take spelling into consideration. Use a key to score the dictation and then count the errors.
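One way to make “use a key and count the errors” consistent across students is to count word-level differences between the key and each student's transcript, treating every substituted, omitted, or inserted word as one error. This is a hypothetical scoring scheme, not a standard one. The `ignore_case_and_punctuation` flag is an illustrative stand-in for the kind of decision the dictation paragraph asks you to make in advance; leniency on spelling would need a separate rule.

```python
import string


def normalize(text, ignore_case_and_punctuation=True):
    """Split a text into words, optionally ignoring case and punctuation."""
    if ignore_case_and_punctuation:
        text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def dictation_errors(key, response, ignore_case_and_punctuation=True):
    """Count word-level errors (substitutions, omissions, insertions).

    This is the word-level edit distance between the key and the response,
    computed row by row with dynamic programming.
    """
    k = normalize(key, ignore_case_and_punctuation)
    r = normalize(response, ignore_case_and_punctuation)
    # prev[j]: errors to match the key words processed so far against the
    # first j response words (starts as the row for an empty key)
    prev = list(range(len(r) + 1))
    for i in range(1, len(k) + 1):
        curr = [i] + [0] * len(r)
        for j in range(1, len(r) + 1):
            cost = 0 if k[i - 1] == r[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # key word omitted by the student
                curr[j - 1] + 1,     # extra word inserted by the student
                prev[j - 1] + cost,  # word matched or substituted
            )
        prev = curr
    return prev[-1]


key = "The cat sat on the mat."
response = "the cat sit on mat"
print(dictation_errors(key, response))  # 2: "sat" -> "sit", and "the" omitted
```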

Cloze and Other Fill-in-the-Blank Assessments. When conducting a cloze assessment or another type of fill-in-the-blank assessment, have students read a sentence or passage and then write the missing word or words (in a cloze passage, usually one deletion every 3 to 7 words, depending on the length of the text). Decide whether correct responses are open-ended (several responses are acceptable) or closed (only one response is acceptable). A multiple-choice format can also be used, especially for lower-level students, who can move to open-ended responses as they become more proficient. Scoring cloze and fill-in-the-blank assessments can be complicated. Does correct mean the exact word, or can it be a synonym? What if a response is unexpected but has the same meaning?
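The fixed-ratio deletion and the choice between exact-word and acceptable-word scoring described in the cloze paragraph can be sketched in a few lines of Python. The passage, the deletion rate (every 5th word), and the accepted synonym are hypothetical, and this sketch ignores punctuation, which a real cloze passage would need to handle more carefully.

```python
def make_cloze(text, n):
    """Build a fixed-ratio cloze passage by deleting every n-th word.

    Returns the gapped passage and the deleted words (the answer key).
    """
    gapped, answers = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            answers.append(word)
            gapped.append("_____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers


def score_cloze(responses, answers, acceptable=None):
    """Count correct responses.

    With no `acceptable` entries this is exact-word scoring. `acceptable`
    maps a gap index (0-based) to extra answers the teacher has decided to
    accept, such as synonyms, which turns it into acceptable-word scoring.
    """
    acceptable = acceptable or {}
    score = 0
    for index, (given, expected) in enumerate(zip(responses, answers)):
        allowed = {expected.lower()}
        allowed.update(a.lower() for a in acceptable.get(index, []))
        if given.strip().lower() in allowed:
            score += 1
    return score


text = "The students read a short story and then wrote a letter to the main character"
gapped, answers = make_cloze(text, n=5)
print(gapped)
# The students read a _____ story and then wrote _____ letter to the main _____
print(answers)  # ['short', 'a', 'character']

responses = ["brief", "a", "character"]
print(score_cloze(responses, answers))                              # 2 (exact word)
print(score_cloze(responses, answers, acceptable={0: ["brief"]}))  # 3 (synonym accepted)
```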

Performance-Based Assessment

In contrast to the traditional assessments explained above, performance-based assessments require students to construct an oral and/or written response (O’Malley & Valdez Pierce, 1996). Students can give an oral response followed by a written one, or just one of the two. A performance-based assessment can be anything your students perform, produce, or create. This can be an assigned task or an observation (e.g., of student behavior), and it can be formal or informal. Students “accomplish complex and significant tasks, while using prior knowledge, recent learning, and relevant skills to solve realistic or authentic problems” (Herman et al., as cited in O’Malley & Valdez Pierce, 1996, p. 4). For performance-based assessments, students may do oral and/or written projects, class debates, book reports, and journal writing. These can be with or without assigned topics.

As an alternative to grades, which are more common in traditional assessments, rubrics are often used to score performance-based assessments. Rubrics show degrees of learning rather than just correctness. Usually, each level has a description of what should be achieved at that level, as in the CEFR (COE, 2022). Rubrics are also useful for providing additional feedback to both you (the teacher) and your students. Rubrics can be used with any type of assessment, especially with portfolios. However, scoring portfolios is challenging because there are no right or wrong responses. Even when teachers are trained to use rubrics, subjectivity always plays a role.
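A rubric can be thought of as a small table: for each criterion, a set of levels, each with a descriptor. The minimal sketch below represents a hypothetical three-criterion writing rubric with four levels and reports both a total and the descriptor for each level awarded, since the descriptors, not just the numbers, are what give students feedback. The criteria and descriptors are invented for illustration and are not taken from the CEFR.

```python
# Hypothetical writing rubric: criterion -> {level: descriptor}
RUBRIC = {
    "Organization": {
        1: "Ideas are listed without a clear order.",
        2: "Some ideas are grouped; transitions are rare.",
        3: "Ideas are logically ordered with some transitions.",
        4: "Clear beginning, middle, and end with effective transitions.",
    },
    "Vocabulary": {
        1: "Very limited words; meaning is often unclear.",
        2: "Basic words; some repetition or misuse.",
        3: "Varied words that are mostly used accurately.",
        4: "Wide, precise vocabulary suited to the topic.",
    },
    "Grammar": {
        1: "Frequent errors block meaning.",
        2: "Errors sometimes block meaning.",
        3: "Some errors, but meaning is clear.",
        4: "Few errors; meaning is always clear.",
    },
}


def score_with_rubric(levels_awarded, rubric=RUBRIC):
    """Return (total, maximum possible total, per-criterion feedback)."""
    feedback = {}
    total = 0
    for criterion, descriptors in rubric.items():
        level = levels_awarded[criterion]
        if level not in descriptors:
            raise ValueError(f"{criterion}: level {level} is not on the rubric")
        total += level
        feedback[criterion] = f"Level {level}: {descriptors[level]}"
    # The highest level key for each criterion, summed across criteria
    max_total = sum(max(d) for d in rubric.values())
    return total, max_total, feedback


total, max_total, feedback = score_with_rubric(
    {"Organization": 3, "Vocabulary": 2, "Grammar": 3}
)
print(f"Total: {total}/{max_total}")  # Total: 8/12
for line in feedback.values():
    print(line)
```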

Frequently, performance-based assessments are compiled in a portfolio. The contents can vary from daily journals to essays (perhaps both first and final drafts). This assessment portfolio can include all types of work (including traditional assessments) that show growth over time.

Which One to Use?

As the teacher, select the assessment type that matches your curriculum objectives and activities. For example, a multiple-choice test (traditional) cannot determine how well students write book reports (performance based). Select assessments based on content and form. Differences between traditional assessments and performance-based assessments are summarized in these columns:

Traditional             Performance-based
Knowledge/facts         Language use
Objective               Subjective
Receptive               Productive
Discrete sub-skills     Integrated skills
Overall mastery         Process and progress
Right/wrong             Rubric criteria

Why Should You Know About Assessment?

To be an effective language teacher, you need to know about assessment. You need to identify where you are going (goals, objectives, standards) to know how to get there (curriculum/teaching and learning). You also need to determine where you and your students are along the way as well as when you have reached your destination (assessment and evaluation).

In this chapter, you learned about formative and summative assessments. You learned to differentiate between traditional assessments (e.g., multiple-choice) and performance-based assessments (e.g., actual writing samples, projects, communicative activities) and between norm-referenced and criterion-referenced assessments. You also learned how assessment can inform teaching.

KEY CONCEPTS

Here are some key concepts about assessment:

Assessment and learning go together. You can’t have one without the other.
Assessment is collecting information, and evaluation is making decisions based on the information you have collected.
Formative assessment is used daily, and summative assessment is used upon finishing a chapter, semester, or year.
Traditional assessment uses objective measures (e.g., multiple-choice, true-false), and performance-based assessment requires students to produce or create something.
With norm-referenced assessment, your students are assessed based on norms and judged against one another, with half scoring below average and half at or above average. With criterion-referenced assessment, everyone can succeed.

DISCUSSING

Based on this chapter about assessment, answer these questions:

1. Is testing a good practice? Why or why not?
2. How do you know your students are learning?
3. In your situation, what are the purposes of assessment? Who decides?
4. Should curriculum dictate assessment, or should assessment dictate curriculum? (This is a trick question.)

TAKING ACTION

To practice what you have learned about assessment, do the following:

1. List several traditional assessments you have given (assigned) as a teacher and/or taken (produced) as a student. Think about what you learned from these assessments (as teacher or student), and what you did with the results.
2. List several performance-based assessments you have given as a teacher and/or taken as a student. Think about what you learned from these assessments, and what you did with the results.
3. Compare your responses regarding traditional and performance-based assessments.

EXPANDING FURTHER

To expand your knowledge about assessment, visit these websites:

Center for Applied Linguistics. www.CAL.org
Common European Framework of Reference. www.coe.int/lang-CEFR
International Association of Teachers of English as a Foreign Language. www.IATEFL.org
TESOL International Association. www.TESOL.org

REFERENCES

Bailey, K. (1998). Learning about language assessment: Dilemmas, decisions, and directions. Heinle Cengage Learning.

Coombe, C., Davidson, P., O’Sullivan, B., & Stoynoff, S. (Eds.). (2012). The Cambridge guide to second language assessment. Cambridge University Press.

Council of Europe (COE). (2022). Common European Framework of Reference. www.coe.int/lang-CEFR

Elturki, E. (2020). A systematic process for assessing assessment. English Teaching Forum, 58(4), 12-21. https://americanenglish.state.gov/files/ae/resource_files/etf_58_4_pg12-21.pdf

Gottlieb, M. (2006). Assessing English language learners: Bridges from language proficiency to academic achievement. Corwin Press.

Kuhlman, N. (2018). An introduction to language assessment in the K-12 classroom [online course]. ELT Advantage Subscription; Cengage Learning. (Original work published 2006, October)

O’Malley, J. M., & Valdez Pierce, L. (1996). Authentic assessment for English language learners: Practical approaches for teachers. Addison-Wesley.

O’Sullivan, B. (2012). A brief history of language testing. In C. Coombe, P. Davidson, B. O’Sullivan, & S. Stoynoff (Eds.), The Cambridge guide to second language assessment (pp. 9-19). Cambridge University Press.

Valdez Pierce, L., & Tu, T. (2022). Nationally recognized programs: Setting the standard for assessment literacy [Paper presentation]. Annual Convention of the TESOL International Association, Virtual.

ABOUT THE AUTHOR

Natalie Kuhlman is professor emerita at San Diego State University (USA). Natalie authored An Introduction to Language Assessment in the K-12 Classroom (ELT Advantage, Cengage, 2006) and co-authored Preparing Effective Teachers of English Language Learners (TESOL, 2012) and TESOL EFL Guidelines for Teacher Standards Development (TESOL, 2014). She was a board member of the TESOL International Association and president of California Teachers of English to Speakers of Other Languages (also known as CATESOL). Natalie was recognized as one of 30 ESL Specialists by the U.S. State Department for her work in Albania, Uruguay, Ecuador, and Indonesia on applying TESOL standards to English teacher preparation programs. She has provided ELT workshops in Paraguay, Uruguay, Brazil, and Ecuador, among other countries.

Email for correspondence regarding this chapter: nkuhlman@sdsu.edu
