Michael Kane
Educational Testing Service

Making assessments useful in language education or “fine words butter no parsnips”

Assessments are developed to support score interpretations and uses, and the score-based claims are validated by stating the claims clearly and by evaluating their plausibility. Educational assessment programs can make use of at least three different kinds of interpretations: descriptive interpretations in terms of observable attributes (based on classical and IRT models), explanatory interpretations based on theories of performance, and developmental interpretations in terms of learning progressions (e.g., the CEFR), and can use the scores in a number of ways (e.g., grading, diagnosis, placement). Traditional psychometric analyses tend to focus on descriptive interpretations, but the explanatory and developmental interpretations may be more useful in practice. In any case, it is the interpretations and uses of scores that are validated (rather than the assessments or the scores per se), and it is important that the claims actually based on the scores are the same as those that have been validated.

Georgetown University

Language program evaluation in contemporary language education: Current practices, future directions

An increasingly common requirement for language teachers is the integral use of program evaluation in language educational delivery. To this end, evaluation is meant to be a useful, practical tool that helps instructors, administrators, funders, and communities know or do something that enhances the quality of language instruction. Yet in reality—particularly in the present climate of heightened scrutiny and accountability—evaluation practice often falls short of being a fully useful endeavor that changes language education for the better. As such, language research has tried to understand evaluation efficacy and how different approaches can best serve the aims of different language education stakeholders. Furthermore, researchers have periodically looked to mainstream evaluation for insight into the various methodological and contextual factors that make evaluation an effective and meaningfully useful activity. This plenary will likewise canvass current trends in mainstream evaluation practice and research (e.g., organizational learning, evaluation capacity-building, evaluability assessment, logic-modelling) and discuss their relevance and application in contemporary approaches to language program evaluation and assessment.

Richard Kiely
University of Southampton

Developing students’ self-assessment skills: The role of the teacher

An important role of program evaluation is the facilitation of innovation in classrooms and learning activities. Successful innovations that transform programs and enhance learning require the support of teachers in terms of a commitment to making the innovation work. In this talk I focus on how teachers innovate, and the implications for program evaluation. I draw on recent research with teachers on the development of student self-assessment skills in language programmes. The issues raised relate to three themes relevant to developing useful program evaluations. First, the complexity of teachers’ practice makes externally motivated innovations difficult. Second, teachers’ evaluations of their teaching are based on personal and largely tacit frameworks, which become apparent through long-term reflective and collaborative professional development. Third, institution-level approaches to quality management and program evaluation are a significant factor for teachers: where they are considered to limit teachers’ autonomy and agency, it is likely they do just that.

Lorena Llosa
New York University

The relationship between language proficiency and content knowledge
in the assessment of English language learners in schools

The relationship between language proficiency and content knowledge in assessment is a complicated one. Traditionally, language has been considered a source of construct irrelevant variance when assessing English language learners’ (ELLs) content knowledge. Similarly, content (or topical knowledge) has also been considered a potential source of construct irrelevant variance in the assessment of ELLs’ language proficiency. Thus, for the purpose of assessment, language proficiency and content knowledge have been viewed as separate and distinct constructs, with “academic language” serving as the bridge between the two. In this talk, I will discuss how evolving views of language and the introduction of new content standards that emphasize language as a critical component of content mastery are forcing us to rethink the language-content link and our construct definitions.

Invited Colloquia

Center for Applied Linguistics 
Uses for and consequences of language proficiency tests for students and teachers 

Overview of Colloquium (CAL)

This symposium will begin with a brief review of argument-based approaches to validity (Kane, 1993; Chapelle et al., 2008; Bachman & Palmer, 2010; Renn & MacGregor, 2014). It will then explore the assessment use argument and its influence on different tests of language proficiency developed for students and teachers. By focusing on the consequences of test results, the symposium will examine how language test developers design tests, items, and tasks intended to promote effective teaching practice and accurately reflect student ability. We examine this from the perspective of the test development process and the research that informs that process, from developing test items to examining test performance via operational data. Presenters will include CAL researchers, test developers, and psychometricians, providing both a holistic picture of the test design, development, and operationalization process and individual perspectives on the consequences of test results.

Meg Malone || Center for Applied Linguistics
Jennifer Renn || Center for Applied Linguistics
Consider the consequences: Applying an argument-based validation framework to assessments with different purposes
Justin Kelly, Jennifer Norton, Michele Kawood || Center for Applied Linguistics
Ensuring content validity: From construct definition to test item
David MacGregor || Center for Applied Linguistics
The role of pre-operational testing: Asking questions and seeking answers
Cary Lin || Center for Applied Linguistics
Psychometric analysis of test performances: Informing operational tests
Dorry Kenyon || Center for Applied Linguistics

King's College London
What's in a name? New constructs in language assessment

Overview of Colloquium (Constant)

The idea of using assessment to promote learning has been receiving increasing attention in recent years in all areas of education. In the field of second/additional language education, there is now a growing body of work that ties assessment, learning and pedagogy closely together in a variety of ways, with the teacher playing an important central role in one way or another. Terms such as ‘Assessment for Learning’, ‘Dynamic Assessment’, ‘Embedded Assessment’, ‘Formative Assessment’, ‘Learning-oriented Assessment’ and ‘Teacher Assessment’ appear in research and professional journals regularly. The common theme that unites these pedagogically-linked assessment approaches is their commitment to promoting learning. A key question is: Do these different terms refer to a common underlying concept, or do they represent diverse conceptualisations and epistemologies in terms of learning, teaching and assessment? And if these nomenclatural differences do represent significant conceptual and theoretical differences, how do they influence and shape practice? In this colloquium we bring together a team of language assessment specialists from Australia, England, Hong Kong, New Zealand and the USA to address these issues. It is hoped that the discussions will build dialogues that identify commonalities and differences, with a view to enhancing pedagogic usefulness.

Constant Leung || King’s College London
Chris Davison || University of New South Wales
Assessment for learning: Building on the brand
Martin East || University of Auckland
Embedding assessment for learning into a high-stakes assessment system: Can it really work?
Liz Hamp-Lyons || University of Bedfordshire
PEST principles for implementing effective learning-oriented language assessment
Yongcan Liu, Michael Evans || Cambridge University
A conceptual framework for the use of Dynamic Assessment with EAL learners in schools in England
Jim Purpura || Columbia University

Marta González-Lloret

University of Hawai'i at Mānoa

Evaluating technology-mediated language education

Overview of Colloquium

As technology mediation becomes more prominent in all areas of education, there is an increasing need to demonstrate its value and justify the economic and time investment it usually requires. Language education is no exception. Since the late 1980s (e.g., Chapelle & Jamieson, 1986), the field of CALL has been trying to assess the effectiveness of tools, platforms, approaches, programs, and pedagogical choices, and although we have come a long way since then, there is no unified or standardized approach to evaluating technology-mediated language education. This colloquium addresses the need for evaluating technology-mediated language education from a programmatic view that considers the variables and factors that affect teaching and learning in contexts mediated by technology. The colloquium will address essential questions in evaluation such as: What is it exactly that we are evaluating (language learning, digital learning, outcomes, processes)? What are the best methods and tools to evaluate technology-mediated language courses? Can we borrow instruments and procedures from non-tech language courses? Do the environments provide unique affordances for or constraints on evaluation? Who are the intended users of the evaluation, and how will it be utilized? If we want to keep the field of computer-assisted or technology-mediated language learning moving forward, these are essential questions that need to be addressed and resolved in the immediate future.

Jonathan Leakey || University of Ulster
A proposed model for evaluating CALL
Katie Nielson || Voxy
A multi-variable framework for evaluating online language courses: the case of Voxy
Jim Ranalli || Iowa State University
Exploring adaptations of an argument-based validation approach to the task of CALL evaluation
Estella Chen || National Chengchi University
A model to evaluate language learning MOOCs: MandarinX/edX
Marta González-Lloret || University of Hawai'i at Mānoa


Expert Panel Discussion

Theme: "Challenges and Prospects in Making Assessment and Evaluation Useful"

Invited Panelists: John Davis, Michael Kane, Dorry Kenyon, Richard Kiely, Constant Leung,
Marta Gonzalez-Lloret, Lorena Llosa, Meg Malone

Moderator: John Norris

This interactive panel discussion will provide GURT participants a unique opportunity to raise questions and engage in dialogue with a group of assessment and evaluation experts. Topics for discussion will be crowd-sourced from attendees over the first two days of the conference, and there will be time allocated for spontaneous Q&A. Please join us for this exciting event, and for the champagne reception to follow.

Pre-Conference Workshop


John Davis, Amy Kim, Todd McKay, Mina Niu, Young A Son, Francesca Venezia 

Georgetown University

This introductory workshop is intended for language educators who are new to program evaluation and looking to learn practical techniques to implement in their courses and programs.

Participants will receive a certificate of attendance after completing the workshop.

Thursday, March 10

Intercultural Center 101


“Planning useful evaluation in college language programs: Clarifying evaluation users, uses, and foci”

John McE. Davis

Synopsis: Language program evaluation must be organized and planned in specific ways to ensure its usefulness and productiveness for interested stakeholders. This session will illustrate why language educators should adopt a use-oriented approach to program evaluation. That is, evaluation in language programs should proceed via a clear understanding of (a) why the program—or particular aspects of the program—is/are being investigated; (b) who specifically is going to use evaluation findings and processes; and (c) how evaluation information/processes will be used by intended evaluation users. Clarifying and identifying these elements helps to increase the likelihood that evaluation findings will actually be used and useful for evaluation users and program stakeholders.

Objectives: Participants will understand (a) the range of potential users and uses for language program evaluation projects, (b) strategies for identifying stakeholders in language program evaluation projects, and (c) elements of high-quality evaluation questions. Participants will be able to (a) identify intended uses and users of a potential/future evaluation project and (b) articulate high-quality evaluation questions.



“Identifying evidence—or “indicators”—of program quality, effectiveness, needs, student learning”

Young A Son, Francesca Venezia

Synopsis: The usefulness of evaluation is challenged when evaluators collect information in a way that stakeholders regard as untrustworthy or unrelated to evaluation project goals. To avoid this situation, evaluators should take specific steps to identify the relevant sources of information (i.e., “indicators”) that will help answer evaluation questions and systematically shed light on targeted program elements, doing so in ways that help specific users understand the phenomena under evaluation, make decisions, and take action.

Objectives: Participants (a) will be familiar with selected evaluation indicators commonly used in language program evaluation and (b) will be able to identify relevant, useful “indicators” for answering language program evaluation questions.



“Tools for collecting evaluation information in language programs: Interviews, focus groups, and questionnaires”

Todd McKay, Amy Kim

Synopsis: Interviews, focus groups, and questionnaires are the most commonly used tools for collecting the views and opinions of program stakeholders and constituents. Each, however, has strengths and weaknesses and is best used in particular circumstances to achieve specific project goals. This session will help participants identify which tool is best suited to specific evaluation aims. The session also provides best practices and how-to advice for implementing questionnaires, interviews, or focus groups in language program evaluation projects.

Objectives: Participants will (a) understand the strengths and purposes of interviews, focus groups, and questionnaires for collecting information for program evaluation purposes, and (b) be able to identify which method is best suited to shed light on a given evaluation question.



“Planning next steps: Strategies for getting evaluation started in language education programs”

John McE. Davis, Mina Niu

Synopsis: While program evaluation can be a powerful tool for educational innovation and improvement, it requires intentional planning and particular conditions and practices to function successfully. The final session asks participants to identify specific strategies, next steps, and time lines for implementing evaluation in their programs. The session will partly involve participants analyzing the current capacity in their programs for conducting high-quality, useful evaluation (e.g., extant resources, expertise, infrastructures) and brainstorming about how to build evaluation capabilities to enhance the usefulness of future evaluation efforts.

Objectives: Participants will identify immediate next steps, time frames, and needed capacity for implementing evaluation in their language programs and institutions.