In contrast to the recent increase in L2 testing research on the written modality of argumentation (e.g., Chuang, 2021; Lee, Lim & Basse, 2021), the assessment of L2 oral argumentation remains underexplored. For example, considering that international students in the US must engage in debates and discussions, it is surprising that current L2 speaking examinations elicit only the justification of opinions, requiring no depth of topical engagement while simultaneously assuming background knowledge. Such elicitations are an example of superficial argumentation (Deane & Song, 2015).
Hence, to explore the assessment of L2 argumentation skills, a scenario-based test was developed based on a theoretical framework of argumentation as a cycle (Deane & Song, 2015). The test accounted for content knowledge as a performance moderator by introducing relevant topical information through the input. The purpose of the current study was to examine to what extent this test provided meaningful estimates of competency in L2 oral argumentation (i.e., L2 proficiency situated in the competency of building an argument from evidence and presenting it to an audience). In addition, the study examined whether embedding two types of assistance in the test (i.e., corrective feedback, task scaffolding) influenced L2 performance.
A total of 120 Mexican EFL learners took the scenario-based test. The scenario consisted of a simulated project in which examinees played the role of student representatives on a college podcast, taking on the challenge of presenting an informed argument to the simulated podcast audience. To this end, they read about the topic, evaluated written arguments, and engaged in a simulated discussion before presenting their final argument. For scoring purposes, the scenario consisted of two reading tasks and seven speaking tasks (average duration was 75 min). Participants were randomly assigned to one of three forms. One form had no explicit assistance, while the other two forms each had one type of explicit assistance (i.e., scaffolding or corrective feedback) presented to examinees while they completed reading tasks to learn about the test topic. Responses to the seven speaking tasks were double scored blindly on the scales of Delivery, Semantico-Grammatical Control, Organizational Control, Pragmatic Knowledge Control, Content Control, and Elaboration. Scores were analyzed using robust statistical analyses (e.g., multivariate G-theory, Rasch analysis, multiple regression) to examine internal consistency, dependability, and task difficulty across forms. A subset of the responses was transcribed to examine the quality of the elicited responses. A learning-oriented approach to assessment (Purpura & Turner, 2018) served as the design and validation framework for interpreting the qualitative and quantitative evidence.
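To illustrate the kind of dependability index referred to above, the G-theory dependability coefficient for a simple persons-by-tasks design can be written as follows (a minimal single-facet sketch offered for illustration only; the study itself used a multivariate design):

\Phi = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{t}/n_{t} + \sigma^{2}_{pt,e}/n_{t}}

where \sigma^{2}_{p} is the examinee (person) variance component, \sigma^{2}_{t} the task variance component, \sigma^{2}_{pt,e} the residual, and n_{t} the number of tasks over which scores are averaged.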
Overall, results indicate that the test scales had adequate degrees of internal consistency and dependability across assistance conditions. Also, no systematic effects of the type of assistance provided were found. However, the group that received scaffolding had slightly less dependable estimates than the group that received feedback and the group that received no explicit assistance. Hypothesized explanations for these trends will be discussed, along with excerpts from examinees’ comments on the post-test survey and results from a qualitative analysis of the responses. In addition, insights and implications for learning-oriented design will be discussed.
Jorge Beltrán Zúñiga
Paper presented at AAAL, Pittsburgh, PA; March 2022
A recent emphasis by advocates of learning-oriented assessment on the importance of aligning language assessment with learning and instruction has resulted in a call for language examinations that also provide test-takers with learning opportunities (e.g., Purpura, 2019; Purpura et al., 2021). This is an effort to provide meaningful language assessment experiences and accurate estimates of proficiency by gauging learners’ ability to use the input and tools presented to them to mediate gaps in understanding.
Scenario-based assessment (SBA), which has been used extensively in the US to test L1 reading proficiency, math skills, and argument writing (Bennett, 2011; O’Reilly & Sabatini, 2013; Zhang et al., 2019), has been deemed appropriate for providing this type of experience. Recent use of SBA for ESL testing has found that learners are able to acquire new information through a scenario, but show more limited gains in terms of word acquisition throughout the scenario (Banerjee, 2019; Purpura et al., 2021).
Since research on topical knowledge gains in ESL testing contexts has been conducted through multiple-choice testing, the current study aimed to determine to what extent 75 EFL learners were able to demonstrate changes in their spoken performance. Specifically, it examined to what extent there were changes in their display of understanding of the topic of mandatory voting when comparing responses to parallel argumentative speaking tasks administered before and after a scenario-based test. Speech samples were transcribed and analyzed both computationally in a Python Jupyter notebook (to calculate NLP features) and qualitatively by two coders.
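As a rough sketch of the kind of NLP features such a notebook can compute from transcribed speech (a hypothetical, minimal feature set; the study’s actual feature list is not reproduced here), consider:

import re

def complexity_features(transcript: str) -> dict:
    """Compute a few illustrative lexical/syntactic complexity indices
    from a transcribed speech sample (hypothetical feature set)."""
    # Naive sentence split on terminal punctuation; real pipelines
    # typically rely on a dedicated tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    tokens = re.findall(r"[a-zA-Z']+", transcript.lower())
    types = set(tokens)
    return {
        "num_tokens": len(tokens),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "mean_word_length": sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
        "mean_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
    }

# Toy example on a single short response
sample = "Mandatory voting could increase turnout. However, it might not improve engagement."
print(complexity_features(sample))

Indices of this kind (type-token ratio, mean word and sentence length) are common first approximations of lexical and syntactic complexity in transcribed speech.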
Results indicate that while practically all examinees showed improvement in terms of development and observed lexical and syntactic complexity, the quality of the arguments presented in the post-scenario task remained very similar to that of the pre-scenario response for examinees who had shown lower levels of ability and understanding of argument structure.
Scenario Based Language Assessment Research Lab
Invited AAAL/ILTA Colloquium presented at AAAL, Pittsburgh, PA; March 2022
For many years now, there has been a growing body of research devoted to the design, validation, and use of educational or psychological assessments that are made available in other languages through translation or adaptation. Such tests, if equivalent across languages, are sorely needed in contexts where learners who are not proficient in the language of the test need access to its content (Stansfield & Bowles 2006). They are also needed in contexts where intelligence or personality tests are administered to learners (Kadriye & Lyons 2013). Finally, equivalent tests in multiple languages are needed in cross-cultural, comparative studies of educational achievement such as the Program for International Student Assessment (PISA), administered globally to assess “mastery of processes, understanding of concepts, and the ability to function in various situations within the domains of reading literacy, mathematical literacy, and scientific literacy” (Dossey et al. 2006, p. iv).
In Applied Linguistics, few studies have attempted to investigate the parity of assessments across different languages. One noted exception is the European Survey of Language Competencies (European Commission 2012), whose goal was to provide comparative data on foreign language competencies across fourteen European countries so that progress towards efforts to improve language learning could be examined. Students took two of three translated tests covering reading, listening, or writing ability. The test content measured functional proficiency within each skill in terms of the CEFR. While this study was interesting, competency was limited to isolated skills outside a coherent situated context.
The current symposium examines the validity of using parallel scenarios as a basis for measuring situated L2 proficiency and learning across four typologically different languages. The same construct and specifications were used for test development, the rubrics were similar with minor language-specific variations, and validation procedures were mostly parallel. Insights and challenges will be discussed.
James E. Purpura & Heidi Liu Banerjee
Symposium paper presented at AAAL, Pittsburgh, PA; March 2022
In mainstream educational contexts, several testing programs have endeavored to measure disciplinary content in multiple languages. This has often been done using rigorous translation or adaptation protocols. In language assessment, however, few studies have designed tests specifically to measure language competencies cross-linguistically. The noted exception is the European Survey of Language Competencies (European Commission 2012), which measured reading, listening and writing competencies defined in terms of functional statements linked to the Common European Framework of Reference (Council of Europe 2020).
Nonetheless, the world has become more complex, and the competencies needed to function effectively as global citizens have changed significantly. The construct of situated L2 proficiency has evolved and broadened considerably to address these changes. Language assessments, however, have often failed to approximate the kinds of complex tasks that require examinees to demonstrate the language-driven competencies needed in an increasingly interconnected, diverse, and globalized world (Gordon Commission 2013; National Research Council 2011; Partnership for 21st Century Skills 2009).
To reflect the need to measure broadened constructs of language-driven competencies, and to examine how such competencies are displayed in cross-linguistic contexts, a scenario-based assessment (SBA) approach was adopted. Guided by a learning-oriented assessment framework (Purpura & Turner 2014; Turner & Purpura 2016), the SBA approach presents examinees with a carefully sequenced set of naturally-occurring scenes in which examinees carry out actions and interact with simulated characters until the overarching scenario goal (i.e., the target competency) is brought to resolution. Because the tasks within a scenario are designed to simulate habits of mind, SBA provides insights into how learners of typologically different languages utilize their situated language competencies to fulfill the scenario goal. This paper lays out the theoretical framework for exploring cross-linguistic insights from using SBA across four typologically different languages.
Daniel Eskin, Jorge Beltrán, Soo H. Joo, James E. Purpura, & Heidi Liu Banerjee
Symposium paper presented at AAAL, Pittsburgh, PA; March 2022
As L2 education moves towards the development of 21st century skills, there is a need to utilize L2 assessments that can provide information about learners’ situated proficiency, particularly when it comes to decisions like course placement (Banerjee 2019; Purpura & Banerjee 2021). Given the potential of using scenario-based assessment (SBA) to measure situated proficiency, an SBA was developed and piloted to examine its viability for placement purposes in an adult ESL program at a large American university. The pilot included 55 participants ranging from beginner to advanced L2 English proficiency levels, and it sought to understand not only how well a scenario-based assessment of situated L2 proficiency provided psychometrically sound and meaningful score interpretations, but also how it supported learning across the assessment.
The SBA revolves around a collaborative problem-solving task in which examinees, along with their simulated group members, have to learn about two possible overseas destinations for a class trip, decide which to go to, and present a persuasive, evidence-based argument in support of the group’s decision. To do this, examinees respond to a series of carefully-sequenced independent and integrated skills tasks designed to reflect the habits of mind used to accomplish the scenario goal (i.e., making a spoken pitch). As examinees are expected to build topical knowledge during the assessment, their pre- and post-scenario topical knowledge is measured.
Test functionality and score meaningfulness were examined through a series of statistical procedures, including classical test theory, multi-facet Rasch measurement, and multivariate G-theory. Results showed that the assessment provided reasonably good estimates of situated L2 proficiency, applicable for placement purposes. The analyses also showed that learning indeed transpired during the assessment, as evidenced by the observed gains from the pre- to the post-scenario topical knowledge measures. Implications of the use of SBA to measure situated L2 proficiency will be discussed.
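As an illustration of the simplest of these procedures, a classical test theory internal consistency estimate (Cronbach’s alpha) can be computed from an examinee-by-task score matrix as sketched below (a minimal, hypothetical example; the multi-facet Rasch and multivariate G-theory analyses reported here require dedicated modeling software):

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an examinees-by-tasks matrix of scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of tasks
    item_vars = scores.var(axis=0, ddof=1)      # variance of each task
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy example: 5 examinees scored on 4 tasks (0-5 scale)
demo = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 3, 4, 4],
])
print(round(cronbach_alpha(demo), 3))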
Soo Hyoung Joo, Ji-Young Jung, & Yuna Seong
Symposium paper presented at AAAL, Pittsburgh, PA; March 2022
Korean language proficiency assessments (e.g., Test of Proficiency in Korean [TOPIK], Korean Language Proficiency Test [KLPT], ACTFL Oral Proficiency Interview [OPI] and Writing Proficiency Test [WPT]) have been mostly organized around decontextualized tasks designed to measure independent skills and have depended heavily on selected-response tasks (e.g., multiple choice). As a result, they have overlooked the need to measure how foreign language learners of Korean can effectively use Korean to function in more complex language use contexts, that is, contexts where they need to display real-life competencies.
To address this gap, a scenario-based KFL assessment was designed to measure the “situated foreign language Korean proficiency” of 51 students from a KFL program at a US university. The participants included both heritage and non-heritage learners and their proficiency levels ranged from low-intermediate to advanced.
The SBA was organized around a study abroad program context in South Korea, where examinees needed to learn about, decide on, and argue for one of two destinations for a class trip (i.e., Jeonju and Ulleung-do). While the scenario narrative was parallel to the English SBA in its design, the scenario was adapted to a study abroad context for foreign language learners, and the rubrics for the writing and speaking tasks were adapted to address characteristics specific to the Korean language. Examinees were expected to learn about the two destinations during the assessment.
Similar to the English SBA, test scores were analyzed using classical test theory, generalizability theory, and Rasch analysis. The results indicated that the test functioned well psychometrically. Also, the gains from the pre- to the post-scenario topical knowledge measures, along with the results of a post-test survey, provided convincing evidence that the test served as a valuable “educational experience in and of itself” (Bennett 2010, p. 1).
Mahshad Davoodifard, Payman Vafaee, Nahal Akbari-Saneh
Symposium paper presented at AAAL, Pittsburgh, PA; March 2022
With an increasing need for standardized assessment of Persian proficiency, tests such as the Persian Computerized Assessment of Proficiency (CAP) and the first standardized test of Persian proficiency (SAMFA) have been developed in alignment with frameworks such as ACTFL or the CEFR. These tests, however, measure knowledge of grammar and vocabulary, as well as ability in different skills, independently from each other and typically out of context. To overcome this weakness, we created a scenario-based assessment (SBA) to measure “situated” proficiency in Persian. The SBA was designed to guide test takers through a set of thematically related tasks where they had to obtain information about two destinations in Iran, Kerman University and Gilan University, for a study abroad program hosted by one of these universities. The goal of the SBA was for the students to pitch their idea, convincing a committee as to which destination to select.
While the general guidelines of the English SBA test design and scoring were followed, appropriate adjustments were made to adapt for the cultural as well as linguistic features of Persian. The SBA was initially administered to a group of 12 Persian learners at the Persian Flagship Program at the University of Maryland. The test takers were heritage as well as non-heritage language learners, and their Persian proficiency ranged from low-intermediate to advanced level. Due to the limited sample size, classical test theory was used to analyze the Persian SBA test scores. The results indicated that the test functioned well psychometrically and served as a valuable educational tool in significantly increasing the students’ topical knowledge.
Sabrina Machetti, Giulia Peri, & Paola Masillo
Symposium paper presented at AAAL, Pittsburgh, PA; March 2022
Italian language proficiency assessments (e.g., CILS, CELI, Cert-IT, PLIDA) have traditionally revolved around independent skills tasks designed to measure the traditional skills (reception, production, interaction). These tests have also been heavily influenced by the Common European Framework of Reference (CoE 2001). The most important international language certification tests are aligned with the CEFR, including those in Italy (CoE 2009; Martyniuk 2010; Barni & Machetti 2019).
While existing tests have attempted to measure language use, there is still a need to measure how second language learners of Italian can effectively use their situated L2 proficiency to function in more complex language use contexts, such as those where examinees need to display real-life competencies in complex tasks involving learning and instruction.
To address this gap, an L2 Italian scenario-based assessment was designed to measure the “situated Italian language proficiency” of hundreds of students enrolled in B1-B2 Italian language courses at the University for Foreigners of Siena. Participants included prospective students at the same University as well as those students applying to enroll in degree programs at other Italian Universities.
The SBA was developed around a group problem-solving task in which examinees, together with their simulated group members, learn about two possible destinations in Italy (i.e., Sicily and Abruzzo) for a class trip and have to decide where to go.
While the scenario narrative was parallel to the English SBA in design, it was adapted to the Italian context for L2 learners, and the rubrics for the writing and speaking tasks were adapted to reflect Italian language characteristics. The test design was oriented toward promoting a viable educational experience in its own right (Bennett 2010), as it focused not only on linguistic features but also on the development of topical knowledge (Purpura 2016).
Jorge Beltrán Zúñiga, Daniel Eskin, Soo H. Joo, James E. Purpura, & Heidi Liu Banerjee
Symposium paper presented at the Virtual Language Testing Research Colloquium, March 2022
In an increasingly interconnected, diverse, and globalized society, the competencies that language learners need to successfully function in their L2 in the real world have become ever more complex; however, these real-life competencies are inadequately accounted for and oftentimes overlooked in the majority of language tests (Carroll, 2017; Purpura, 2019; Sabatini et al., 2020). Advocating for measuring a broadened construct of L2 proficiency, scenario-based language assessment (SBA) has been employed in various contexts, including mainstream education (e.g., Bennett, 2010) and L1 literacy measurement (e.g., Sabatini & O'Reilly, 2013). L2 researchers have begun to experiment with the usability of SBA as well (Banerjee, 2019; Purpura et al., 2021). SBA is an innovative assessment technique that uses authentic “scenarios” to sequence a set of naturally-occurring scenes around real-world competencies (e.g., making a pitch). Purpura (2016, 2019) suggests that SBA allows the design of assessments that account for the underlying knowledge, skills and abilities that are necessary in performing real-world competencies, and that the learning-oriented assessment (LOA) framework (Turner & Purpura, 2016), which conceptualizes the dynamic interactions between instruction, learning, and assessment, can be used as a theoretical framework underlying SBA design and validation.
To explore the feasibility of designing a learning-oriented SBA to measure a broadened construct of Korean proficiency, a scenario-based Korean proficiency test (K-SBA) was designed and examined in a pilot study with 51 participants from a Korean as a foreign language program at a US university. The participants included both heritage and non-heritage language learners, and their proficiency levels ranged from low-intermediate to advanced. Through a goal-oriented scenario of a study abroad program in South Korea, examinees were presented with a collaborative problem-solving task where they had to learn about two potential class trip destinations and ultimately advocate for one based on what they learned.
Test scores were analyzed using classical test theory, generalizability theory, and many-facet Rasch measurement analyses. The results indicated that the K-SBA provided reliable measurement characteristics. Additionally, the gain in the examinees’ topical knowledge scores and the post-survey findings showing that the examinees perceived the test as a meaningful learning experience suggest that the test served as a valuable “educational experience in and of itself” (Bennett, 2010, p. 1).
Soo Hyoung Joo, Yuna Seong, Joowon Suh, Ji-Young Jung, and James E. Purpura
Paper presented at Asian Association for Language Assessment Conference, 2021
In an increasingly interconnected, diverse, and globalized society, the competencies that language learners need to successfully function in their L2 in the real world have become ever more complex; however, these real-life competencies are inadequately accounted for and oftentimes overlooked in the majority of language tests (Carroll, 2017; Purpura, 2019; Sabatini et al., 2020). Advocating for measuring a broadened construct of L2 proficiency, scenario-based language assessment (SBA) has been employed in various contexts, including mainstream education (e.g., Bennett, 2010) and L1 literacy measurement (e.g., Sabatini & O'Reilly, 2013). L2 researchers have begun to experiment with the usability of SBA as well (Banerjee, 2019; Purpura et al., 2021). SBA is an innovative assessment technique that uses authentic “scenarios” to sequence a set of naturally-occurring scenes around real-world competencies (e.g., making a pitch). Purpura (2016, 2019) suggests that SBA allows the design of assessments that account for the underlying knowledge, skills and abilities that are necessary in performing real-world competencies, and that the learning-oriented assessment (LOA) framework (Turner & Purpura, 2016), which conceptualizes the dynamic interactions between instruction, learning, and assessment, can be used as a theoretical framework underlying SBA design and validation.
To explore the feasibility of designing a learning-oriented SBA to measure a broadened construct of Korean proficiency, a scenario-based Korean proficiency test (K-SBA) was designed and examined in a pilot study with 51 participants from a Korean as a foreign language program at a US university. The participants included both heritage and non-heritage language learners, and their proficiency levels ranged from low-intermediate to advanced. Through a goal-oriented scenario of a study abroad program in South Korea, examinees were presented with a collaborative problem-solving task where they had to learn about two potential class trip destinations and ultimately advocate for one based on what they learned.
Test scores were analyzed using classical test theory, generalizability theory, and many-facet Rasch measurement analyses. The results indicated that the K-SBA provided reliable measurement characteristics. Additionally, the gain in the examinees’ topical knowledge scores and the post-survey findings showing that the examinees perceived the test as a meaningful learning experience suggest that the test served as a valuable “educational experience in and of itself” (Bennett, 2010, p. 1).
James E. Purpura and Heidi Liu Banerjee
Symposium organized for Virtual LTRC, June 2021
As theoretical conceptualizations of L2 proficiency have evolved and broadened over the past 50 years, so have approaches to measure proficiency (Bachman, 2007; Purpura, 2016). One approach to proficiency testing has centered around the measurement of isolated linguistic elements or the integration of these elements in comprehension or production (Carroll, 1961; Lado, 1961). Another has organized assessments around language skills, where in one version examinees respond to isolated tasks measuring one skill at a time (independent writing), while in another version, examinees complete tasks integrating skills (integrated reading and speaking). Measurement in the skills-based approaches has focused on evaluating the knowledge, skills and abilities (KSAs) underlying the skill (language control). Yet another approach to L2 proficiency assessment has deemphasized the traits underlying performance in favor of organizing assessments around the completion of goal-oriented tasks designed as proxies for real-life task completion (Norris, 2016). While these approaches have served us well for the last 50 years, they often fail to approximate the kinds of complex tasks that allow examinees to demonstrate the language-driven competencies needed for an increasingly interconnected, diverse, and globalized world (Gordon Commission 2013; National Research Council, 2011; Partnership for 21st Century Skills, 2009).
To address this gap, a recent approach to L2 assessment has organized assessments around real-world competencies (e.g., make a presentation). This approach uses “scenarios” as a technique for presenting examinees with a carefully sequenced set of naturally-occurring scenes in which examinees carry out actions and interact with each other until they bring the overarching scenario goal (the competency) to resolution. Drawing on many assessment approaches, scenario-based assessment (SBA) requires examinees to engage in, display, develop, and coordinate a range of topical, socio-cognitive, linguistic, and dispositional resources in order to work collaboratively, learn while doing, and ultimately solve problems. Measurement takes a trait and task centered approach (Bachman, 2002) in addition to measuring gain. Digitally delivered, SBAs aim to provide a worthwhile educational experience in their own right.
The use of SBAs has already been initiated in some mainstream assessment contexts, especially those involving science, math, or L1 literacy (e.g., Bennett, 2010; Bennett & Gitomer, 2009; Sabatini et al., 2014). In these contexts, researchers use technology-enhanced assessment designs to capture the KSAs implicated in performing complex, goal-oriented tasks that examinees often encounter in real life. According to Bennett (2015), “the design, format, and content [...] should exemplify the knowledge, processes, strategies, practices, and habits of mind” of the learners (p. 379) if a test is expected to represent “true” competencies and if the benefits of assessment to teaching and learning are to be maximized. We believe that a similar shift is needed in L2 assessment.
This symposium, thus, demonstrates how L2 assessment can be organized to measure 21st century, language-driven competencies through SBA. We invite the testing community to rethink how assessments might better capture the information we value, especially as assessment relates to learning and instruction. To illustrate the diversity, flexibility, and applicability of SBA, the presentations traverse several L2 assessment contexts.
James E. Purpura and Heidi Liu Banerjee
Symposium paper presented at Virtual LTRC, June 2021
The symposium will begin with an introduction to the theoretical underpinnings of scenario-based language assessment. This will be followed by papers in which scenario-based assessment has been empirically implemented. The symposium will conclude with a discussant session, followed by Q&A.
James E. Purpura, Heidi Liu Banerjee, Jorge Beltrán Zúñiga, Brady Robinson, and Payman Vafaee
Symposium paper presented at Virtual LTRC, June 2021
A case for departing from traditional assessment models has been made outside the field of L2 assessment, particularly in large-scale K-12 assessment contexts, where SBAs have been used to assess reading (Sabatini & O’Reilly, 2013; O’Reilly & Sheehan, 2009), writing (Sabatini et al., 2011), and mathematics (Bennett, 2011; Harris & Bauer, 2009). The application of SBAs in these contexts follows a series of principles and assumptions that inform their development, such as purpose-driven and learning orientations. They also employ innovative designs that reflect real-life projects and essential 21st century competencies. In light of its potential for measuring a broadened L2 proficiency construct, a pilot study was conducted with 55 participants ranging from A2 to C1 proficiency levels. The purpose of the study was to explore the feasibility of using an SBA for placement purposes in the context of an adult ESL program, as well as to investigate interfaces among assessment, teaching, and learning. Examinees were presented with a collaborative problem-solving task in which they responded to a carefully-sequenced set of simple and complex tasks designed to reflect the habits of mind used to resolve the scenario goal (the competency). Aside from measuring performance, the SBA was also designed to teach examinees something new. Results showed the test provided reasonable measurement characteristics for placement purposes. They also showed that examinees learned during the test. The post-test survey results showed a high degree of engagement and perceived authenticity. Study challenges, limitations, and future directions will also be discussed.
Yuna Seong
Symposium paper presented at Virtual LTRC, June 2021
With the increasing number of international students in US higher education, many universities call for a deepened understanding of international students’ learning needs for academic success. In the academic domain, L2 learners must demonstrate complex academic competencies such as participating in class discussions or giving presentations. This not only calls for the speaker’s L2 knowledge, but it also requires the use of metacognitive and cognitive strategies to process, synthesize, and utilize information from lectures and readings. The L2 academic speaking test used in this study adopted a scenario-based assessment design to replicate real-life academic speaking demands and elicit performance that taps into the knowledge, skills and abilities (KSAs) that are directly applicable to performance in the real world. The test scenario called for the test-takers’ participation in an online oral discussion forum where they were asked to listen to audio materials on a given topic and perform a series of strategy tasks simulating the cognitive and metacognitive strategies (e.g., planning, summarizing, synthesizing) that enable them to complete the scenario goal. The purpose of this study was to explore the affordances of using scenario-based assessment that includes and explicitly measures the cognitive dimension of L2 academic speaking ability. A pilot study of the test was conducted with 32 participants. Results showed that the test was perceived by the test-takers as an authentic and useful measure of academic speaking ability, as the strategy tasks reflected their real-life habits of mind in performing similar academic tasks.
Jorge Beltrán Zúñiga
Symposium paper presented at Virtual LTRC, June 2021
In recent years, scenario-based assessment has been explored in various contexts given its potential to model and target complex constructs and processes, such as L1 reading proficiency (O'Reilly & Sabatini, 2013, 2014, 2016), topical knowledge (Banerjee, 2018, 2019), and L2 academic speaking ability (Seong, 2018). Following these efforts, the current study aimed to examine how this technique can be implemented to assess ELLs' ability to build and defend an argument orally. In order to engage test-takers in the various stages of an argumentation cycle (Song, Deane, Graf, & van Rijn, 2015), a scenario-based test of speaking ability which required test-takers to display a range of real-world competencies was developed and piloted. Throughout the test, test-takers had to 1) understand the stakes of the project by listening to a meeting and reporting key points to an absent party; 2) evaluate online forum posts based on a set of guidelines; 3) share a position in a class discussion based on evidence; 4) respond to counter arguments from a simulated peer; 5) present a fully-developed argument to a simulated school council roundtable. Results from an administration with 73 EFL learners will be discussed. Specifically, the discussion will focus on observed differences in knowledge sharing strategy use and in the adequacy of use of linguistic features to meet task-specific demands.
Jorge Beltrán
Paper presented at Virtual AAAL, March 2021
Current trends in L2 testing research acknowledge the importance of assessing meaningful language use. However, certain contexts of language use remain understudied. For instance, while argumentation is paramount in anglophone educational systems, both at the curricular (e.g., in the Common Core Standards in the US) and instructional levels (e.g., through discussions and debates), little attention is paid to how L2 learners engage in spoken argumentation and how it might be assessed. Considering this, a scenario-based speaking test that models a complex argumentation cycle (Song, Deane, Graf, & van Rijn, 2015) was developed and piloted. The current study aimed to determine whether this test successfully elicited argumentative language and to examine the possible effects of choice on performance in a group of 71 EFL learners. An experimental group was able to choose a position to defend throughout the test (+choice), while two control groups (-choice) were assigned a position either in favor of or against the policy described in the scenario. The psychometric qualities of the test were examined with Many-Facet Rasch Measurement and Multivariate Generalizability Theory. Results suggest the test had a relatively high degree of dependability and was able to identify various proficiency levels with minor discrepancies across forms. However, the analytic scales contributed differently to the composite test score. Qualitative analyses of the speech samples revealed differences in knowledge sharing related to proficiency (i.e., how claims, evidence, and reasoning were presented by low- and high-ability students), but such proficiency-based differences were not present in the use of interaction markers to address the simulated audience. Moreover, an examination of the -choice and +choice conditions suggests that rather than the variable of choice itself, differences in dependability across test forms could be attributed to the content of task prompts. Finally, marked differences in perceptions of the test were found across groups.
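For reference, the many-facet Rasch (rating scale) model that underlies analyses of this kind is conventionally written as follows (a standard formulation of the model, not reproduced from the paper itself):

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_{n} - D_{i} - C_{j} - F_{k}

where P_{nijk} is the probability of examinee n receiving category k from rater j on task i, B_{n} is examinee ability, D_{i} task difficulty, C_{j} rater severity, and F_{k} the difficulty of the step from category k-1 to category k.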
Jorge Beltrán
Paper presented at Virtual ECOLT, October 2021
Jorge Beltrán
Plenary presented at the Virtual Conference Perspectivas, Experiencias y Retos en la Enseñanza de Lenguas, UNACH, November 2020
James E. Purpura
Paper presented at the 1st International Perspectives on Assessing World Languages Conference, Cairo, EG; January, 2019
Heidi Liu Banerjee
Paper presented at LTRC, Atlanta, GA; March, 2019
With the vast development of digital technology and the widespread use of social network platforms, the competences required for academic and career success in the 21st century have expanded to include complex skills where individuals need to demonstrate their abilities to think critically, reason analytically, and problem-solve strategically (Shute et al., 2010). Consequently, there has been a call to broaden the constructs of communicative language ability in L2 assessment to better represent everyday language use in modern society (e.g., Bachman, 2007; Purpura, 2015, 2016; Sabatini et al., 2014), so that the results of an assessment can yield interpretations that are aligned with contemporary views on L2 knowledge, skills, and abilities (KSAs).
Scenario-based assessment (SBA), an innovative, technology-based assessment approach, shows great affordances for expanding the measured constructs of an assessment. Initiated by the CBAL™ project (Bennett, 2010; Bennett & Gitomer, 2009) to address the limitations of traditional assessment, SBA is designed in a way that allows learners to demonstrate their KSAs in a context that simulates real-life language use. Through the utilization of a sequence of thematically-related tasks along with simulated character interaction, SBA offers opportunities to examine L2 learners’ communicative competence in a purposeful, interactive, and contextually meaningful manner.
The purpose of this study is to utilize SBA to measure high-intermediate (CEFR B2) L2 learners’ topical knowledge and their L2 KSAs as part of the broadened constructs of L2 communicative competence. To fulfill the scenario goal, learners are required to demonstrate their listening, reading, and writing abilities to build and share knowledge. In addition, learners’ prior topical knowledge was measured and their topical learning tracked using the same set of topical knowledge items.
A total of 118 adult EFL learners participated in the study. The results showed that the tasks embedded in the SBA served as appropriate measures of high-intermediate learners’ communicative competence. The topical knowledge items were found to function appropriately, supporting the use of SBA to measure topical knowledge as part of the broadened constructs of communicative competence. In addition, most learners exhibited substantial topical learning over the course of the SBA, suggesting that with proper contextualization, learning can be facilitated within an assessment. In sum, this study demonstrates the potential value of SBA as an approach to measure complex constructs of communicative language competence in L2 contexts.
Jorge Beltrán
Poster presented at AAAL, Atlanta, GA; March, 2019
Jorge Beltrán
Paper presented at the British Council’s New Directions Latin America Conference, Mexico City, MX; March, 2019
James E. Purpura
Invited plenary presented at the British Council’s New Directions Latin America Conference, Mexico City, MX; March, 2019
Jorge Beltrán
Best student paper award presented at MwALT, Bloomington, IN; October, 2019
Brady Robinson
Poster presented at New York State TESOL 49th Annual Conference, White Plains, NY; November, 2019
This scenario-based assessment (SBA) of English as a second language is a template designed to be delivered via computer. The SBA was created using the learning-oriented assessment (LOA) framework, which can be used to build, develop, and study large-scale or classroom-based assessments (Purpura & Turner, 2016).
Heidi Liu Banerjee
Colloquium paper presented at AAAL, Chicago, IL; March, 2018
To achieve effective communication, L2 learners need to be equipped with not only the necessary L2 knowledge, skills, and abilities (KSAs), but also, to the extent possible, the essential topical knowledge. While many researchers believe that topical knowledge should be viewed as an integral component of L2 learners’ communicative language competence (Bachman & Palmer, 1996, 2010; Purpura, 2015, 2016), the role of topical knowledge has not always been accounted for in an assessment context due to the difficulty of operationalizing the construct.
Scenario-based assessment (SBA), an innovative assessment approach, has been empirically shown to have the capability of measuring a broadened construct of language proficiency (Sabatini & O’Reilly, 2013). By operationalizing topical knowledge as a performance moderator, SBA offers opportunities to examine the role of topical knowledge in L2 proficiency. The purpose of this presentation is to demonstrate how topical knowledge can be measured, and how learners’ topical learning can be tracked using a scenario-based language assessment (SBLA).
The SBLA in this study, titled “Nutrition Ambassador,” is a high-intermediate integrated skills test developed for placement purposes. A total of 41 adult intermediate- to advanced-level L2 learners participated in the study. Topical knowledge was operationalized as content and lexical knowledge, and the topical knowledge task was administered both before and after the language-related scenario tasks. The score differences between the two administrations served as an indicator of topical learning.
The results show that all topical knowledge items functioned appropriately, providing adequate evidence supporting the use of SBLA to measure topical knowledge. Most test-takers demonstrated significant topical learning, especially content learning, and the extent of topical learning was shown to relate to the test-takers’ language proficiency levels. The findings shed light on how SBLA may be utilized to measure a broadened construct of L2 proficiency, and how proper contextualization may facilitate learning in an assessment.
James E. Purpura and Heidi Liu Banerjee
Colloquium organized for AAAL, Chicago, IL; March, 2018
James E. Purpura
Invited plenary presented at the Twentieth Anniversary of the CILS Exam, University for Foreigners of Siena, Siena, IT; June, 2018
James E. Purpura
Invited plenary presented at the Twentieth Anniversary of the CILS Exam, University for Foreigners of Siena, Siena, IT; June, 2018
James E. Purpura
Invited plenary presented at the Lingua e Nuova Didattica (LEND) Conference, Portonovo, IT; June, 2018
Jorge Beltrán
Work in Progress presented at LTRC, Auckland, NZ; July, 2018
One of the goals of research in L2 assessment is developing tests for which the operational construct actually reflects features of the TLU domain. Numerous efforts have been made in order to better reflect the needs of 21st century language test users, with some approaches to assessment highlighting the potential of technology in reimagining L2 assessment. For instance, game-based assessment (e.g., Attali & Arieli-Attali, 2015) and scenario-based assessment (e.g., Sabatini & O’Reilly, 2013; O’Reilly & Sheehan, 2009) aim to take advantage of the technological features that allow for more authentic representations of language use, which become particularly relevant given the current impact of computer-based systems on education and assessment.
In scenario-based assessment, task sequencing and sampling is contingent upon an overarching goal and theme, which provides an opportunity to contextualize a test in a purposeful, learning-oriented fashion (Sabatini & O’Reilly, 2013). Thus, this approach represents a promising alternative for the assessment of integrated speaking ability. For example, the use of simulated peers, multiple-turn tasks, and branched elicitations through decision-making tasks in alignment with an overarching goal would help authenticate construct representation in semi-direct tests of speaking ability.
When it comes to the coverage of communicative purposes in semi-direct assessment of speaking ability, one communicative function that has been readily explored in the context of writing, but has not been examined in such depth in the context of speaking ability, is the elaboration and defense of an argument. Nonetheless, certain real-life tasks require that language learners display their ability to build and defend an argument, for example, class discussions or debates. In light of this, it was determined that the development and analysis of a scenario-based test that guides students through the argumentation cycle (Song, Deane, Graf, & van Rijn, 2013) could provide useful information in terms of the functionality and measurement qualities of such a test, and would help evaluate the effectiveness of the tasks to elicit argument-building language.
Therefore, this paper aims to report preliminary findings on the following issues: a) whether the test successfully elicits argumentative language, b) whether the use of multi-turn items enhances construct representation for the semi-direct test (e.g., whether test takers display understanding of an audience through simulated peers), and c) whether a rubric including an appropriateness component is adequate to score the data. To answer these questions, the following analyses will be conducted: qualitative analysis and Many-Facet Rasch Measurement (MFRM). Data collection is to be held in Spring of 2018. In addition to preliminary findings, limitations and possibilities for future research will be discussed.
Yuna Seong
Paper presented at LTRC, Auckland, NZ; July, 2018
From the skills-and-elements approach to the more recent models of communicative language ability, the construct of second language (L2) proficiency has evolved and broadened over the past few decades (Bachman, 2010; Purpura, 2016). In line with this conceptual change and fueled by advances in digital technology, scenario-based assessment (SBA) has been utilized and examined by language testers (e.g., Sabatini, O'Reilly, Weeks, & Steinberg, 2016) seeking current and innovative assessment practices that can reflect modern-day language use while appropriately measuring language proficiency in a way that accounts for linguistic as well as non-linguistic factors (e.g., topical knowledge or strategy use).
The purpose of this study was to examine academic speaking ability and its cognitive dimension using an online Scenario-Based Academic English Speaking Test (SBAEST) to better capture a broadened construct of academic speaking ability that includes cognitive thinking processes. L2 academic speaking ability not only calls for students’ communication of disciplinary knowledge, but it also involves the integrated use of metacognitive and cognitive strategies in order to successfully process, synthesize, and utilize information (Chamot & O’Malley, 2004; Zwier, 2008) to perform complex academic speaking tasks such as participating in a class discussion or giving a presentation. SBA allows test takers to demonstrate their language proficiency in a meaningful and goal-oriented context by performing thematically related tasks sequenced in a way that simulates real-life problems. The computer-based SBAEST for the current study was designed to replicate real-life academic speaking demands. Students must listen to audio and video materials on a given topic and orally respond to questions or share their opinions by summarizing and synthesizing the information from the materials. In addition to the speaking tasks, the test included strategy tasks designed to specifically elicit students’ use of cognitive strategies (e.g., planning, predicting, and recalling key points). This study examined the nature of academic speaking ability and its cognitive dimension by studying the test takers’ performance on the strategy tasks and its relationship with speaking performance. Thirty-two high-intermediate to advanced ESL learners took the test, and the test results were analyzed using many-facet Rasch measurement (MFRM). Students’ responses on the strategy tasks were scored based on qualitative examination of the patterns or characteristics indicative of effective or ineffective strategy use. The relationship between the students’ performance on strategy tasks and speaking tasks was analyzed using correlational analyses. Results indicated that strategy use is an integral component of the academic speaking ability construct, and some interesting patterns and characteristics of cognitive strategy use were reviewed in relation to the test takers’ speaking performance. Further suggestions on improving test design and implications of using SBA for assessing academic speaking ability will be discussed. Assessments such as the SBAEST may have implications for the diagnosis and assessment of student learning needs, specifically with respect to distinguishing between linguistic and cognitive aspects of academic speaking ability.
Fred Tsutagawa
Paper presented at LTRC, Auckland, NZ; July, 2018
Despite the fact that pragmatics testing research has long understood the importance of context, especially in regard to setting up a situation in terms of topic, setting, participant roles, power relationships, degrees of social distance between interlocutors, etc. (Blum-Kulka, House, & Kasper, 1989; Chapelle, 1998; Halliday & Hasan, 1989; Hudson, Detmer, & Brown, 1992, 1995; Timpe-Laughlin et al., 2015), this kind of contextual information has traditionally been conveyed, and arguably controlled, to limit individual test-taker knowledge and attributes by using lengthy written task descriptions that rely on strong reading comprehension skills on the part of the test taker in order to do well on the tasks (Grabowski, 2009). The current pilot study, therefore, expands upon and incorporates aspects of context and background knowledge into the design of a pragmatics speaking assessment by using realistic semi-direct audio and video prompts in a “life-as-a-graduate-student” scenario to determine the extent to which adding real-life contexts, such as richer, more detailed interpersonal background information about the people in the scenario and actually seeing the physical environment where the scenario takes place, affects test-taker spoken responses. The subjects will interact with six scenario-based video tasks in which they will be asked to make requests of differing degrees of imposition, show appreciation, make complaints, and discuss difficult interpersonal problems, etc., across multiple turns, with all spoken responses audio recorded. For the present pilot study, approximately forty-five participant samples will be randomly collected from three proficiency level groups, beginning (n = 15), intermediate (n = 15), and advanced (n = 15), in a Community Language Program (CLP) at a major North American university. In addition, data from approximately 5 doctoral students in an Applied Linguistics graduate program will be collected as a nonequivalent control group. Two native speaker raters will rate the responses for 1) grammatical accuracy, 2) semantic meaningfulness, 3) functional appropriateness, 4) sociolinguistic appropriateness, 5) sociocultural/intercultural appropriateness, and 6) psychological appropriateness (Purpura, 2004, 2017). Descriptive statistics will be gathered for each of the six pragmatic tasks rated by the two native speaker raters, and t-test and/or analysis of variance (ANOVA) analyses will be conducted on the three proficiency subgroups to search for possible differences between them. Next, many-facet Rasch measurement (MFRM) will be used to examine the facets of examinee, task scenario type, raters, and rating categories. The results from SPSS and Facets will then be further compared to qualitative post-test survey responses to see if test-taker perceptions matched or digressed from the actual scored pragmatics speaking test results. Sources of rater variance and bias will also be carefully investigated for interactions between the raters and the examinees, task scenario types, and rating categories. As this kind of scenario-based, semi-direct speaking test of pragmatics has not been attempted before, it is hoped the results will provide valuable proof of concept for adoption as a possible large-scale testing solution for speaking tests. Finally, recommendations will be made for designing future pragmatic test tasks of this kind.
Heidi Liu Banerjee
Paper presented at ECOLT, Princeton, NJ; October 2018
With the vast development of digital technology and the widespread use of social network platforms, the competences required for academic and career success in the 21st century have expanded to include complex skills such as critical thinking, analytical reasoning, and strategic problem-solving. Consequently, there has been a call to broaden the constructs of communicative language ability in L2 assessment to better represent everyday language use in modern society.
Scenario-based assessment (SBA), an innovative, technology-based assessment approach, shows great affordances for expanding the measured constructs of an assessment. Through the utilization of a sequence of thematically-related tasks along with simulated character interaction, SBA offers opportunities to examine L2 learners’ communicative competence in a purposeful, interactive, and contextually meaningful manner.
The purpose of this study is to utilize SBA to measure high-intermediate L2 learners’ topical knowledge and their L2 knowledge, skills, and abilities (KSAs) as part of the broadened constructs of L2 communicative competence. To fulfill the scenario goal, learners are required to demonstrate their listening, reading, and writing abilities to build and share knowledge. In addition, learners’ prior topical knowledge was measured and their topical learning tracked using the same set of topical knowledge items.
A total of 118 adult EFL learners participated in the study. The results showed that the tasks embedded in the SBA served as appropriate measures of high-intermediate learners’ communicative competence. The topical knowledge items were found to function appropriately, supporting the use of SBA to measure topical knowledge as part of the broadened constructs of communicative competence. In addition, most learners exhibited substantial topical learning over the course of the SBA, suggesting that with proper contextualization, learning can be facilitated within an assessment. In sum, this study demonstrates the potential value of SBA as an approach to measure complex constructs of communicative language competence in L2 contexts.
Heidi Liu Banerjee
Best student paper award presented at MwALT, Dayton, OH; October, 2017
To achieve effective communication, second language (L2) learners need not only the necessary L2 knowledge, skills, and abilities (KSAs), but also the relevant topical knowledge. However, in L2 assessment contexts, very few existing tests-in-operation have examined L2 learners’ topical knowledge as part of their language performance. This study investigates the construct of topical knowledge in a scenario-based language assessment (SBLA) that simulates the real-life language use of building and sharing knowledge. A total of 41 L2 learners at the intermediate to advanced levels from a university-affiliated adult ESL program participated in the study. Topical knowledge was operationalized as content knowledge and lexical knowledge. Descriptive statistics, correlations, and Rasch analysis were used to examine the role of topical knowledge in the SBLA and to establish evidence of its construct validity. The results showed that L2 learners’ content knowledge and lexical knowledge, while serving one overarching construct of topical knowledge, functioned differently in the process of building and sharing knowledge. However, both play a role in the performance outcomes and should be recognized as an integral component of L2 proficiency. The findings also lend evidentiary support for the use of a purpose-driven, highly-contextualized SBLA to broaden our understanding of the underlying constructs of L2 communicative ability.
Brian Carroll, Heidi Liu, and Saerhim Oh
Paper presented at the First TC/ETS Forum on Teaching, Learning, and Assessment of English Language Learners. New York, NY; April 2015
Researchers in the field of educational assessment have begun to re-conceptualize and broaden the construct of L2 ability to better align with the contemporary focus on integrated literacy skills and learner cognition (O’Reilly & Sheehan, 2009; Sabatini, O’Reilly, Halderman, & Bruce, 2014). Scenario-based assessment is one of the most current and innovative assessment approaches for examining learners’ integrated skills in a purposeful, interactive, and strategic manner. Bennett and colleagues (2011) used scenario-based tasks in their Cognitively Based Assessment of, for, and as Learning (CBAL) project in middle school reading, writing, and mathematics; Sabatini et al. (2014) used scenario-based assessment to measure middle school students’ reading comprehension. While scenario-based assessment has received much attention in assessing students’ literacy in K-12 mainstream classrooms, there has been very little published work on the use of scenario-based assessment in the field of applied linguistics and second language testing.
The purpose of this talk is to present an attempt to implement a scenario-based assessment designed to make placement decisions in an adult ESL language program. We will focus our discussion on the intermediate-level scenario, as part of a broader set of assessments. First, we will share our design narrative of the scenario-based English placement exam. We will then present the preliminary scenario-based test tasks assessing reading, listening, writing, and speaking skills within a single scenario. Finally, we will discuss the challenges and issues we face as we explore the area of scenario-based assessment.
James E. Purpura
Paper presented at the First TC/ETS Forum on Teaching, Learning, and Assessment of English Language Learners. New York, NY; April 2015
James E. Purpura, John Sabatini, Tenaha O'Reilly, Brian Carroll, Heidi Liu, and Saerhim Oh
Paper presented at ECOLT, Washington, D.C.; October 2015
James E. Purpura, Heidi Liu, Sarah Woodson, and Fred Tsutagawa
Plenary presented at LTRC, Amsterdam, NL; June, 2014
Many assessment researchers (e.g., Shohamy, 1998; Turner, 2012; Hill & McNamara, 2012) have highlighted the central role that assessment plays in L2 classrooms and have expressed the need to relate assessment principles and practices to teaching and learning in L2 instructional contexts. As a result, research in this area has examined: (1) teacher practices and processes in using L2 assessments (Leung & Teasdale, 1997; Rea-Dickins, 2003; Colby-Kelly & Turner, 2007); (2) teacher assessment decision-making processes in using rating scales (Brindley, 1998); (3) the role of teacher knowledge, experience, and beliefs in planning and implementing assessments (Brindley, 2001; Gardner & Rea-Dickins, 2001, 2007); (4) the role of diagnostic or dynamic assessment in promoting teaching and learning (Alderson, 2005; Lantolf & Poehner, 2011); (5) the effects of standards-based and outcomes-based assessment on teaching, learning, and policy (Davidson, 2007); and (6) the value of self and peer assessment for promoting self-regulation, autonomy, motivation, and learner outcomes (Patri, 2002; Saito, 2008). Aiming to further this discussion, Purpura and Turner (2013, Forthcoming) have proposed an approach to classroom-based assessment that, in prioritizing learning and learning processes, seeks to determine how well students have benefitted from assessment in narrowing achievement gaps. More specifically, this approach describes how planned and unplanned assessments are conceptualized and implemented from a learning perspective, as well as how planned assessments, together with those occurring spontaneously through social interaction, contribute to the advancement of L2 processing and the attainment of learning outcomes. In highlighting the integration of models of cognition, socio-cognition, and learning with L2 instruction and assessment, this approach, referred to as learning-oriented assessment (Purpura, 2004), is concerned with the contextual, cognitive, socio-cognitive, dispositional, and interactional dimensions that underlie the design, implementation, and use of assessments and their potential for facilitating learning. The current study uses a learning-oriented approach to investigate the nature of planned and unplanned assessments in an ESL classroom, and the role that these assessments played in learning the passive voice. The study examined specifically how the use of planned and unplanned assessments promoted learners’ L2 processing, and how the assessments, performed individually or collaboratively through social interaction, contributed (or not) to the learners’ ability to use the passive voice to describe operational processes (e.g., desalination). Three intermediate ESL classes were videotaped, using three cameras, from the beginning of a lesson to its culmination, when the results of an achievement test were returned (approximately 4 days). The video data were then uploaded into NVivo and transcribed. Instances of planned and unplanned assessments were then identified and examined iteratively and recursively through several lenses (e.g., interactional features, proficiency features, processing and learning features). Learning patterns in the data were then tracked across the lessons and related to ultimate learning outcomes. The results showed that assessment, whether planned or unplanned, played a pervasive role in the teaching/learning process.
The results also showed a complex mix of patterns related to how spontaneous assessments may or may not contribute to the achievement of ultimate learning goals.
James E. Purpura and Carolyn E. Turner
Paper presented at the 3rd Roundtable on Learning-Oriented Assessment in Language Classrooms and Large-Scale Contexts, New York, NY; October, 2014