Talk 6: Stefano Bannò
Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs
Stefano Bannò, Kate Knill and Mark Gales (ALTA Institute, Department of Engineering)
Vocabulary use is a fundamental aspect of second language (L2) proficiency. To date, its assessment by automated systems has typically examined the context-independent or part-of-speech (PoS) related use of words. This paper introduces a novel approach that enables fine-grained vocabulary evaluation by exploiting the precise use of words within a sentence. The scheme combines large language models (LLMs) with the English Vocabulary Profile (EVP). The EVP is a standard lexical resource that enables in-context vocabulary use to be linked with proficiency level. We evaluate the ability of LLMs to assign proficiency levels to individual words as they appear in L2 learner writing, addressing key challenges such as polysemy, contextual variation, and multi-word expressions. We compare LLMs to a PoS-based baseline. LLMs appear to exploit additional semantic information that yields improved performance. We also explore correlations between word-level proficiency and essay-level proficiency. Finally, the approach is applied to examine the consistency of the EVP proficiency levels. Results show that LLMs are well suited to the task of vocabulary assessment.
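As an illustration of what word-level assessment with an LLM could look like in practice, the minimal sketch below prompts a general-purpose chat model to assign an EVP-style CEFR level to a single word in its sentence context. The prompt wording, model choice, and helper function are illustrative assumptions, not the setup used in the paper.

```python
# Illustrative sketch only: prompting a general-purpose LLM to assign an
# EVP-style CEFR level to a word in its sentence context. Prompt wording,
# model name, and label handling are assumptions, not the authors' setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def word_level_cefr(word: str, sentence: str) -> str:
    """Ask the LLM for the CEFR level of `word` as used in `sentence`."""
    prompt = (
        "According to the English Vocabulary Profile, at which CEFR level "
        f"(A1-C2) is the word '{word}' used in the sentence below? "
        "Answer with the level only.\n\n"
        f"Sentence: {sentence}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().upper()
    return answer if answer in CEFR_LEVELS else "UNKNOWN"

# Example: the same word form can receive different levels in different senses.
print(word_level_cefr("runs", "She runs a small bakery in town."))
```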
Talk 7: Joni Kruijsbergen
‘ChatGPT can write it for me’: AI in the L2 Dutch learning context
Joni Kruijsbergen, Chloé Lybaert and Orphée De Clercq (Ghent University)
Being able to write well is a fundamental skill for full participation in society, including for second‑language learners (De Smet et al., 2012). While the writing process relies on receiving detailed and constructive feedback (Shintani & Ellis, 2013), manually providing such feedback is time‑consuming (Godwin‑Jones, 2022). Consequently, many second‑language learners and their instructors increasingly turn to technological tools (Godwin‑Jones, 2024). They often perceive generative AI as the solution, given that chatbots excel at (re)writing. However, these chatbots are not trained to provide pedagogically sound feedback. Additionally, research into the impact of generative AI on learning and writing processes is still limited, and further work is needed to increase the transparency of such tools.
Our research addresses precisely those issues, focusing specifically on the context of teaching Dutch as a second language (L2 Dutch). During the presentation, we will present the broader context of a recently started doctoral project on this topic. Furthermore, we will elaborate on the initial step in the research: examining how L2 Dutch teachers perceive, use, and promote AI, particularly in relation to writing skills. Teachers’ experiences, after all, play a crucial role in the responsible pedagogical integration of (AI) tools (Ertmer et al., 2012; Akram et al., 2022). In this first study, we used a survey among 101 L2 Dutch teachers to investigate (1) AI use and encouragement in L2 Dutch classrooms and (2) attitudes toward AI. We link the results to the broader L2 writing context and discuss the shift from the question ‘what can we use AI for’ to ‘what do we want to use AI for’.
Talk 8: Matthew Pattemore
Evaluating the Impact of Corrective Feedback Types in Children’s Digital Language Games
Matthew Pattemore (University of Tübingen)
Digital corrective feedback is distinct from in-person feedback, as it must be explicitly discernible and has more limited space for negotiation of meaning between tutor and learner. Additionally, for children, whose metalinguistic awareness may be less well-developed, it is unclear whether corrective feedback on the process of reaching a correct answer (e.g. metalinguistic explanations) is more or less effective than simply providing outcome feedback (e.g. correct/incorrect), particularly when the learner’s attention is otherwise engaged in gameplay. This talk presents data from 732 eleven-year-old Spanish learners of English who received 108,168 instances of corrective feedback following errors while playing a literacy development game, to show which types of corrective feedback are better taken up in this context, and by which types of learners. These data help focus feedback development on what learners actually engage with, improving the efficiency of both human‑designed and genAI‑driven systems.
Talk 9: Luisa Ribeiro-Flucht
Giving GenAI a Pedagogical Brain: Domain Modeling for LLM-Mediated Grammar Practice
Luisa Ribeiro-Flucht, Detmar Meurers and Xiaobin Chen (University of Tübingen)
Large Language Models (LLMs) enable the dynamic generation of contextualized grammar practice, yet without explicit domain models, such systems risk pedagogical misalignment. This talk presents ongoing work on a graph-based domain model designed to support LLM-mediated, communicative grammar practice. Grounded in the English Grammar Profile, the model is intended to serve as an interpretable backbone for adaptive task allocation that aligns linguistic targets with their communicative purposes.
Drawing on results from a three-week pilot study, I outline our ongoing efforts to model learner state through a graph-based overlay. The broader aim is to move toward LLM-mediated instruction that is not only generative, but functionally grounded, supporting practice conditions that resemble the communicative contexts in which grammatical knowledge is ultimately deployed.
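To make the idea of a graph-based domain model with a learner-state overlay concrete, the sketch below represents grammar constructions as nodes annotated with CEFR level and communicative function, prerequisite relations as edges, and a per-learner mastery overlay that drives selection of the next practice target. The constructions, levels, threshold, and networkx representation are invented for illustration and are not the project's actual model.

```python
# A minimal sketch (not the project's implementation) of a graph-based domain
# model with a learner-state overlay: EGP-style constructions are nodes
# annotated with CEFR level and communicative function, prerequisite links are
# edges, and a per-learner mastery overlay drives the choice of the next target.
import networkx as nx

domain = nx.DiGraph()
# Node attributes are invented examples of EGP-like descriptors.
domain.add_node("present_simple", level="A1", function="describing routines")
domain.add_node("past_simple", level="A1", function="narrating past events")
domain.add_node("present_perfect", level="A2", function="linking past to present")
domain.add_edge("past_simple", "present_perfect")      # prerequisite link
domain.add_edge("present_simple", "present_perfect")

# Learner-state overlay: estimated mastery per construction (0.0-1.0).
mastery = {"present_simple": 0.9, "past_simple": 0.6, "present_perfect": 0.1}

def next_target(graph, overlay, threshold=0.7):
    """Pick an unmastered construction whose prerequisites are all mastered."""
    candidates = [
        node for node in graph.nodes
        if overlay.get(node, 0.0) < threshold
        and all(overlay.get(p, 0.0) >= threshold for p in graph.predecessors(node))
    ]
    # Prefer the lowest CEFR level among the candidates.
    return min(candidates, key=lambda n: graph.nodes[n]["level"], default=None)

print(next_target(domain, mastery))  # 'past_simple'
```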
Talk 10: Øistein Andersen and Geraldine Mark
Developing the English Grammar Profile Tagger: A Tool for Grammatical Annotation and CEFR-Level Mapping
Øistein Andersen, Geraldine Mark, Anne O'Keeffe, Andrew Caines, Diane Nicholls, and Paula Buttery (ALTA Institute and the University of Limerick)
The English Grammar Profile (EGP) is an online database of over 1,200 competencies describing the grammatical structures and functions that L2 learners of English use at each level (A1–C2) of the Common European Framework of Reference for Languages (CEFR). The EGP was derived from analysis of the Cambridge Learner Corpus (O’Keeffe & Mark, 2017), a 30-million-word corpus of learner writing in exam tasks, representing over 130 first languages across 190 countries worldwide. In this paper, we outline work in progress on the development of the ‘English Grammar Profile Tagger’, which enables automatic identification of the grammatical structures and functions (classified by part of speech and CEFR level) described in the EGP. We examine both the progress and the limitations of operationalising the EGP descriptors into an automated system, highlighting the challenges of reconciling a profile that was originally designed for language teachers, learners and materials developers with the constraints of computational processing. The tagger has been developed as new functionality within the Robust Accurate Parsing System (RASP), an open-source natural language processing toolkit (Briscoe et al., 2006). Identification of EGP constructions within input texts is automatic and based on lexico‑syntactic information including word forms, part-of-speech tags, and grammatical relations. Development has proceeded through iterative inspection of outputs, targeted error correction, and the addition of new rules aimed at achieving fuller coverage of the EGP. We are currently preparing an evaluation set of non-exam essays tagged with reference to EGP constructions by human experts. This will provide a basis for measuring the accuracy of the EGP tagger and further refining its rule-set. In this session we’ll discuss observations from this experience of evaluating different types of learner writing, as well as avenues for future exploration.
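As a schematic illustration of how an EGP descriptor can be operationalised as a lexico-syntactic rule, the toy sketch below matches one invented pattern over part-of-speech tagged tokens. The actual tagger is implemented as new functionality within RASP and also draws on grammatical relations; the rule, descriptor text, and level shown here are placeholders, not real EGP entries.

```python
# A toy illustration (not the RASP-based implementation) of operationalising an
# EGP-style descriptor as a lexico-syntactic pattern over POS-tagged tokens.
# The descriptor text and CEFR level are invented stand-ins for real entries.
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag)

EGP_RULES = [
    {
        "id": "comparative_than",
        "level": "A2",
        "descriptor": "comparative adjective followed by 'than'",
        # Pattern: comparative adjective (JJR) immediately followed by 'than'.
        "match": lambda toks, i: toks[i][1] == "JJR"
                                 and i + 1 < len(toks)
                                 and toks[i + 1][0].lower() == "than",
    },
]

def tag_egp(tokens: List[Token]):
    """Return (rule id, CEFR level, token index) for every rule match."""
    hits = []
    for rule in EGP_RULES:
        for i in range(len(tokens)):
            if rule["match"](tokens, i):
                hits.append((rule["id"], rule["level"], i))
    return hits

sentence = [("My", "PRP$"), ("city", "NN"), ("is", "VBZ"),
            ("bigger", "JJR"), ("than", "IN"), ("yours", "PRP")]
print(tag_egp(sentence))  # [('comparative_than', 'A2', 3)]
```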
References
Briscoe, T., Carroll, J., and Watson, R. 2006. The Second Release of the RASP System. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions. Association for Computational Linguistics.
O’Keeffe, A., and Mark, G. 2017. The English Grammar Profile of learner competence: Methodology and key findings. International Journal of Corpus Linguistics 22(4): 457–489.
Talk 11: Detmar Meurers
Linguistic Complexity in Longitudinal and Cross-sectional Perspectives
Detmar Meurers (University of Tübingen) and Yushan Li (Zhejiang University)
Linguistic complexity analyses are commonly used both to characterize writing quality and to trace the longitudinal development of language learners, but is this a valid use of those characteristics? In this talk, we investigate what characterizes the development of German-as-a-Foreign-Language learners and what is indicative of their successful exam performance, based on two large corpora of Chinese learners of German (CDLK, Li & Wu, 2023; PGG, Li, 2025). Starting with a rich set of 450 linguistic complexity features for German (Weiss & Meurers, 2019) that capture all domains of linguistic modeling, we study which features are indicative of development and which of successful exam performance. We discuss insights at different levels of granularity, from feature selection and quantitative classification results using Explainable Boosting Machines to qualitative linguistic interpretation.
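For readers unfamiliar with Explainable Boosting Machines, the sketch below shows the general shape of such an analysis using the InterpretML library: fit an EBM on a matrix of complexity features against a binary outcome and inspect global feature importances. The data and feature names are synthetic stand-ins invented for illustration; the study itself works with 450 German complexity features and the CDLK and PGG corpora.

```python
# Schematic sketch only: fitting an Explainable Boosting Machine to relate
# complexity features to exam success, then inspecting global importances.
# Random data stands in for the real corpora; feature names are invented.
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

rng = np.random.default_rng(0)
feature_names = ["mean_sentence_length", "clauses_per_sentence", "lexical_density"]
X = rng.normal(size=(200, len(feature_names)))          # placeholder feature matrix
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0).astype(int)

ebm = ExplainableBoostingClassifier(feature_names=feature_names)
ebm.fit(X, y)

# Global explanation: which complexity features matter most for the outcome.
global_exp = ebm.explain_global().data()
for name, score in sorted(zip(global_exp["names"], global_exp["scores"]),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```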