Talk 12: Elena Volodina
Protecting privacy of learner datasets - the case of pseudonymization
Elena Volodina, Simon Dobnik, Ricardo Muñoz Sánchez and Maria Irena Szawerna (University of Gothenburg)
This talk will be devoted to the challenges of working with data that contains personal information. I will describe a set of experiments with automatic pseudonymization that we have performed on learner essays, among them experiments with the detection and labeling of personal categories using BERT models (Szawerna et al. 2024, 2025), attempts at using LLMs to "fill in the blanks" when substituting personal information with pseudonyms (as yet unpublished), and a study on whether pseudonyms can provoke biased automated classifications (Muñoz Sánchez et al. 2024).
Talk 13: Kate Belcher
Exploring Approaches to Short Answer Grading in Domain-Specific Educational Settings
Kate Belcher, Kordula de Kuthy and Detmar Meurers (Leibniz-Institut für Wissensmedien (IWM))
The integration of open questions into Intelligent Tutoring Systems offers the potential to broaden their use in digital education, enabling learning activities which go beyond the acquisition of foundational theoretical concepts and instead foster the development of transfer and generalisation abilities. In the case of Economics education, an “ill-defined domain” (Lynch et al. 2006), these higher-order transfer skills are particularly relevant for subject-specific pedagogical goals. Automatic evaluation of such open questions, however, presents challenges both from a methodological perspective and with respect to pedagogical alignment. In this talk, we examine these issues and focus on how the linguistic nature of short-answer data affects automated grading performance. Our findings suggest that performance in both white-box and LLM-based grading approaches varies systematically with question types and the skills they target. We then outline ongoing work, which focuses on how different grading resources can be effectively integrated to improve the reliability and pedagogical alignment of automatic scoring.
Talk 14: Suchir Salhan
Bilingual language models as computational models of human bilingualism and AI-based solutions to support second-language learning
Suchir Salhan (ALTA Institute)
This talk will share ongoing work on interpretable bilingual language modelling. The talk will begin by surveying efforts to build computational cognitive models of bilingual and second language learning. I will introduce the Multilingual BabyLM Challenge at EMNLP and show how it opens opportunities for second language modelling. I will then share ongoing work to develop and benchmark bilingual language models across scales as computational models of bilingual and second language acquisition. Finally, I will share future directions for applying these bilingual models to improve the pedagogical alignment of LLMs.
Talk 15: Daniela Verratti-Souto
A Peek into Skill Transfer: from structured grammar practice to free production
Daniela Verratti-Souto, Kordula De Kuthy and Detmar Meurers (IWM)
Modern language teaching in school contexts aims to enable learners to communicate appropriately and effectively across a range of situations. Within Skill Acquisition Theory, explicit grammar instruction is assumed to develop declarative knowledge of linguistic forms, which can be proceduralized and increasingly automatized through practice. Controlled, focus-on-form practice is therefore designed to support the progression from accuracy dependent on declarative knowledge to more fluent use. Ideally, the acquired skills could then be applied to less constrained communicative contexts; however, empirical evidence for the transfer of such practice to open, productive tasks remains inconclusive. Drawing on data from FeedBook, an Intelligent Tutoring System developed for learners of English in German school contexts, we investigate the relationship between students’ structured practice of three target constructions—conditionals, comparatives, and relative clauses—and their use of these structures in free writing.
Talk 16: Soroosh Akef
LLM-Powered but Rule-Grounded: Grammatical Error Correction for Learner Model Construction
Soroosh Akef, Amália Mendes, Detmar Meurers and Patrick Rebuschat (IWM)
Identifying learners’ grammatical errors at various stages of language development affords a host of possibilities in intelligent computer-assisted language learning (ICALL) systems, from learner modeling to automatic feedback generation. While grammatical error correction (GEC) is a long-standing task in natural language processing, the focus of GEC has rarely been on identifying pedagogically relevant grammatical errors. We propose a hybrid framework that corrects grammatical errors using an LLM and subsequently identifies the pedagogically relevant grammatical properties the errors represent using a rule-based system. Rather than opt for existing reference-based evaluation metrics, which do not account for the reality that there may be multiple ways to correct an error, we evaluated the validity of the LLM’s corrections and our framework in a different way. We created two sets of accuracy features based on the annotator-corrected and the LLM-corrected versions of texts in a Portuguese learner corpus, and used each set separately to train two interpretable L2 learner proficiency classifiers. The results indicated comparable performance and behavior between the models trained on each feature set. The applicability of this approach to learner modeling in real-life settings will be discussed.
Talk 17: Shiva Taslimipoor
Dataset of Responses and Annotations for Reading Comprehension Multiple Choice Questions
Shiva Taslimipoor, Luca Benedetto, Andrew Caines, Can Jin and Yongcan Liu (ALTA Institute, CUP&A, Faculty of Education)
This project aims to collect, curate, and release a high-quality corpus of language learners’ responses to reading comprehension multiple-choice questions (MCQs), alongside annotations identifying the specific text segments required to answer each question. We plan to use the collected corpus to develop techniques for evaluating question quality and to identify the features that constitute effective stems and distractors. Furthermore, we investigate the relationship between learners’ response patterns and their textual attention, accounting for question types and varying levels of English proficiency. The findings will offer insights into the information required for question answering and into question difficulty, and can aid in the development of AI agents designed to simulate student behaviour. This is ongoing work, and this talk details our methodology, study design, and current data collection efforts.