Workshop Program
VarDial takes place on Thursday, June 20th. All times are in Mexico City local time (GMT-6).
9:00–9:10 Opening Remarks
9:10–9:40 VarDial Evaluation Campaign 2024: Commonsense Reasoning in Dialects and Multi-Label Similar Language Identification – Adrian-Gabriel Chifu, Goran Glavaš, Radu Tudor Ionescu, Nikola Ljubešić, Aleksandra Miletić, Filip Miletić, Yves Scherrer and Ivan Vulić
9:40–10:00 Data-Augmentation-Based Dialectal Adaptation for LLMs – Fahim Faisal and Antonios Anastasopoulos
10:00–10:20 Brandeis at VarDial 2024 DSL-ML Shared Task: Multilingual Models, Simple Baselines and Data Augmentation – Jonne Sälevä and Chester Palen-Michel
10:20–10:30 Poster Boosters I (2 minutes per poster)
Improving Multi-Label Classification of Similar Languages by Semantics-Aware Word Embeddings – The Quyen Ngo, Thi Anh Phuong Nguyen, My Linh Ha, Thi Minh Huyen Nguyen and Phuong Le-Hong
One-Shot Prompt for Language Variety Identification – Nat Gillin
Incorporating Dialect Understanding Into LLM Using RAG and Prompt Engineering Techniques for Causal Commonsense Reasoning – Benedikt Perak, Slobodan Beliga and Ana Meštrović
JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far – Nikola Ljubešić, Taja Kuzman, Peter Rupnik, Ivan Vulić, Fabian David Schmidt and Goran Glavaš
10:30–11:00 Coffee Break
11:00–12:00 Invited Talk by Diyi Yang (Stanford University): Adapting LLMs to Low-Resource English Dialects
Large language models (LLMs) based primarily on Standard American English (SAE) often perform significantly worse when applied to other English dialects. Given English's place as a global contact language, NLU systems designed for all English speakers, rather than a limited sub-population, are essential for widening access to high-quality language technologies. In this talk, we take a participatory design approach to developing dialect-inclusive language resources, and then introduce efficient algorithms for adapting existing LLMs trained on Standard American English to a wide range of English dialects, with the goal of building more inclusive and lightweight NLU systems. Finally, we discuss how efficient adaptation of LLMs can help facilitate positive change by addressing language and communication barriers.
12:00–12:30 Poster Boosters II (2 minutes per poster)
How Well Do Tweets Represent Sub-Dialects of Egyptian Arabic? – Mai Mohamed Eida, Mayar Nassar and Jonathan Dunn
Language Identification of Philippine Creole Spanish: Discriminating Chavacano From Related Languages – Aileen Joan Vicente and Charibeth Cheng
Multilingual Identification of English Code-Switching – Igor Sterner
Does Whisper Understand Swiss German? An Automatic, Qualitative, and Human Evaluation – Eyal Liron Dolev, Clemens Fidel Lutz and Noëmi Aepli
Modeling Orthographic Variation in Occitan's Dialects – Zachary William Hopton and Noëmi Aepli
Highly Granular Dialect Normalization and Phonological Dialect Translation for Limburgish – Andreas Simons, Stefano De Pascale and Karlien Franco
The Role of Adverbs in Language Variety Identification: The Case of Portuguese Multi-Word Adverbs – Izabela Müller, Nuno Mamede and Jorge Baptista
Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization – Olubusayo Olabisi and Ameeta Agrawal
What Drives Performance in Multilingual Language Models? – Sina Bagheri Nezhad and Ameeta Agrawal
NoMusic - The Norwegian Multi-Dialectal Slot and Intent Detection Corpus – Petter Mæhlum and Yves Scherrer
DIALECT-COPA: Extending the Standard Translations of the COPA Causal Commonsense Reasoning Dataset to South Slavic Dialects – Nikola Ljubešić, Nada Galant, Sonja Benčina, Jaka Čibej, Stefan Milosavljević, Peter Rupnik and Taja Kuzman
Can LLMs Handle Low-Resource Dialects? A Case Study on Translation and Common Sense Reasoning in Šariš – Viktória Ondrejová and Marek Šuppa
12:30–14:00 Lunch Break
14:00–15:30 Poster Session
15:30–16:00 Coffee Break
16:00–16:25 When Elote, Choclo and Mazorca are not the Same. Isomorphism-Based Perspective to the Spanish Varieties Divergences – Cristina España-Bonet, Ankur Bhatt, Koel Dutta Chowdhury and Alberto Barrón-Cedeño
16:25–16:50 Studying Language Variation Considering the Re-Usability of Modern Theories, Tools and Resources for Annotating Explicit and Implicit Events in Centuries Old Text – Stella Verkijk, Pia Sommerauer and Piek T.J.M. Vossen
16:50–17:15 Experiments in Multi-Variant Natural Language Processing for Nahuatl – Robert Pugh and Francis Tyers
17:15–17:30 Closing Remarks