Workshop Program

VarDial takes place on Thursday, June 20th. All times are in local Mexico City time (GMT-6).

9:00–9:10 Opening Remarks

9:10–9:40 VarDial Evaluation Campaign 2024: Commonsense Reasoning in Dialects and Multi-Label Similar Language Identification – Adrian-Gabriel Chifu, Goran Glavaš, Radu Tudor Ionescu, Nikola Ljubešić, Aleksandra Miletić, Filip Miletić, Yves Scherrer and Ivan Vulić

9:40–10:00 Data-Augmentation-Based Dialectal Adaptation for LLMs – Fahim Faisal and Antonios Anastasopoulos

10:00–10:20 Brandeis at VarDial 2024 DSL-ML Shared Task: Multilingual Models, Simple Baselines and Data Augmentation – Jonne Sälevä and Chester Palen-Michel

10:20–10:30 Poster Boosters I (2 minutes per poster)

Improving Multi-Label Classification of Similar Languages by Semantics-Aware Word Embeddings – The Quyen Ngo, Thi Anh Phuong Nguyen, My Linh Ha, Thi Minh Huyen Nguyen and Phuong Le-Hong
One-Shot Prompt for Language Variety Identification – Nat Gillin
Incorporating Dialect Understanding Into LLM Using RAG and Prompt Engineering Techniques for Causal Commonsense Reasoning – Benedikt Perak, Slobodan Beliga and Ana Meštrović
JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far – Nikola Ljubešić, Taja Kuzman, Peter Rupnik, Ivan Vulić, Fabian David Schmidt and Goran Glavaš

10:30–11:00 Coffee Break

11:00–12:00 Invited Talk by Diyi Yang (Stanford University): Adapting LLMs to Low-Resource English Dialects

Large language models (LLMs) based primarily on Standard American English (SAE) often perform significantly worse when applied to other English dialects. Given English's place as a global contact language, NLU systems designed for all English speakers, rather than a limited sub-population, are essential for widening access to high-quality language technologies. In this talk, we take a participatory design approach to develop dialect-inclusive language resources, then introduce efficient adaptation algorithms to adapt existing LLMs trained on Standard American English to a wide range of English dialects, towards more inclusive and lightweight NLU systems. Finally, we discuss how efficient adaptation of LLMs can help facilitate positive change by addressing language and communication barriers.

12:00–12:30 Poster Boosters II (2 minutes per poster)

How Well Do Tweets Represent Sub-Dialects of Egyptian Arabic? – Mai Mohamed Eida, Mayar Nassar and Jonathan Dunn
Language Identification of Philippine Creole Spanish: Discriminating Chavacano From Related Languages – Aileen Joan Vicente and Charibeth Cheng
Multilingual Identification of English Code-Switching – Igor Sterner
Does Whisper Understand Swiss German? An Automatic, Qualitative, and Human Evaluation – Eyal Liron Dolev, Clemens Fidel Lutz and Noëmi Aepli
Modeling Orthographic Variation in Occitan's Dialects – Zachary William Hopton and Noëmi Aepli
Highly Granular Dialect Normalization and Phonological Dialect Translation for Limburgish – Andreas Simons, Stefano De Pascale and Karlien Franco
The Role of Adverbs in Language Variety Identification: The Case of Portuguese Multi-Word Adverbs – Izabela Müller, Nuno Mamede and Jorge Baptista
Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization – Olubusayo Olabisi and Ameeta Agrawal
What Drives Performance in Multilingual Language Models? – Sina Bagheri Nezhad and Ameeta Agrawal
NoMusic - The Norwegian Multi-Dialectal Slot and Intent Detection Corpus – Petter Mæhlum and Yves Scherrer
DIALECT-COPA: Extending the Standard Translations of the COPA Causal Commonsense Reasoning Dataset to South Slavic Dialects – Nikola Ljubešić, Nada Galant, Sonja Benčina, Jaka Čibej, Stefan Milosavljević, Peter Rupnik and Taja Kuzman
Can LLMs Handle Low-Resource Dialects? A Case Study on Translation and Common Sense Reasoning in Šariš – Viktória Ondrejová and Marek Šuppa

12:30–14:00 Lunch Break

14:00–15:30 Poster Session

15:30–16:00 Coffee Break

16:00–16:25 When Elote, Choclo and Mazorca are not the Same. Isomorphism-Based Perspective to the Spanish Varieties Divergences – Cristina España-Bonet, Ankur Bhatt, Koel Dutta Chowdhury and Alberto Barrón-Cedeño

16:25–16:50 Studying Language Variation Considering the Re-Usability of Modern Theories, Tools and Resources for Annotating Explicit and Implicit Events in Centuries Old Text – Stella Verkijk, Pia Sommerauer and Piek T.J.M. Vossen

16:50–17:15 Experiments in Multi-Variant Natural Language Processing for Nahuatl – Robert Pugh and Francis Tyers

17:15–17:30 Closing Remarks
