Workshop Program
The workshop takes place on Friday, May 5th. All times are Central European Summer Time (CEST).
9:00–9:10 Opening Remarks
9:10–9:40 Findings of the VarDial 2023 Evaluation Campaign - Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer and Marcos Zampieri
9:40–10:05 Two-stage Pipeline for Multilingual Dialect Detection - Ankit Vaidya and Aditya Kane
10:05–10:30 Fine-Tuning BERT with Character-Level Noise for Zero-Shot Transfer to Dialects and Closely-Related Languages - Aarohi Srivastava and David Chiang
10:30–11:00 Coffee Break
11:00–11:25 Analyzing Zero-Shot Transfer Scenarios across Spanish Variants for Hate Speech Detection - Galo Castillo-López, Arij Riabi and Djamé Seddah
11:25–11:50 Optimizing the Size of Subword Vocabularies in Dialect Classification - Vani Kanjirangat, Tanja Samardžić, Ljiljana Dolamic and Fabio Rinaldi
11:50–12:15 Comparing and Predicting Eye-tracking Data of Mandarin and Cantonese - Junlin Li, Bo Peng, Yu-Yin Hsu and Emmanuele Chersoni
12:15–12:40 Poster Boosters I
[EACL Findings] Exploring Enhanced Code-Switched Noising for Pretraining in Neural Machine Translation - Vivek Iyer, Arturo Oncevay and Alexandra Birch
[EACL Findings] Spelling Convention Sensitivity in Neural Language Models - Elizabeth Nielsen, Christo Kirov and Brian Roark
Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages - Verena Blaschke, Hinrich Schütze and Barbara Plank
Temporal Domain Adaptation for Historical Irish - Oksana Dereza, Theodorus Fransen and John P. McCrae
Reconstructing Language History by Using a Phonological Ontology. An Analysis of German Surnames - Hanna Fischer and Robert Engsterhold
A Measure for Linguistic Coherence in Spatial Language Variation - Alfred Lameli and Andreas Schönberg
Variation and Instability in Dialect-Based Embedding Spaces - Jonathan Dunn
Lemmatization Experiments on Two Low-Resourced Languages: Low Saxon and Occitan - Aleksandra Miletić and Janine Siewert
The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group - Ilia Afanasev
Dialect Representation Learning with Neural Dialect-to-Standard Normalization - Olli Kuparinen and Yves Scherrer
12:40–14:00 Lunch Break
14:00–14:50 Invited Talk by Ivan Vulić (University of Cambridge): Bridging the Dialect Gap with Modular Transfer Learning?
14:50–15:40 Round Table: VarDial in the Era of Large Language Models
Panelists: Antonios Anastasopoulos, Gabriel Bernier-Colborne, Preslav Nakov, Tanja Samardžić, Ivan Vulić
Moderator: Yves Scherrer
15:40–16:15 Coffee Break
16:15–16:40 Poster Boosters II
Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora - Taja Kuzman, Peter Rupnik and Nikola Ljubešić
PALI: A Language Identification Benchmark for Perso-Arabic Scripts - Sina Ahmadi, Milind Agarwal and Antonios Anastasopoulos
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian - Peter Rupnik, Taja Kuzman and Nikola Ljubešić
Dialect and Variant Identification as a Multi-Label Classification Task: A Proposal Based on Near-Duplicate Analysis - Gabriel Bernier-Colborne, Cyril Goutte and Serge Leger
Murreviikko: A Dialectologically Annotated and Normalized Dataset of Finnish Tweets - Olli Kuparinen
DiatopIt: A Corpus of Social Media Posts for the Study of Diatopic Language Variation in Italy - Alan Ramponi and Camilla Casula
VarDial in the Wild: Industrial Applications of LID Systems for Closely-Related Language Varieties - Fritz Hohl and Soh-Eun Shim
Using Ensemble Learning in Language Variety Identification - Mihaela Gaman
SIDLR: Slot and Intent Detection Models for Low-Resource Language Varieties - Sang Yun Kwon, Gagan Bhatia, ElMoatez Billah Nagoudi, Alcides Alcoba Inciarte and Muhammad Abdul-Mageed
16:40–18:00 Poster Session