Program
Workshop Program
Sunday, December 13, 2020 - From 2pm to 7pm CET
14:00–14:30 - Opening Remarks and Shared Task Report
14:05–14:25 - A REPORT ON THE VARDIAL EVALUATION CAMPAIGN 2020 - Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer and Marcos Zampieri
14:30–15:30 - Invited Talk by Barbara Plank
TACKLING THE LONG TAIL IN NLP: TRANSFER TO LOW-RESOURCE LANGUAGES, VARIETIES AND DIALECTS
Abstract: The lack of publicly available training and evaluation data for low-resource languages, varieties and dialects hampers progress in Natural Language Processing (NLP), leaving the majority of the world's languages, varieties and speakers behind. As key NLP tasks such as part-of-speech tagging, named entity recognition or intent and slot filling in task-oriented dialogue systems require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. Transfer learning (TL) and multi-task learning (MTL) can help remedy this problem. In this talk, I will discuss TL and MTL methods to tackle this challenge and present some of our (on-going) work on NLP for low-resource languages, including Danish and a case study on a very low-resource dialect.
16:00–17:00 - Oral presentations
16:00–16:15 - ASR FOR NON-STANDARDISED LANGUAGES WITH DIALECTAL VARIATION: THE CASE OF SWISS GERMAN - Iuliia Nigmatulina, Tannon Kew and Tanja Samardžić
16:15–16:30 - LSDC - A COMPREHENSIVE DATASET FOR LOW SAXON DIALECT CLASSIFICATION - Janine Siewert, Yves Scherrer, Martijn Wieling and Jörg Tiedemann
16:30–16:45 - MACHINE-ORIENTED NMT ADAPTATION FOR ZERO-SHOT NLP TASKS: COMPARING THE USEFULNESS OF CLOSE AND DISTANT LANGUAGES - Amirhossein Tebbifakhr, Matteo Negri and Marco Turchi
16:45–17:00 - CHARACTER ALIGNMENT IN MORPHOLOGICALLY COMPLEX TRANSLATION SETS FOR RELATED LANGUAGES - Michael Gasser, Binyam Ephrem Seyoum and Nazareth Amlesom Kifle
17:30–18:30 - Poster presentations (list below)
18:30–19:00 - Discussion and Closing
Poster Presentations
A DUAL-ENCODING SYSTEM FOR DIALECT CLASSIFICATION
Petru Rebeja and Dan Cristea
A FOUR-DIALECT TREEBANK FOR OCCITAN: BUILDING PROCESS AND PARSING EXPERIMENTS
Aleksandra Miletic, Myriam Bras, Marianne Vergez-Couret, Louise Esher, Clamença Poujade and Jean Sibille
A TOKENIZATION SYSTEM FOR THE KURDISH LANGUAGE
Sina Ahmadi
APPLYING MULTILINGUAL AND MONOLINGUAL TRANSFORMER-BASED MODELS FOR DIALECT IDENTIFICATION
Cristian Popa and Vlad Ștefănescu
BILINGUAL LEXICON INDUCTION ACROSS ORTHOGRAPHICALLY-DISTINCT UNDER-RESOURCED DRAVIDIAN LANGUAGES
Bharathi Raja Chakravarthi, Navaneethan Rajasekaran, Mihael Arcan, Kevin McGuinness, Noel E. O'Connor and John P. McCrae
BUILDING A CORPUS FOR THE ZAZA–GORANI LANGUAGE FAMILY
Sina Ahmadi
CHALLENGES IN NEURAL LANGUAGE IDENTIFICATION: NRC AT VARDIAL 2020
Gabriel Bernier-Colborne and Cyril Goutte
COMBINING DEEP LEARNING AND STRING KERNELS FOR THE LOCALIZATION OF SWISS GERMAN TWEETS
Mihaela Gaman and Radu Tudor Ionescu
DEALING WITH DIALECTAL VARIATION IN THE CONSTRUCTION OF THE BASQUE HISTORICAL CORPUS
Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano and Ander Soraluze
DIALECT IDENTIFICATION UNDER DOMAIN SHIFT: EXPERIMENTS WITH DISCRIMINATING ROMANIAN AND MOLDAVIAN
Çağrı Çöltekin
DISCRIMINATING BETWEEN STANDARD ROMANIAN AND MOLDAVIAN TWEETS USING FILTERED CHARACTER NGRAMS
Andrea Ceolin and Hong Zhang
EXPERIMENTS IN LANGUAGE VARIETY GEOLOCATION AND DIALECT IDENTIFICATION
Tommi Jauhiainen, Heidi Jauhiainen and Krister Lindén
EXPLORING THE POWER OF ROMANIAN BERT FOR DIALECT IDENTIFICATION
George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel and Traian Rebedea
GEOLOCATION OF TWEETS WITH A BILSTM REGRESSION MODEL
Piyush Mishra
HELJU@VARDIAL 2020: SOCIAL MEDIA VARIETY GEOLOCATION WITH BERT MODELS
Yves Scherrer and Nikola Ljubešić
NEURAL MACHINE TRANSLATION FOR TRANSLATING INTO CROATIAN AND SERBIAN
Maja Popović, Alberto Poncelas, Marija Brkic and Andy Way
RECYCLING AND COMPARING MORPHOLOGICAL ANNOTATION MODELS FOR ARMENIAN DIACHRONIC-VARIATIONAL CORPUS PROCESSING
Chahan Vidal-Gorène, Victoria Khurshudyan and Anaïd Donabédian-Demopoulos
REDISCOVERING THE SLAVIC CONTINUUM IN REPRESENTATIONS EMERGING FROM NEURAL MODELS OF SPOKEN LANGUAGE IDENTIFICATION
Badr M. Abdullah, Jacek Kudera, Tania Avgustinova, Bernd Möbius and Dietrich Klakow
TOWARDS AUGMENTING LEXICAL RESOURCES FOR SLANG AND AFRICAN AMERICAN ENGLISH
Alyssa Hwang, William R. Frey and Kathleen McKeown
URALIC LANGUAGE IDENTIFICATION (ULI) 2020 SHARED TASK DATASET AND THE WANCA 2017 CORPORA
Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen and Krister Lindén
VULGARIS: ANALYSIS OF A CORPUS FOR MIDDLE-AGE VARIETIES OF ITALIAN LANGUAGE
Andrea Zugarini, Matteo Tiezzi and Marco Maggini
ZHAW-INIT - SOCIAL MEDIA GEOLOCATION AT VARDIAL 2020
Fernando Benites, Manuela Hürlimann, Pius von Däniken and Mark Cieliebak