Program

Workshop Program - Tuesday, April 20, 2021 - All times CEST (ZURICH TIME)

10:30–10:40 - Opening Session

10:40–11:00 - Findings of the VarDial Evaluation Campaign 2021 - Bharathi Raja Chakravarthi, Gaman Mihaela, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer and Marcos Zampieri

11:00–11:30 - Coffee break

11:30–12:30 - Invited Talk by Jack Grieve - VarDial-ectology: what dialectology can contribute to NLP

Abstract: In this talk, I consider how insights from linguistics, especially dialectology, can inform research in NLP concerned with automatically classifying texts based on regional and national dialects. In particular, I discuss how the types of features and varieties of language examined in dialectology could be meaningfully integrated into NLP models and why this is important. I will also consider how working with very large and highly stratified corpora, as has become common in computational sociolinguistics in recent years, can help further extend applied research in this area of NLP. Finally, I will discuss the applicability of automated techniques for dialect identification in a forensic context, where analysts must often work with texts that are especially difficult to profile, and where drawing on insights from linguistics can therefore be especially important.

12:30–13:30 - Oral Presentations

  • 12:30–12:45 - Hierarchical Transformer for Multilingual Machine Translation - Albina Khusainova, Adil Khan, Adín Ramírez Rivera and Vitaly Romanov

  • 12:45–13:00 - Regression Analysis of Lexical and Morpho-Syntactic Properties of Kiezdeutsch - Diego Frassinelli, Gabriella Lapesa, Reem Alatrash, Dominik Schlechtweg and Sabine Schulte im Walde

  • 13:00–13:15 - Representations of Language Varieties Are Reliable Given Corpus Similarity Measures - Jonathan Dunn

  • 13:15–13:30 - Whit’s the Richt Pairt o Speech: PoS tagging for Scots - Harm Lameris and Sara Stymne

13:30–15:00 - Lunch break

15:00–16:00 - Poster Session (list of papers)

16:00–17:00 - Invited Talk by Katharina Kann - Towards Natural Language Processing Systems for All Languages and Tasks

Abstract: Natural language processing (NLP) plays an increasingly important role in everyday life, and many people are familiar with products such as Google Translate, Alexa or Siri. However, NLP systems currently only exist for a small fraction of the world's approximately 7000 languages. This is undesirable for many reasons: For instance, only speakers of high-resource languages are able to benefit from the abundance of information available on the internet, which reinforces already existing inequalities. It also limits the ability of NLP to support language documentation and revitalization efforts. In my talk, I will present NLP research focused explicitly on low-resource languages. I will first talk about how we can leverage NLP systems to speed up language documentation efforts. I will then discuss how existing pretrained multilingual models can be adapted to truly low-resource languages.

17:00–17:15 - Closing Remarks


Poster Session Papers

  • Efficient Unsupervised NMT for Related Languages with Cross-Lingual Language Models and Fidelity Objectives - Rami Aly, Andrew Caines and Paula Buttery

  • Fine-tuning Distributional Semantic Models for Closely-Related Languages - Kushagra Bhatia, Divyanshu Aggarwal and Ashwini Vaidya

  • Discriminating Between Similar Nordic Languages - René Haas and Leon Derczynski

  • Naive Bayes-based Experiments in Romanian Dialect Identification - Tommi Jauhiainen, Heidi Jauhiainen and Krister Lindén

  • UnibucKernel: Geolocating Swiss German Jodels Using Ensemble Learning - Mihaela Gaman, Sebastian Cojocariu and Radu Tudor Ionescu

  • Optimizing a Supervised Classifier for a Difficult Language Identification Problem - Yves Bestgen

  • Comparing the Performance of CNNs and Shallow Models for Language Identification - Andrea Ceolin

  • Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT - George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel and Traian Rebedea

  • Comparing Approaches to Dravidian Language Identification - Tommi Jauhiainen, Tharindu Ranasinghe and Marcos Zampieri

  • N-gram and Neural Models for Uralic Language Identification: NRC at VarDial 2021 - Gabriel Bernier-Colborne, Serge Leger and Cyril Goutte

  • Social Media Variety Geolocation with geoBERT - Yves Scherrer and Nikola Ljubešić