We invite you to participate in the 2022 shared task on Text Normalization for Swiss German, which will be held at SwissText 2022.
Written Swiss German is not standardized: spelling varies across authors and their dialects, and its use is almost exclusively confined to social media and text messaging. Corpora therefore tend to contain many distinct surface forms for the same word, which can make their analysis challenging. It is thus desirable to normalize these variants to a single common surface form.
We collected Swiss German utterances from social media, and two annotators mapped every token to a corresponding form in Standard German (see examples below). The task is to build models that perform this mapping automatically. This differs from translation: because word order is preserved, the normalized utterance will in general not be grammatically correct Standard German.
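To make the token-level setup concrete, here is a minimal sketch of a lookup baseline: each Swiss German token is mapped to the Standard German form it was most often paired with in training, and unseen tokens are passed through unchanged. The function names and the toy word pairs are invented for illustration. They are not taken from the shared task data, and this is not the organizers' baseline or scoring code.

```python
from collections import Counter, defaultdict


def train_lookup(pairs):
    """Count how often each source token maps to each normalized form."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source.lower()][target] += 1
    # Keep only the most frequent normalized form per source token.
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def normalize(tokens, lookup):
    """Map tokens one by one; word order is preserved, unknown tokens stay as-is."""
    return [lookup.get(tok.lower(), tok) for tok in tokens]


# Toy (Swiss German, Standard German) pairs, invented for illustration only.
training_pairs = [
    ("ich", "ich"), ("ha", "habe"), ("han", "habe"),
    ("hüt", "heute"), ("hüt", "heute"),
    ("kei", "keine"), ("ziit", "Zeit"), ("zit", "Zeit"),
]
lookup = train_lookup(training_pairs)

print(normalize(["ich", "han", "hüt", "kei", "zit"], lookup))
# ['ich', 'habe', 'heute', 'keine', 'Zeit']
```

Because the mapping is applied per token, the output keeps the original word order, which is why the result is a normalization rather than a translation. A lookup like this cannot handle spelling variants it has never seen, which is the gap a learned model would need to close.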
A similar effort was previously undertaken for text messages by the SMS4Science project. There is also a recent related shared task on lexical normalization for other languages at the WNUT 2021 workshop.
Please fill out the registration form to participate. You will receive the data only after registration.
We provide code for checking and scoring your submissions in this repository.
February 2022: Shared Task Start
May 16, 2022: Release Test Data
May 20, 2022: System outputs due
May 24, 2022: Publication Evaluation Results
June 03, 2022: System Descriptions Due
June 08, 2022: Workshop at SwissText 2022
June 17, 2022: Author notifications / reviews
July 2022: Camera-ready system descriptions
Contact: vode@zhaw.ch
Organization:
Pius von Däniken, ZHAW CAI
Manuela Hürlimann, ZHAW CAI
Mark Cieliebak, ZHAW CAI