GSCL Shared Task: Automatic Linguistic Annotation of Computer-Mediated Communication / Social Media

Welcome to the website of the EmpiriST 2015 shared task on automatic linguistic annotation of computer-mediated communication / social media.

The goal of this shared task is to encourage developers of NLP applications to adapt their tools and resources to the processing of written German discourse in genres of computer-mediated communication (CMC). Examples of CMC genres are chats, forums, wiki talk pages, tweets, blog comments, social networks, SMS and WhatsApp dialogues.

Processing CMC discourse is a desideratum and a relevant task in different research fields and application contexts in the Digital Humanities - e.g.:

The shared task consists of two subtasks:

The two subtasks will have to be handled for two different data sets:

We will provide training data sets which have been manually tokenized and tagged on the basis of detailed annotation guidelines.

Before the release of the full task, we will publish a small set of trial data for developers to use. The annotation guidelines used for the trial and training data are also available.

The shared task (ST) has been prepared by members of the DFG scientific network Empirikom (hence the name "EmpiriST"). Its preparation has been funded by the German Society for Language Technology and Computational Linguistics (GSCL).

The shared task is endorsed by the ACL Special Interest Group on the Web as Corpus and by the GSCL Special Interest Group on Social Media / Computer-Mediated Communication.