Call for Papers

Pre-conference workshop at KONVENS 2014


October 7, 2014.

NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication / Social Media

The workshop is organized by the special interest group "Social Media / Computer-Mediated Communication" of the German Society for Computational Linguistics & Language Technology (GSCL) (

To attend the workshop, please register for the KONVENS 2014 main conference.


Over the past decade, there has been a growing interest in collecting, processing and analyzing data from genres of social media and computer-mediated communication (CMC). As part of large corpora which have been automatically crawled from the WWW, CMC data are often regarded as an unloved “bycatch” which is difficult to handle with NLP tools that have been optimized for processing edited text; on the other hand, these data are important parts of web corpora for all research and application contexts which require data sets that represent the diversity of genres and linguistic variation on the web. For corpus-based variational linguistics, CMC corpora are an important resource for closing the "CMC gap" both in corpora of contemporary written language and in corpora of spoken language: since CMC and social media make up an important part of everyday communication, investigations into language change and linguistic variation need to be able to include CMC and social media data in their empirical analyses.

Nevertheless, the development of approaches and tools for processing the linguistic and structural peculiarities of CMC genres and for building CMC corpora is lagging behind the interest in dealing with these types of data in the fields of language technology, corpus-based linguistics and web mining.

The goal of this workshop is to provide a platform for the presentation of results and ongoing work in adapting NLP tools for processing CMC / social media data. The focus of the workshop is on German data, but submissions on NLP approaches, annotation experiments, etc. for data in other European languages are also welcome as long as they can make a significant contribution to the further development of the processing of CMC phenomena.


We encourage the submission of long and short research and demo papers including, but not restricted to, the following topics related to social media / CMC:

    • Corpora and lexical semantic resources for the analysis of social media / computer-mediated communication
    • Normalization (spelling correction, ...)
    • Automatic preprocessing (tokenization, POS tagging, lemmatization, parsing, word sense disambiguation)
    • Annotation of linguistic and structural features in social media / CMC data (annotation schemas, annotation experiments, ...)
    • Domain adaptation
    • Automatic methods in corpus-based CMC / social media analysis (sentiment, summarization, trend detection, ...)
    • Big-data social media analysis


    • Submissions due: 15 July 2014
    • Notification: 22 August 2014
    • Camera-ready papers (revised versions) due: 22 September 2014
    • Workshop: 07 October 2014


Submissions should include the names and addresses of all authors and meet the following requirements:

    • Full Papers (8 pages)
    • Short Papers (2-4 pages) or Extended Abstracts (500-1,000 words): position papers or work in progress
    • Demonstrations (2-4 pages): presentation of systems or prototypes
    • Submissions need to be made in English and should be in PDF format
    • Submissions need to follow the KONVENS format (

Submissions will be accepted via the Easychair system: