Call for papers: CMCCORPORA 2019 @ Paris Seine University

posted Feb 26, 2019, 8:09 AM by Michael Beißwenger   [ updated Feb 26, 2019, 8:11 AM ]

The 7th conference on CMC and Social Media Corpora will be held at Paris Seine University, Cergy-Pontoise, France, on 9-10 September 2019.

The call for papers invites submissions for talks and posters on corpus-based linguistic analysis of computer-mediated communication (CMC), on the development of CMC corpora, on natural language processing of CMC and on applications of CMC corpora and data beyond linguistics.

CfP and further details: (submission deadline: 1 May 2019)

CfP/extended submission deadline: 6th CMC and Social Media Conference, U Antwerp, 17/18 September 2018

posted May 3, 2018, 10:22 PM by Michael Beißwenger

Call for Papers (extended deadline: 13th May, 2018)
6th Conference on Computer-Mediated Communication (CMC) and Social Media Corpora
17th - 18th September 2018
University of Antwerp, city campus

The 6th conference on CMC and Social Media Corpora (short name: CMC-corpora2018) will be held in Antwerp, Belgium, on 17-18 September 2018 and will focus on the collection, analysis and processing of mono- and multimodal, synchronous and asynchronous communication. The focus will encompass different genres of computer-mediated communication (CMC). These include, but are not limited to, discussion forums, blogs, newsgroups, emails, SMS and WhatsApp, text chats, wiki discussions, social network exchanges (such as Facebook, Twitter, LinkedIn) and discussions in multimodal and/or 3D environments (virtual worlds, gaming worlds).

The conference will bring together researchers who are interested in the collection, organization, processing, analysis and sharing of CMC data for research purposes. We invite submissions on corpus analysis of various types of CMC data for linguistic or applied linguistic purposes and Natural Language Processing.

The conference is hosted by the CLiPS research center of the University of Antwerp.


1. Development of CMC corpora

          - Building CMC corpora: from data collection to publication
          - Open data for research on CMC: questions of ethics and rights
          - Annotation of CMC genres: representation of CMC genres, annotation of linguistic phenomena, metadata
          - Multimodal corpora
          - …

2. Analysis of CMC corpora

          - Sociolinguistic studies of CMC
          - Discourse analysis of CMC
          - Linguistic characteristics of CMC
          - Multimodal aspects of CMC
          - Language in contact and code-switching in CMC
          - CMC in language learning & teaching
          - …

3. Natural Language Processing (NLP) of CMC

          - Normalization
          - PoS Tagging
          - Lemmatization
          - Syntactic parsing
          - Named-entity recognition
          - …


- 13th May: submission deadline (EXTENDED)
- 20th June: notification of acceptance
- 15th August: submission of camera-ready version
- 17th & 18th September: conference


We invite submissions for talks and for posters or software/corpus demonstrations on any topic relevant to the list of themes (below). Contributions should be anonymized and submitted via the online conference system, and will be peer-reviewed by the scientific committee.

For talks, we request short papers (2-4 pages) in English, following the template which you can download here for MSWord (40 kB) or here for LaTeX (260 kB). Authors of accepted papers can present their work at the conference in a 20-minute talk followed by 10 minutes for questions and discussion. Accepted short papers will be published in online proceedings before the conference. After the conference, there will be an optional open call for extended papers to be published in a special issue of the European Journal of Applied Linguistics (EuJAL), to appear in 2019. Submissions for this special issue will be subject to a review procedure organized by the journal itself.

For poster presentations (reserved for early stage research) or software/corpus demonstrations, we request abstracts in English (max. 500 words, bibliographical references not included). Authors of accepted abstracts can present their poster and/or give their demonstration during the poster session, which will be opened by one-minute ‘teaser talks’. Accepted abstracts will be printed in the book of abstracts.

More information:



- Reinhild Vandekerckhove (University of Antwerp, Belgium)
- Darja Fišer (UL, Slovenia and Jožef Stefan Institute)


- Michael Beißwenger (UDE, Germany)
- Ciara R. Wigham (LRL, France)


- Steven Coats (University of Oulu, Finland)
- Daria Dayter (University of Basel, Switzerland)
- Orphée De Clercq (Ghent University, Belgium)
- Tomaž Erjavec (Jožef Stefan Institute, Slovenia)
- Aivars Glaznieks (EuRac Research, Italy)
- Axel Herold (Berlin-Brandenburgische Akademie der Wissenschaften, Germany)
- Veronique Hoste (Ghent University, Belgium)
- Gilles Jacobs (Ghent University, Belgium)
- Mike Kestemont (University of Antwerp, Belgium)
- Florian Kunneman (Radboud University Nijmegen, The Netherlands)
- Els Lefever (Ghent University, Belgium)
- Julien Longhi (Université de Cergy-Pontoise, France)
- Harald Lüngen (Institut für Deutsche Sprache, Germany)
- Lieve Macken (Ghent University, Belgium)
- Maja Miličević (University of Belgrade, Serbia)
- Nelleke Oostdijk (Radboud University Nijmegen, The Netherlands)
- Muge Satar (Newcastle University, United Kingdom)
- Stefania Spina (University for Foreigners, Italy)
- Egon W. Stemle (EuRac Research, Italy)
- Angelika Storrer (Universitaet Mannheim, Germany)
- Hans van Halteren (Radboud University Nijmegen, The Netherlands)
- Cynthia Van Hee (Ghent University, Belgium)

Call for papers: cmccorpora2018 @ University of Antwerp

posted Feb 12, 2018, 11:08 AM by Michael Beißwenger   [ updated Feb 12, 2018, 11:13 AM ]

The 6th conference on CMC and Social Media Corpora will be held at the University of Antwerp, Belgium, on 17-18 September 2018. Please check the call for papers:

Recent publications

posted Oct 3, 2017, 2:10 PM by Ciara Wigham   [ updated Oct 3, 2017, 2:11 PM ]

Two new publications linked to the cmcconference series have recently been published:
Fišer, D. & Beißwenger, M. (Eds., 2017). Investigating Computer-Mediated Communication: Corpus-Based Approaches to Language in the Digital World. Ljubljana: Scientific Publishing House of the Faculty of Arts, University of Ljubljana.
Open access edition:

Wigham, C.R. & Ledegen, G. (Eds., 2017). Corpus de communication médiée par les réseaux : construction, structuration, analyse. Collection Humanités Numériques. Paris : L’Harmattan. Overview of book.


posted Oct 3, 2017, 1:59 PM by Ciara Wigham

Save the date!
The 6th CMC and Social Media Corpora for the Humanities conference (CMCCorpora18) will be held on 17-18 September 2018 at Universiteit Antwerpen in Antwerp, Belgium.

CfP: cmccorpora2017 @ Eurac, Bolzano

posted Jan 25, 2017, 8:26 AM by Ciara Wigham   [ updated Apr 9, 2017, 2:34 AM by Michael Beißwenger ]

The 5th conference on CMC and Social Media Corpora for the Humanities will be held in Bolzano/Bozen, Italy, on 3-4 October 2017. Please check the call for papers:

Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities

posted Sep 28, 2016, 12:31 AM by Michael Beißwenger

We proudly present the Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities, which was held on September 27-28, 2016 at the University of Ljubljana. The conference featured papers and presentations from 40 authors and co-authors from 24 research institutions in 11 countries. The complete proceedings are available open access:

Fišer, Darja; Beißwenger, Michael (eds., 2016):
Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities (cmc-corpora2016). University of Ljubljana.

Proceedings of the EmpiriST shared task on automatic processing of German CMC and web corpora

posted Sep 28, 2016, 12:20 AM by Michael Beißwenger

The results of the EmpiriST 2015 shared task on tokenization and part-of-speech tagging of German CMC and web corpora were presented as part of the 10th Web as Corpus Workshop (WAC-X) at ACL 2016. The concept and results of the task as well as the participating systems are described in the following volume:

Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task. Stroudsburg: Association for Computational Linguistics, 2016 (ACL Anthology W16-26).

CfP: NLP4CMC2016: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication / Social Media

posted Apr 27, 2016, 12:03 AM by Michael Beißwenger

NLP4CMC 2016: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication / Social Media

Workshop at KONVENS 2016, Bochum, Germany, September 22, 2016


Over the past decade, there has been a growing interest in collecting, processing and analyzing data from genres of social media and computer-mediated communication (CMC). As part of large corpora that have been automatically crawled from the web, CMC data are often regarded as an unloved “bycatch” that is difficult to handle with NLP tools optimized for processing edited text. On the other hand, the existence of CMC data in web corpora is relevant for all research and application contexts that require data sets representing the full diversity of genres and linguistic variation on the web. For corpus-based variational linguistics, CMC corpora are an important resource for closing the "CMC gap" both in corpora of contemporary written language and in corpora of spoken language: since CMC and social media make up an important part of contemporary everyday communication, investigations into language change and linguistic variation need to be able to include CMC and social media data in their empirical analyses. Nevertheless, the development of approaches and tools for processing the linguistic and structural peculiarities of CMC genres and for building CMC corpora is lagging behind the interest in dealing with these types of data in the fields of language technology, corpus-based linguistics and web mining.

The goal of the NLP4CMC workshops, which are organized by the GSCL special interest group "Social Media / Computer-Mediated Communication", is to provide a platform for the presentation of results and the discussion of ongoing work in adapting NLP tools for processing CMC data and in using NLP solutions for building and annotating social media corpora. The main focus of the workshops is on German data, but submissions on NLP approaches, annotation experiments and CMC corpus projects for data of other European languages are also welcome. The 1st NLP4CMC workshop was held in September 2014 at KONVENS at the University of Hildesheim. The 2nd NLP4CMC workshop was held in September 2015 at the international conference of the German Society for Language Technology and Computational Linguistics (GSCL) at the University of Duisburg-Essen. The papers from both workshops have been published online.


We encourage the submission of research papers on best practices in building, annotating and processing corpora and lexical semantic resources for the analysis of social media / computer-mediated communication (CMC), including, but not restricted to, the following topics:

  • Collection, representation, maintenance and computer-assisted/automatic analysis of CMC and social media resources
  • Normalization (spelling correction, ...)
  • Automatic preprocessing (tokenization, POS tagging, lemmatization, parsing, word sense disambiguation)
  • Annotation of linguistic and structural features in social media / CMC data (annotation schemas, annotation experiments, metadata ...)
  • Domain adaptation
  • Automatic methods in corpus-based CMC / social media analysis (sentiment analysis, summarization, topic detection, trend detection, ...)
  • Big-data social media analysis

Besides individual papers, the workshop program will include a round-table discussion with participants from the GSCL Shared Task on Automatic Linguistic Annotation of CMC / Social Media Corpora (EmpiriST2015), who will present and discuss results from the project and future perspectives for adapting NLP systems to CMC and social media data.


  • Submissions due: 30 June 2016
  • Notification (reviews due): 31 July 2016
  • Camera-ready papers (revised versions) due: 22 August 2016
  • Workshop: 22 September 2016


Submissions should include the names and addresses of all authors and meet the following requirements:


  • Sabine Bartsch, TU Darmstadt
  • Stefanie Dipper, Ruhr University Bochum
  • Stefan Evert, University of Erlangen-Nürnberg
  • Iris Hendrickx, Radboud University Nijmegen
  • Verena Henrich, University of Tübingen
  • Axel Herold, Berlin-Brandenburg Academy of Sciences (BBAW), Berlin
  • Andrea Horbach, University of Saarbrücken
  • Tobias Horsmann, University of Duisburg-Essen
  • Anke Lüdeling, Humboldt University Berlin
  • Harald Lüngen, Institute for the German Language (IDS), Mannheim
  • Preslav Nakov, Qatar QCRI
  • Ines Rehbein, University of Potsdam
  • Roman Schneider, Institute for the German Language (IDS), Mannheim
  • Egon W. Stemle, EURAC, Bozen
  • Angelika Storrer, University of Mannheim
  • Simone Ueberwasser, University of Zürich
  • Kay-Michael Würzner, Berlin-Brandenburg Academy of Sciences (BBAW), Berlin

(more to be announced)


  • Michael Beißwenger (University of Duisburg-Essen, German Linguistics)
  • Michael Wojatzki (University of Duisburg-Essen, Language Technology Lab)
  • Torsten Zesch (University of Duisburg-Essen, Language Technology Lab)

The workshop is organized by the special interest group "Social Media / Computer-Mediated Communication" of the German Society for Computational Linguistics & Language Technology (GSCL).
