Integrating a new type of language resource into the Digital Humanities landscape: French-German colloquium on standards for corpora of computer-mediated communication
University of Duisburg-Essen, June 19-20, 2017
The objective of the colloquium is to survey the state of the art in representing and annotating corpora of computer-mediated communication (CMC corpora) in the Humanities, and to identify the key issues that must be solved before CMC corpora for different languages and genres can be combined, connected and merged with one another and with corpora of other types (text corpora, spoken language corpora).
The results of the colloquium will serve as a specification for further work on Digital Humanities (DH) standards covering fundamental aspects of corpus creation (retrieval and representation of metadata, structural representation of CMC genres, linguistic annotation, provision as part of language resource infrastructures). To reach this objective, the colloquium brings together not only creators of CMC corpora and scholars interested in corpus-based CMC research, but also representatives of language resource infrastructure projects (CLARIN-D and ORTOLANG) and of large existing collections of text and speech corpora (IDS Mannheim, Berlin-Brandenburg Academy of Sciences, CLAPI-ICAR Laboratory, Corpus de la parole - Ministère de la culture et de la communication). To specify the requirements for the linguistic processing of CMC data, the colloquium also includes researchers from the field of natural language processing (NLP). The expertise and resources of the participants form a pool of best practices upon which to build.
The interdisciplinary composition of the participants should ensure that the results of the colloquium have an impact across disciplines. The results will be made available on the web and will be presented at the 5th Conference on CMC and Social Media Corpora in the Humanities in Bolzano, Italy, in October 2017. They will serve as input for ongoing and future CMC corpus projects in different languages.