Integrating a new type of language resource into the Digital Humanities landscape: French-German colloquium on standards for corpora of computer-mediated communication
University of Duisburg-Essen, June 19-20, 2017
Program
Monday, June 19
10:00
Opening
Michael Beißwenger & Ciara Wigham
10:15–12:15
Section I: Overview of corpus projects and issues to be solved
I.1 Overview of previous work on creating standards for CMC corpora (Michael Beißwenger/Ciara Wigham)
I.2 Statement by Erhard Hinrichs
I.3 Statement by Christophe Parisse
I.4 (Teams of) participants present the corpus resources and projects in which they are or have been involved.
I.4.1 Julien Longhi/Ciara Wigham: The CoMeRe corpora
I.4.2 Lydia-Mai Ho-Dac/Céline Poudat: French Wikipedia corpus
I.4.3 Carole Etienne/Christophe Parisse: French spoken language corpora
I.4.4 Thomas Schmidt: German spoken language corpora at IDS Mannheim
I.4.5 Lothar Lemnitzer: DWDS text corpora and blog corpus at BBAW Berlin
I.4.6 Harald Lüngen: DeReKo text corpora and CMC corpora at IDS Mannheim
I.4.7 Michael Beißwenger/Laura Herzberg/Harald Lüngen/Lothar Lemnitzer/Angelika Storrer: The CLARIN-D chat corpus
I.4.8 Michael Beißwenger: Corpus project MoCoDa 2.0 (Mobile Communication Database)
I.4.9 Holger Grumt Suárez/Natali Karlova-Bourbonus: German scienceblog corpus
I.4.10 Darja Fišer: Slovene CMC corpora and corpus projects
I.4.11 Egon Stemle: The South Tyrolean DiDi CMC corpus
12:15–13:30
Lunch
13:30–13:40
Claire Speiser (French Embassy in Germany, Service pour la Science et la Technologie)
13:30–15:00
Section II: Basic representation format (structural representation)
II.1.1 Michael Beißwenger/Julien Longhi/Harald Lüngen/Ciara Wigham: Basic representation schema(s) from the TEI special interest group “computer-mediated communication”
II.1.2 Holger Grumt Suárez/Natali Karlova-Bourbonus: Experiences with adopting the TEI-CMC schema for the Gießen scienceblog corpus
II.1.3 Darja Fišer: (Current/planned) representation of Slovene CMC corpora
II.1.4 Egon Stemle: (Current/planned) representation of the DiDi corpus
15:00–15:15
Documentation of results Section II
15:15–15:45
Coffee break
15:45–17:15
Section III: Language technology / Linguistic annotations
III.1 Torsten Zesch/Tobias Horsmann: Practices for mapping tokens, PoS and other types of annotations onto each other
III.2 Group discussion
17:15–17:30
Documentation of results Section III
18:30
Dinner
Tuesday, June 20
9:00–10:30
Section IV: Anonymization
IV.1.1 Thomas Schmidt: Anonymization in the spoken language corpora at IDS Mannheim
IV.1.2 Carole Etienne/Christophe Parisse: Anonymization in French spoken language corpora
IV.1.3 Harald Lüngen/Michael Beißwenger/Laura Herzberg/Cathrin Pichler: Anonymization in the CLARIN-D chat corpus
IV.1.4 Julien Longhi/Céline Poudat/Ciara Wigham: Anonymization in the French CoMeRe corpora
IV.1.5 Egon Stemle: Anonymization in the South Tyrolean DiDi CMC corpus
IV.1.6 Darja Fišer: Anonymization in the Slovene CMC corpora
IV.2 Discussion
10:30–10:45
Documentation of results Section IV
10:45–11:15
Coffee break
11:15–12:45
Section V: Metadata
V.1 Group discussion
12:45–13:00
Documentation of results Section V
13:00–14:15
Lunch
14:15–15:45
Section VI: Practical issues in compiling a demo corpus with samples from different existing corpora (CMC, text, spoken language)
VI.1 Group brainstorming
15:45–16:15
Documentation of results Section VI
16:15–16:45
Coffee break
16:45–17:30
Concluding round-table: Next steps? Documentation and dissemination of the results of the colloquium?