Integrating a new type of language resource into the Digital Humanities landscape: French-German colloquium on standards for corpora of computer-mediated communication

University of Duisburg-Essen, June 19–20, 2017

Program

Monday, June 19

10:00

Opening

Michael Beißwenger & Ciara Wigham

10:15–12:15

Section I: Overview of corpus projects and issues to be solved

I.1 Overview of previous work on creating standards for CMC corpora (Michael Beißwenger/Ciara Wigham)

I.2 Statement by Erhard Hinrichs

I.3 Statement by Christophe Parisse

I.4 (Teams of) participants present the corpus resources and projects in which they are or have been involved.

I.4.1 Julien Longhi/Ciara Wigham: The CoMeRe corpora

I.4.2 Lydia-Mai Ho-Dac/Céline Poudat: French Wikipedia corpus

I.4.3 Carole Etienne/Christophe Parisse: French spoken language corpora

I.4.4 Thomas Schmidt: German spoken language corpora at IDS Mannheim

I.4.5 Lothar Lemnitzer: DWDS text corpora and blog corpus at BBAW Berlin

I.4.6 Harald Lüngen: DeReKo text corpora and CMC corpora at IDS Mannheim

I.4.7 Michael Beißwenger/Laura Herzberg/Harald Lüngen/Lothar Lemnitzer/Angelika Storrer: The CLARIN-D chat corpus

I.4.8 Michael Beißwenger: Corpus project MoCoDa 2.0 (Mobile Communication Database)

I.4.9 Holger Grumt Suárez/Natali Karlova-Bourbonus: German scienceblog corpus

I.4.10 Darja Fišer: Slovene CMC corpora and corpus projects

I.4.11 Egon Stemle: The South Tyrolean DiDi CMC corpus

12:15–13:30

Lunch

13:30–13:40

Claire Speiser (French Embassy in Germany, Service pour la Science et la Technologie)

13:30–15:00

Section II: Basic representation format (structural representation)

II.1.1 Michael Beißwenger/Julien Longhi/Harald Lüngen/Ciara Wigham: Basic representation schema(s) from the TEI special interest group “computer-mediated communication”

II.1.2 Holger Grumt Suárez/Natali Karlova-Bourbonus: Experiences with adopting the TEI-CMC schema for the Gießen scienceblog corpus

II.1.3 Darja Fišer: (Current/planned) representation of Slovene CMC corpora

II.1.4 Egon Stemle: (Current/planned) representation of the DiDi corpus

15:00–15:15

Documentation of results Section II

15:15–15:45

Coffee break

15:45–17:15

Section III: Language technology / Linguistic annotations

III.1 Torsten Zesch/Tobias Horsmann: Practices for mapping token, PoS, and other types of annotations onto each other

III.2 Group discussion

17:15–17:30

Documentation of results Section III

18:30

Dinner

Tuesday, June 20

9:00–10:30

Section IV: Anonymization

IV.1.1 Thomas Schmidt: Anonymization in the spoken language corpora at IDS Mannheim

IV.1.2 Carole Etienne/Christophe Parisse: Anonymization in French spoken language corpora

IV.1.3 Harald Lüngen/Michael Beißwenger/Laura Herzberg/Cathrin Pichler: Anonymization in the CLARIN-D chat corpus

IV.1.4 Julien Longhi/Céline Poudat/Ciara Wigham: Anonymization in the French CoMeRe corpora

IV.1.5 Egon Stemle: Anonymization in the South Tyrolean DiDi CMC corpus

IV.1.6 Darja Fišer: Anonymization in the Slovene CMC corpora

IV.2 Discussion

10:30–10:45

Documentation of results Section IV

10:45–11:15

Coffee break

11:15–12:45

Section V: Metadata

V.1 Group discussion

12:45–13:00

Documentation of results Section V

13:00–14:15

Lunch

14:15–15:45

Section VI: Practical issues in compiling a demo corpus with samples from different existing corpora (CMC, text, spoken language)

VI.1 Group brainstorming

15:45–16:15

Documentation of results Section VI

16:15–16:45

Coffee break

16:45–17:30

Concluding round-table: Next steps? Documentation and dissemination of the results of the colloquium?