Text Translation Task

Task Description

This year's Text Translation Task addresses a new translation direction: English to Czech. We would like participants to investigate translation into a moderately morphologically rich language and to overcome the difficulty of having fewer resources. Participants should also consider applying domain and genre adaptation methods, as the translation systems will be tested on spoken-style TED talks.

Important Dates

Release of train and dev data: June 1st
Evaluation period: July 1st to Sep. 8th
System description: Sep. 22nd
Review feedback: Oct. 7th
Camera ready: Oct. 13th
Conference: Nov. 2nd and 3rd

Translation Direction

English to Czech

Allowed Training Data

Provided data
- TED corpus from MUST-C special release for IWSLT'19 (you need to fill out a simple form to access the data)
Additional data
- Data provided by WMT 2019

Other allowed data will be listed here and announced to participants via the evaluation mailing list.

Development and Evaluation Data

Development data: the dev set from the MUST-C release listed above.
Evaluation data: tst2019

Submission Guidelines

Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run for each track. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) for the respective track will be used as the PRIMARY run.
Submissions must be packaged as a gzipped TAR archive (see the file structure below) and sent as an email attachment to jan.niehues@kit.edu and sebastian.stueker@kit.edu.
Each run must be stored either in SGML format or as a plain-text file with one sentence per line.
Scoring will be case-sensitive and will include punctuation. Submissions must be in UTF-8.
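
As a quick sanity check before sending a plain-text run, something like the following sketch may help. It is not an official validation tool: the function name check_plain_text_run is hypothetical, and it assumes that a valid run has exactly one non-empty line per source segment, which is an interpretation of "one sentence per line" rather than a stated rule.

```python
def check_plain_text_run(raw: bytes, expected_segments: int) -> list:
    """Return a list of problems found in a plain-text run (empty if none).

    Assumption (not from the task page): the run should contain exactly
    `expected_segments` lines, none of them empty.
    """
    try:
        text = raw.decode("utf-8")  # submissions have to be in UTF-8
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string after a trailing newline

    problems = []
    if len(lines) != expected_segments:
        problems.append(
            f"expected {expected_segments} lines, found {len(lines)}"
        )
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number} is empty")
    return problems
```

For example, check_plain_text_run(open("run.txt", "rb").read(), 1000) would return an empty list for a well-formed file, where run.txt and the segment count 1000 are placeholders.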

TAR archive file structure:

<UserID>/<Set>.<Task>.<UserID>.primary.xml
        /<Set>.<Task>.<UserID>.contrastive1.xml
        /<Set>.<Task>.<UserID>.contrastive2.xml
        /...

where:

<UserID> = user ID of participant used to download data files
<Set> = IWSLT19.SLT.tst2019
<fromLID>, <toLID> = Language identifiers (LIDs) as given by ISO 639-1 codes; see for example the WIT3 webpage.

The PRIMARY run for each Multilingual Task will be used for the official scoring; nevertheless, CONTRASTIVE runs will be evaluated as well. Runs for different tasks can be included in the same archive. Re-submitting runs is allowed as long as the emails arrive BEFORE the submission deadline. If the same participant submits multiple TAR archives, only the runs in the most recent submission email will be used for the IWSLT 2019 evaluation, and earlier emails will be ignored.
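
The archive layout described above could be produced along the following lines. This is only a sketch using Python's standard tarfile module, not an official packaging script. The helper names run_member_name and build_submission are hypothetical, the user ID "team1" and the task value "en-cs" in the usage note are placeholders (this page does not define the value of <Task>), and the sketch assumes that all runs sit inside the same <UserID> directory.

```python
import io
import tarfile


def run_member_name(user_id, set_name, task, run_label, extension="xml"):
    """Build <UserID>/<Set>.<Task>.<UserID>.<run label>.<extension>."""
    return f"{user_id}/{set_name}.{task}.{user_id}.{run_label}.{extension}"


def build_submission(user_id, set_name, task, runs):
    """Return the bytes of a gzipped TAR archive containing the given runs.

    `runs` maps a run label ("primary", "contrastive1", ...) to the
    UTF-8 encoded file content. Requiring a "primary" entry is a design
    choice of this sketch, so that the PRIMARY run is always explicit.
    """
    if "primary" not in runs:
        raise ValueError("exactly one run should be labelled 'primary'")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for label, content in runs.items():
            info = tarfile.TarInfo(
                name=run_member_name(user_id, set_name, task, label)
            )
            info.size = len(content)  # size must match the data written
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()
```

A call such as build_submission("team1", "IWSLT19.SLT.tst2019", "en-cs", {"primary": primary_bytes, "contrastive1": contrastive_bytes}) returns the archive bytes, which can then be written to a .tar.gz file and attached to the submission email.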
