MT Track

As in previous editions, this year's MT exercise uses TED and TEDX talks, collections of public speeches on a variety of topics for which videos, transcripts, and translations are available on the Web.

The goal of this track is to translate TED and TEDX subtitles, provided as segmented text. Training data is limited to a supplied collection of freely available parallel texts (see the information at the homepage), including a parallel corpus of TED talks (see the "In-Domain training and development data" entry below).

  • Language directions: English <-> {French, German, Chinese, Czech, Thai, Vietnamese}

  • In-Domain training and development data:

  • Evaluation data:

    • Input format: NIST XML format, case-sensitive source text with punctuation

    • Output format: NIST XML format, detokenized case-sensitive translations with punctuation. The NIST XML format is described in this paper (Section 8 and Appendix A); you can also refer to these XML templates

    • Submission: please refer to the submission guidelines provided below

    • Text encoding: UTF-8; the simplified character set is used for Chinese text

    • tst2015 + tst2014 (progressive)

    • Note on the German text of the TEDX evaluation sets: the talks are transcribed as they are spoken. In some cases the final 'e' of German words is therefore not written (e.g. glaub' instead of glaube); similarly, the beginning of the German indefinite article may be dropped (e.g. 'ne instead of eine). If your translation system does not handle this, you can apply the following simple replacement:

        • sed -e "s/' /e /g" -e "s/'n /ein /g" -e "s/'ne /eine /g" -e "s/'nem /einem /g" -e "s/'nen /einen /g" -e "s/'ner /einer /g"
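For participants who preprocess in Python rather than with sed, the same replacement rules can be reproduced as follows (the function name is ours, for illustration only; the rules themselves are exactly those of the sed command above):

```python
# The sed rules above, applied in the same order. Note that the plain
# "' " -> "e " rule runs first and does not touch 'n / 'ne / ... forms,
# since those have no space directly after the apostrophe.
RULES = [
    ("' ", "e "),
    ("'n ", "ein "),
    ("'ne ", "eine "),
    ("'nem ", "einem "),
    ("'nen ", "einen "),
    ("'ner ", "einer "),
]

def expand_contractions(text: str) -> str:
    for pattern, replacement in RULES:
        text = text.replace(pattern, replacement)
    return text

print(expand_contractions("Ich glaub' das ist 'ne gute Idee"))
# → Ich glaube das ist eine gute Idee
```

As with the sed version, a contraction at the very end of a line (with no trailing space) is not matched.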

  • Evaluation:

    • case-sensitive BLEU and NIST scores are computed with the NIST script, while the case-sensitive TER score is computed with tercom.7.25.jar. The respective invocations are:

      • -c

        • java -Dfile.encoding=UTF8 -jar tercom.7.25.jar -N -s

    • the internal tokenization of the two scorers is used

    • note on Thai evaluation: Thai references are segmented according to the guidelines defined at InterBEST 2009

    • note on Chinese evaluation: Chinese texts are evaluated at the character level; before evaluation, Chinese texts are processed by means of this script, which splits sequences of Chinese characters but keeps sequences of non-Chinese characters as they are
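The official splitting script is the one linked above; purely as an illustration of the behavior it describes, a rough Python approximation might look like this (the CJK character range used here is a simplification and may not match the official script exactly):

```python
import re

# Basic CJK Unified Ideographs block; the official script may cover
# additional blocks (extensions, punctuation, etc.).
CJK = re.compile(r'([\u4e00-\u9fff])')

def split_chinese(text: str) -> str:
    # Surround every Chinese character with spaces, then normalize
    # whitespace; sequences of non-Chinese characters are left intact.
    return ' '.join(CJK.sub(r' \1 ', text).split())

print(split_chinese("我喜欢TED演讲"))
# → 我 喜 欢 TED 演 讲
```

Note that, as a side effect of the whitespace normalization, runs of multiple spaces in the input are collapsed to single spaces.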

  • Evaluation Server:

An online Evaluation Server is available for the MT tasks. Currently, it is enabled to score development sets; after Oct 5 (the end of the evaluation period of the MT track), it will also score evaluation sets. There are two ways to use it:

    • via Web page

    • via REST Web service

Its use is restricted to participants. If you are interested in using it, please send an e-mail to cettolo AT fbk DOT eu

Submission Guidelines:

Each participant has to submit at least one run for each translation task they registered for.

Runs have to be wrapped in NIST XML formatted files containing detokenized, case-sensitive translations with punctuation. The NIST XML format is described in this paper (Section 8 and Appendix A); you can also refer to these XML templates
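For orientation only, a NIST XML test-set file generally has the following shape; the set, document, and system identifiers below are illustrative placeholders, and the exact attributes should be taken from the cited paper and the XML templates:

```xml
<mteval>
  <tstset setid="IWSLT15.TED.tst2015" srclang="en" trglang="fr">
    <doc docid="talkid1" sysid="my_system">
      <seg id="1">First translated segment .</seg>
      <seg id="2">Second translated segment .</seg>
    </doc>
  </tstset>
</mteval>
```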

XML files with runs have to be submitted as a gzipped TAR archive (in the format specified below) and e-mailed to cettolo AT fbk DOT eu and sebastian DOT stueker AT kit DOT edu

TAR archive file structure:






<UserID> = user ID of participant used to download data files

<Set> = IWSLT15.TED.tst2015 | IWSLT15.TED.tst2014 | IWSLT15.TEDX.tst2015 | IWSLT15.TEDX.tst2014

<Task> = MT_<fromLID>-<toLID>

<fromLID>, <toLID> = language identifiers (LIDs) as given by ISO 639-1 codes; see here for examples of language codes.

The PRIMARY run for each task will be used for the official scoring; nevertheless, CONTRASTIVE runs will be evaluated as well. Runs for different tasks (i.e., language pairs) may be included in the same archive.








Re-submitting runs is allowed as long as the e-mails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission mail will be used for the IWSLT 2015 evaluation; previous mails will be ignored.