Multilingual Task

The Multilingual Task addresses text translation, including zero-shot translation, with a single MT system covering all translation directions among English, Dutch, German, Italian and Romanian. As an unofficial task, conventional bilingual text translation is offered between English and each of Arabic, French, Japanese, Chinese, German and Korean.

  • Language directions:

    • Multilingual Task (official): {English, Dutch, German, Italian, Romanian} (any pair)

    • Bilingual Task (unofficial): English <-> {Arabic, French, Japanese, Chinese, German, Korean}

    • Notes on processing of Arabic, Japanese, Korean and Chinese for evaluation:

      • Arabic: the Arabic text needs to be normalized before scoring. We will process the references and the submitted runs with the QCRI Arabic Normalizer 3.0, which can be downloaded from

http://alt.qcri.org/tools/arabic-normalizer/

Our installation is equipped with the following required and publicly available support tools:

      • MADA: Morphological Analysis and Disambiguation for Arabic, version 3.2

      • Aramorph, version 1.2.1: morphological analyzer for Arabic

      • SRILM, version 1.5.10 (disambig)

      • SVMTool, version 1.3.1 (SVMTagger)

  • Evaluation conditions for the Multilingual Task:

    • Small data condition: the only permissible data are the in-domain training and development data (see below)

    • Zero-shot condition: same as the small data condition, with the following additional restrictions (a sketch illustrating them follows this list):

      • the nl-de, de-nl, it-ro, and ro-it directions must be excluded from the training and development sets

      • training data synthesis from other pairs and pivoting are allowed as contrastive conditions

    • Large data condition: in addition to the in-domain data, any linguistic resource listed as permissible here can be used to build the MT systems
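
To make the zero-shot restriction concrete, here is a minimal Python sketch (our own illustration, not an official tool; the ISO 639-1 codes and variable names are assumptions): it enumerates the 20 evaluated directions and removes the four zero-shot directions from the set of directions for which parallel data may be used.

    from itertools import permutations

    languages = ["en", "nl", "de", "it", "ro"]

    # All 5 x 4 = 20 ordered translation directions are evaluated.
    all_directions = list(permutations(languages, 2))

    # Under the zero-shot condition, no parallel training/development data
    # may be used for these four directions; they are only seen at test time.
    zero_shot_directions = {("nl", "de"), ("de", "nl"), ("it", "ro"), ("ro", "it")}

    # Directions for which parallel training/development data remains permissible.
    trainable_directions = [d for d in all_directions if d not in zero_shot_directions]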

  • In-Domain training and development data:

  • Evaluation data:

    • Input format: NIST XML format, case sensitive source text with punctuation

    • Output format: NIST XML format, detokenized case sensitive translations with punctuation. NIST XML format is described in this paper (Section 8 and Appendix A); XML templates will be made available; meanwhile, you can refer to the XML templates of the 2016 edition.

    • Submission: please refer to the submission guidelines provided below

    • Text encoding: UTF-8

    • Multilingual Task: tst2017 will include TED talks in English, German, Dutch, Italian and Romanian that have to be translated into every other language of the same set (5 x 4 = 20 different pairs in total); it can be downloaded from the WIT3 website

    • Bilingual Task: tst2017 + tst2016 (progressive) for any English <-> {Arabic, French, Japanese, Chinese, German, Korean} pair; they can be downloaded from the WIT3 website

  • Evaluation process:

    • Case sensitive BLEU and NIST scores are computed with the NIST script mteval-v13a.pl, while the case sensitive TER score is computed with tercom.7.25.jar. The respective invocations are (a fuller example is sketched after this list):

      • mteval-v13a.pl -c

      • java -Dfile.encoding=UTF8 -jar tercom.7.25.jar -N -s

    • The internal tokenization of the two scorers is used

    • For Arabic, Japanese, Korean and Chinese see notes above
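
For illustration, the two scorers could be invoked as in the following Python sketch. The file names are placeholders; only the flags quoted above come from the official instructions, while the -s/-r/-t and -r/-h file arguments follow the standard usage of mteval-v13a.pl and tercom and should be checked against each tool's documentation.

    import subprocess

    # Case sensitive BLEU and NIST with the NIST scorer (-c = case sensitive scoring);
    # src.xml, ref.xml and tst.xml are placeholder NIST XML files.
    subprocess.run(
        ["perl", "mteval-v13a.pl", "-c",
         "-s", "src.xml", "-r", "ref.xml", "-t", "tst.xml"],
        check=True,
    )

    # Case sensitive TER with tercom (-N = normalization, -s = case sensitive);
    # ref.trans and hyp.trans are placeholder reference and hypothesis files.
    subprocess.run(
        ["java", "-Dfile.encoding=UTF8", "-jar", "tercom.7.25.jar",
         "-N", "-s", "-r", "ref.trans", "-h", "hyp.trans"],
        check=True,
    )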

  • Evaluation Server:

An online Evaluation Server is available to score systems on development sets. After the evaluation period, the server will score evaluation sets as well. Participants interested in using the server are kindly asked to contact cettoloATfbkDOTeu

Submission Guidelines:

Each participant has to submit at least one run for each translation task s/he registered for.

Detokenized case sensitive automatic translations with punctuation have to be wrapped in NIST XML formatted files. NIST XML format is described in this paper (Section 8 and Appendix A); XML templates will be made available; meanwhile, you can refer to the XML templates of the 2016 edition.

XML files with runs have to be submitted as a gzipped TAR archive (in the format specified below) and e-mailed to cettoloATfbkDOTeu

TAR archive file structure:

<UserID>/<Set>.<Task>.<UserID>.primary.xml

/<Set>.<Task>.<UserID>.contrastive1.xml

/<Set>.<Task>.<UserID>.contrastive2.xml

/...

where:

<UserID> = user ID (short name) of participant provided in the Registration Form

<Set> = IWSLT17.tst2017 | IWSLT17.tst2016

<Task> = multilingual_small | multilingual_zero_shot | multilingual_large | bilingual_<fromLID>-<toLID>

<fromLID>, <toLID> = language identifiers (LIDs) given as ISO 639-1 codes; see here for examples of language codes.

The PRIMARY run of each Multilingual Task will be used for the official scoring; nevertheless, CONTRASTIVE runs and runs of the unofficial task (Bilingual Task) will be evaluated as well. Runs for different tasks can be included in the same archive.

Example:

fbk/IWSLT17.tst2017.multilingual_small.fbk.primary.xml

/IWSLT17.tst2017.multilingual_zero_shot.fbk.primary.xml

/IWSLT17.tst2017.multilingual_large.fbk.primary.xml

/IWSLT17.tst2017.multilingual_zero_shot.fbk.contrastive1.xml

/IWSLT17.tst2017.bilingual_fr-en.fbk.primary.xml

/IWSLT17.tst2017.bilingual_fr-en.fbk.contrastive1.xml

/IWSLT17.tst2017.bilingual_fr-en.fbk.contrastive2.xml

/IWSLT17.tst2016.bilingual_fr-en.fbk.primary.xml
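
The following minimal Python sketch (our own illustration, not an official tool; the user ID "fbk" and the set, task and run names are copied from the example above) builds a run file name according to the pattern and packages it into a gzipped TAR archive with the required <UserID>/ path prefix:

    import tarfile

    user_id = "fbk"                    # user ID from the Registration Form
    set_id = "IWSLT17.tst2017"
    task = "multilingual_small"
    run = "primary"

    # Expected name of the run file inside the <UserID>/ directory, e.g.
    # fbk/IWSLT17.tst2017.multilingual_small.fbk.primary.xml
    file_name = f"{set_id}.{task}.{user_id}.{run}.xml"

    # Package the run (and any further runs added the same way) into a
    # gzipped TAR archive ready to be e-mailed.
    with tarfile.open(f"{user_id}_runs.tgz", "w:gz") as tar:
        tar.add(f"{user_id}/{file_name}")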

Re-submitting runs is allowed as long as the mails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission mail will be used for the IWSLT 2017 evaluation; previous mails will be ignored.