The IWSLT Evaluation will focus this year on the translation of TED and TEDx talks. As in previous years, the evaluation offers specific tracks for all the core technologies involved in spoken language translation, namely:

  • Automatic speech recognition (ASR), i.e. the conversion of a speech signal into a transcript,

  • Machine translation (MT), i.e. the translation of a polished transcript into another language,

  • Spoken language translation (SLT), i.e. the conversion and translation of a speech signal into a transcript in another language.

Each track is further organised into specific tasks covering particular languages, and training, development, and test data for each track are freely available to participants. Finally, given the large number of language combinations offered, only a limited number will be evaluated officially. Nevertheless, participants are very welcome to present results on the unofficial tasks at the workshop as well.

Permissible Training Data

MT Systems and Language Models for ASR

Training of MT systems and language models for ASR is constrained to data supplied by the organizers or listed as permissible.

Participants may use any other linguistic resource, provided that it does not include or exploit the TED talks. Any use of data beyond that explicitly listed by the organizers must be clearly stated in the system paper and declared at submission time; such runs will be marked as "Unconstrained Training".

ASR Acoustic Modeling

No training data are distributed for ASR acoustic modeling. Participants may use any publicly available data except the TED talks. In addition, participants can obtain access to Euronews data (100 hours of speech per language) provided by the organizers. This data is available for research purposes only; to access it, you have to sign this agreement, scan it and send it to