This year, the IWSLT evaluation focuses on the translation of TED and TEDx talks. As in previous years, the evaluation offers specific tracks for all the core technologies involved in spoken language translation.
Permissible Training Data
MT Systems and Language Models for ASR
Training of MT systems and language models for ASR is constrained to data supplied by the organizers.
ASR Acoustic Modeling
No training data are distributed for ASR acoustic modeling. For German, participants may use any publicly available data recorded before July 17th, 2012. For English, the data must have been recorded before December 31st, 2010; for Italian, before June 30th, 2011.(*) In addition, participants may use the Euronews data (100 hours of speech for each language) provided by the organizers, regardless of any cut-off date. To access these data, which are for research purposes only, you must sign this agreement, scan it, and send it to email@example.com.
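The cut-off rule above can be sketched as a small filter for candidate acoustic training data. This is only an illustration, not an official tool: the function name `is_permitted`, the language codes `de`/`en`/`it`, and the `source` label `"euronews"` are assumptions made for the example, and "before" is read strictly, so a recording made on the cut-off day itself is rejected.

```python
from datetime import date

# Cut-off dates as stated in the rules above.
# The language codes used as keys are an assumption for this sketch.
CUTOFFS = {
    "de": date(2012, 7, 17),
    "en": date(2010, 12, 31),
    "it": date(2011, 6, 30),
}


def is_permitted(language, recorded_on, source=""):
    """Return True if a recording may be used for acoustic model training."""
    # Euronews data provided by the organizers is exempt from any cut-off date.
    # The "euronews" label is a hypothetical tag, not an official identifier.
    if source == "euronews":
        return True
    cutoff = CUTOFFS.get(language)
    if cutoff is None:
        raise ValueError(f"no cut-off date defined for language {language!r}")
    # Assumption: "before" is strict, so the cut-off day itself is excluded.
    return recorded_on < cutoff
```

For instance, `is_permitted("it", date(2011, 6, 29))` accepts an Italian recording made the day before the cut-off, while the same call with `date(2011, 6, 30)` rejects it.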
(*) The development set for Italian (IWSLT14.SLT.dev2014.it-en.it) includes some talks recorded before the cut-off date set for the evaluation. Their talkid, the event where each was given, and the recording date are:
Therefore, exclude these talks if you plan to use additional training resources.
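One possible way to apply this exclusion is to drop the affected talks from any extra training data by talk ID. The sketch below assumes each training item is a dictionary with a `talkid` key; the function name `exclude_talks` and the data layout are hypothetical, and the actual talk IDs must come from the list published by the organizers.

```python
def exclude_talks(training_items, excluded_talk_ids):
    """Return the training items whose talk ID is not in the exclusion list.

    Each item is assumed to be a dict with a "talkid" key (hypothetical layout).
    """
    excluded = set(excluded_talk_ids)
    return [item for item in training_items if item["talkid"] not in excluded]
```

Converting the exclusion list to a set keeps each lookup cheap even when the training collection is large.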