Introduction
This year, the IWSLT Evaluation will focus on the translation of TED and TEDx talks. As in previous years, the evaluation offers specific tracks for all the core technologies involved in spoken language translation, namely:
Automatic speech recognition (ASR), i.e. the conversion of a speech signal into a transcript,
Machine translation (MT), i.e. the translation of a polished transcript into another language,
Spoken language translation (SLT), i.e. the conversion and translation of a speech signal into a transcript in another language.
Tracks are further organised into several tasks covering specific languages. Training, development, and testing data for each track are freely available to participants. Finally, given the large number of language combinations offered, only a limited number of them will be evaluated officially. Nevertheless, participants are very welcome to also present results on the unofficial tasks at the workshop.
Potential participants in the Evaluation are invited to check out our Call for Participation, fill in the Registration form, and join our e-mail list.
Permissible Training Data
MT Systems and Language Models for ASR
Training of MT systems and language models for ASR is constrained to data supplied by the organizers.
ASR Acoustic Modeling
No training data are distributed for ASR acoustic modeling. For German, participants may use any publicly available data recorded before July 17th 2012. For English, the data must have been recorded before December 31st 2010. For Italian, the data must have been recorded before June 30th 2011.(*) In addition, participants may use the Euronews data (100 hours of speech for each language) provided by the organizers, regardless of any cut-off date. To access this data, which is for research purposes only, you have to sign this agreement, scan it, and send it to gretter@fbk.eu.
(*) The development set for Italian (IWSLT14.SLT.dev2014.it-en.it) actually includes some talks recorded before the cut-off date set for the evaluation. Their talk IDs, the events at which they were given, and their recording dates are:
_T1JRV5YPDA TEDXLakeComo 4 Nov 2009
AZgAevU6AtM TEDXLakeComo 6 Nov 2010
ALyeJY_ZQS0 TEDXLakeComo 6 Nov 2010
2wIhH2Rn4pM TEDXLakeComo 6 Nov 2010
Therefore, take them out if you are considering using additional training resources.
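The acoustic-modeling rules above can be sketched as a small filter. This is only an illustration, not an official tool: the function name `is_permissible`, the language codes used as keys, and the `euronews` flag are assumptions made for this example. It also assumes that "before" means strictly earlier than the cut-off date, and that the four talks listed above should be kept out of any additional Italian training data.

```python
from datetime import date

# Cut-off dates as stated in the text (keys are assumed language codes).
CUTOFF = {
    "de": date(2012, 7, 17),
    "en": date(2010, 12, 31),
    "it": date(2011, 6, 30),
}

# Talk IDs in the Italian development set that predate the cut-off date.
DEV_OVERLAP_IT = {"_T1JRV5YPDA", "AZgAevU6AtM", "ALyeJY_ZQS0", "2wIhH2Rn4pM"}


def is_permissible(lang, recorded, talk_id=None, euronews=False):
    """Return True if a recording may be used for acoustic-model training.

    Assumption: "recorded before <date>" is read as strictly earlier.
    """
    if euronews:
        # Euronews data from the organizers is exempt from any cut-off date.
        return True
    if lang == "it" and talk_id in DEV_OVERLAP_IT:
        # These talks also appear in the Italian development set.
        return False
    return recorded < CUTOFF[lang]
```

For example, `is_permissible("it", date(2010, 11, 6), "AZgAevU6AtM")` rejects a talk that is also in the development set, even though its recording date is earlier than the Italian cut-off.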