The goal of the SLT track is to transcribe and translate TED and TEDx lectures. The audio of the talks is unsegmented.
Participants can either use their own ASR system to generate a transcription of the test data or use a reference output.
The reference output is either the output of the best ASR system available for a set or a combination of the outputs of systems from multiple partners.
The reference output includes a baseline automatic segmentation into sentence-like units. Participants are encouraged to develop advanced methods for better sentence segmentation.
Language directions: English->{German, French, Chinese, Czech, Thai, Vietnamese}, German->English
Development Data (for reference translations, see the MT track):
English->{German, French, Chinese} dev data: dev2010, dev2010.1best, tst2013.1best (English-German), (English-French), (English-Chinese)
German->English dev data: dev2012, dev2012.1best, tst2013.1best
Evaluation Data:
(Note that the provided development data uses a human sentence segmentation. Such a segmentation will not be provided for the official test data.)
Submission Guidelines
SLT Run Submission Format:
Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run for each track. All other run submissions are treated as CONTRASTIVE runs. In the case that none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) for the respective track will be used as the PRIMARY run.
Runs have to be submitted as a gzipped TAR archive (see below for the file structure) and sent as an email attachment to jan dot niehues _AT_ kit.edu
Each run has to be stored either in SGML format or as a plain text file with one sentence per line.
Scoring will be case-sensitive and will include punctuation. Submissions have to be in UTF-8.
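The format requirements above (UTF-8, one sentence per line for plain text runs) can be checked before sending. The following is a minimal sketch, not an official validator: the function name check_run is hypothetical, and treating empty lines as a problem is an assumption that is not stated in the guidelines.

```python
from typing import List


def check_run(data: bytes) -> List[str]:
    """Return a list of problems found in a plain-text run (empty if none).

    Hypothetical helper: it only checks UTF-8 validity and flags empty
    lines, the latter being an assumption about "one sentence per line".
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8 at byte {err.start}"]

    # Split on "\n" only; str.splitlines() would also split on other
    # Unicode line separators that could occur inside a sentence.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is fine

    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number} is empty")
    return problems
```

A run would typically be read in binary mode (open(path, "rb").read()) and passed to check_run, so that encoding errors surface here rather than at scoring time.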
TAR archive file structure:
<UserID>/<Set>.<Task>.<UserID>.primary.xml
/<Set>.<Task>.<UserID>.contrastive1.xml
/<Set>.<Task>.<UserID>.contrastive2.xml
/...
where:
<UserID> = the user ID the participant used to download the data files
<Set> = IWSLT15.TED.tst2015 | IWSLT15.TEDX.tst2015
<Task> = SLT_<fromLID>-<toLID>
<fromLID>, <toLID> = Language identifiers (LIDs) as given by ISO 639-1 codes; see for example the WIT3 webpage.
Examples:
kit/IWSLT.TED.tst2015.SLT_en-de.kit.primary.xml
/IWSLT.TEDX.tst2015.SLT_de-en.kit.primary.xml
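The naming scheme and archive layout described above can be sketched in Python with the standard tarfile module. This is an illustration under assumptions, not an official packaging tool: run_path and build_archive are hypothetical names, and the set name is passed through exactly as the caller supplies it, since the definition of the set and the examples spell it differently.

```python
import io
import tarfile
from typing import Dict, Tuple

# Key of one run: (set name, source LID, target LID, label),
# where label is "primary", "contrastive1", "contrastive2", ...
RunKey = Tuple[str, str, str, str]


def run_path(user_id: str, test_set: str, src: str, tgt: str, label: str) -> str:
    """Build the archive member name for one run (hypothetical helper)."""
    task = f"SLT_{src}-{tgt}"
    return f"{user_id}/{test_set}.{task}.{user_id}.{label}.xml"


def build_archive(user_id: str, runs: Dict[RunKey, bytes]) -> bytes:
    """Pack all runs into an in-memory gzipped TAR archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for (test_set, src, tgt, label), content in runs.items():
            info = tarfile.TarInfo(name=run_path(user_id, test_set, src, tgt, label))
            info.size = len(content)  # size must be set before adding
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()
```

The returned bytes can be written to a .tar.gz file and attached to the submission mail. Keeping the path construction in a separate function makes it easy to compare the generated names against the examples before packing.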
Re-submitting your runs is allowed as long as the mails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only runs of the most recent submission mail will be used for the IWSLT 2015 evaluation and previous mails will be ignored.