ASR Track

The ASR track consists of two tasks:

  • The transcription of TED and QED talks. (Note: automatic segmentation of the data is mandatory!)

  • The transcription of Microsoft Speech Language Translation (MSLT) test data, consisting of bilingual Skype calls

For both tasks the following specifications apply:

  • Languages:

    • Talks Task: English

    • MSLT Task: English, German

  • Input format: unsegmented SPHERE, 16kHz, 16bit, PCM

  • Output format: CTM, no case, no punctuation, UTF-8. Confidence measures are optional, but appreciated

For the talk task:

  • Development Data

  • Evaluation Data

      • English tst2016 + tst2015 (progressive) will be made available in time

For the MSLT test task:

To receive the development and/or evaluation data for the MSLT test task, participants must sign the data license agreement and sign up for the respective tasks in the participant registration form. Please return your signed copy electronically to sebastian.stueker@kit.edu or fax it to +49 721 607 721. The data will then be made available to you.

  • Development Data

    • English: will be made available under the conditions cited above

    • German: will be made available under the conditions cited above

  • Evaluation Data:

      • English: tst2016 will be made available in time

      • German: tst2016 will be made available in time

Permissible Data

See the information on the homepage, and note that the disallowed TED and QED talks must not be used.

Submission Guidelines

ASR Run Submission Format:

  • Each participant has to submit at least one run for each of the tasks s/he registered for.

  • Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run for each track. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) for the respective track will be used as the PRIMARY run.

  • Runs have to be submitted as a gzipped TAR archive (for the format, see below) and sent as an email attachment to cettolo@fbk.eu, jan.niehues@kit.edu and sebastian.stueker@kit.edu

  • Submissions have to be made in CTM format. See the CTM description in the NIST SCTK documentation for details. The confidence values are optional. The channel number has to be '1'. Scoring will be case-insensitive. Submissions have to be in UTF-8.
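
As an illustration of the CTM requirement above, the following minimal sketch formats one word hypothesis as a CTM line. It assumes the usual SCTK field order (recording name, channel, start time in seconds, duration in seconds, word, optional confidence); the function name `format_ctm_line`, the recording name and the two-decimal time precision are assumptions made for this example and are not prescribed by the evaluation.

```python
def format_ctm_line(recording, start, duration, word, confidence=None):
    """Return one CTM line for a single recognized word.

    Assumed field order: recording, channel, start, duration, word, [confidence].
    The channel is fixed to '1', as the submission format requires.
    """
    fields = [recording, "1", f"{start:.2f}", f"{duration:.2f}", word]
    if confidence is not None:
        # The confidence value is optional and is appended as a sixth field.
        fields.append(f"{confidence:.2f}")
    return " ".join(fields)


if __name__ == "__main__":
    # Hypothetical recording name and timings, for illustration only.
    print(format_ctm_line("talk001", 12.5, 0.31, "hello"))
    print(format_ctm_line("talk001", 12.81, 0.4, "world", confidence=0.9))
```

When writing the lines to disk, open the file with `encoding="utf-8"` so that the submission satisfies the UTF-8 requirement.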

Output conventions

  • The text will be scored case-insensitively, but can be submitted in mixed case

  • Numbers, dates etc. need to be transcribed in words as they are spoken, not in digits

  • Common acronyms, such as NATO or EU, are written as one word, without any special markers between the letters. This applies no matter whether they are spoken as one word or spelled out as a letter sequence

  • All other letter spelling sequences are written as individual letters with spaces in between

  • Standard abbreviations, such as "etc." "Mr." are accepted as specified by the glm file in the scoring package

  • For words pronounced in their contracted form, the orthography for the contracted form may be used. These cases will be normalized by the glm file to their canonical form.
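
One of the conventions above, that numbers and dates are written out in words, can be checked mechanically before submission. The sketch below is a hypothetical pre-submission check and not part of the official scoring package: it assumes that the word is the fifth whitespace-separated field of each CTM line and that lines starting with `;;` are comments, and the function name `words_with_digits` is made up for this example.

```python
def words_with_digits(ctm_text):
    """Return (line_number, word) pairs for CTM words that contain digits."""
    flagged = []
    for line_number, line in enumerate(ctm_text.splitlines(), start=1):
        fields = line.split()
        # Skip comment lines and lines too short to carry a word field.
        if line.startswith(";;") or len(fields) < 5:
            continue
        word = fields[4]
        if any(ch.isdigit() for ch in word):
            flagged.append((line_number, word))
    return flagged


if __name__ == "__main__":
    # Hypothetical CTM content: the second line uses digits instead of words.
    sample = (
        "talk001 1 0.00 0.40 in\n"
        "talk001 1 0.40 0.60 2016\n"
        "talk001 1 1.00 0.50 twenty\n"
    )
    print(words_with_digits(sample))
```

A flagged word only signals that the transcription should be reviewed; converting digits into the spoken word sequence is left to the recognition system.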

TAR archive file structure:

< UserID >/< Set >.< Task >.< UserID >.primary.ctm

/< Set >.< Task >.< UserID >.contrastive1.ctm

/< Set >.< Task >.< UserID >.contrastive2.ctm

/...

where:

< UserID > = user ID of participant used to download data files

< Set > = tst2015|tst2013

< Task > = ASR_ENG | ASR_GER

Examples:

fbk/tst2014.ASR_ENG.fbk.primary.ctm

/tst2014.ASR_ENG.fbk.contrastive1.ctm
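
The naming scheme and archive layout above can be sketched in a few lines of Python using the standard `tarfile` module. This is a minimal sketch under one assumption: all run files sit inside a single directory named after the user ID, which is how the structure above is read here. The helper names `run_filename` and `build_archive` are hypothetical.

```python
import io
import tarfile


def run_filename(user_id, test_set, task, run):
    """Build the archive member name <UserID>/<Set>.<Task>.<UserID>.<run>.ctm."""
    return f"{user_id}/{test_set}.{task}.{user_id}.{run}.ctm"


def build_archive(archive_path, user_id, test_set, task, runs):
    """Write a gzipped TAR archive; `runs` maps a run label to its CTM text.

    Returns the list of member names that were written.
    """
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for label, ctm_text in runs.items():
            data = ctm_text.encode("utf-8")  # submissions have to be in UTF-8
            info = tarfile.TarInfo(name=run_filename(user_id, test_set, task, label))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            names.append(info.name)
    return names


if __name__ == "__main__":
    # Hypothetical CTM content; real runs would hold the full system output.
    runs = {
        "primary": "talk001 1 0.00 0.40 hello\n",
        "contrastive1": "talk001 1 0.00 0.42 hello\n",
    }
    print(build_archive("fbk.tar.gz", "fbk", "tst2014", "ASR_ENG", runs))
```

Writing the members from memory keeps the directory prefix under explicit control; adding existing files with `tar.add(path, arcname=...)` would work equally well.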

Re-submitting your runs is allowed as long as the mails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission mail will be used for the IWSLT 2016 evaluation, and previous mails will be ignored.

For the talk task, the English development data is tst2014 for TED; no development data is provided for QED.