Conversational Speech Translation

Task Description

The Conversational Speech Translation Task addresses translation of disfluent, conversational Spanish speech into fluent English text using the Fisher Spanish speech dataset and recently collected fluent translations.

  • End-to-End Evaluation: For this task, we will not evaluate ASR or MT performance of intermediate models, but only the performance of the final English translations against fluent references. Every participant must generate fluent English from disfluent Spanish audio.
  • End-to-End Models: We encourage the use of end-to-end models for this task. We offer both constrained and unconstrained tracks; the unconstrained track allows additional data (listed below) to encourage creative ways to remove disfluencies and produce fluent output, and these submissions will be noted separately.
  • Flexible evaluation schedule: The test data will be available for two months, enabling a flexible evaluation schedule for each participant.

Evaluation Conditions

  • Please indicate in your submission email whether your submission used:
    • Primary data only (constrained), or
    • any additional data beyond Fisher Spanish speech and the provided clean English translations (unconstrained)

Allowed Training Data

  • Primary data (constrained track):
    • This task uses fluent English references which have been collected for the LDC Fisher Spanish speech dataset.
    • We provide the target data through the link above, and ask that you procure the source speech from the LDC directly. Please let us know if you have any issues.
    • The fluent translations correspond to the original translation ids for this dataset. In the original translations, collected by JHU, certain utterances were concatenated to form more meaningful segments for translation. We provide scripts to concatenate extracted source speech feature vectors to match the target translation ids here; these are included as a submodule in the target data repository.
    • We ask that the intermediate Spanish transcripts and original translations not be used in this track, to encourage participation with end-to-end models.
  • Additional data (unconstrained track):
    • NOTE: The Spanish CALLHOME data and the additional Fisher development set (dev2) are off-limits for training.
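The concatenation of source speech features to match translation-level segment ids can be sketched as follows. This is an illustrative sketch, not the official script: the function and the dict-based mapping format (`features` from utterance id to a list of frame vectors, `segment_map` from translation id to an ordered list of utterance ids) are assumptions for the example.

```python
def concatenate_segments(features, segment_map):
    """Concatenate per-utterance feature sequences into translation-level segments.

    features: dict mapping utterance id -> list of frame vectors
    segment_map: dict mapping translation id -> ordered list of utterance ids
    (Both names and formats are illustrative, not the official ones.)
    """
    return {
        seg_id: [frame for utt_id in utt_ids for frame in features[utt_id]]
        for seg_id, utt_ids in segment_map.items()
    }
```

The provided scripts in the target data repository perform the equivalent alignment for the actual extracted feature files.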

Development and Evaluation Data

Submission Guidelines

  • Multiple run and track submissions are allowed, but participants must explicitly mark one PRIMARY run for each track. All other runs are treated as CONTRASTIVE. If no run is marked PRIMARY, the latest submission (by file time-stamp) for the respective track will be used as the PRIMARY run.
  • Submissions must be packaged as a gzipped TAR archive (see format below) and sent as an email attachment to elizabeth.salesky@gmail.com and sebastian.stueker@kit.edu.
  • Please note in your email whether you are participating in the constrained or unconstrained tracks or both, and whether you use end-to-end or pipeline models.
  • Each run must be a plain text file with one sentence per line.
  • Scoring will be case-insensitive and ignore punctuation, to match previous work. Submissions must be UTF-8 encoded.
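A minimal sketch of the normalization implied by the scoring conditions above (lowercasing and punctuation removal). This is an assumption about the exact tokenization; the official scorer may normalize differently, so treat it only as a sanity check for your output format.

```python
import string

def normalize(line):
    """Lowercase a hypothesis line, strip ASCII punctuation, and tokenize
    on whitespace -- an approximation of case-insensitive, punctuation-free
    scoring (the official scorer may differ)."""
    line = line.lower()
    line = line.translate(str.maketrans("", "", string.punctuation))
    return line.split()
```

For example, `normalize("Hello, World!")` yields `["hello", "world"]`.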

TAR archive file structure:

<UserID>/<TestSet>.<Track>.<Lang>.<UserID>.primary.txt

<UserID>/<TestSet>.<Track>.<Lang>.<UserID>.contrastive1.txt

<UserID>/<TestSet>.<Track>.<Lang>.<UserID>.contrastive2.txt

...

where:

<UserID> = user ID of the participant

<Track> = track (constrained or unconstrained)

<TestSet> = test set, e.g. IWSLT2019-eval1

<Lang> = es-en
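The archive layout above can be produced with a short script. This is a sketch, assuming your run files are already named following the convention; the function name and arguments are illustrative.

```python
import tarfile
from pathlib import Path

def build_submission(user_id, run_files, out_path):
    """Pack run files into a gzipped TAR whose entries all live under a
    single <UserID>/ top-level directory, matching the structure above.

    run_files: paths to run files already named per the convention,
               e.g. IWSLT2019-eval1.constrained.es-en.user1.primary.txt
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for path in run_files:
            tar.add(path, arcname=f"{user_id}/{Path(path).name}")
```

Opening the resulting archive should list entries like `user1/IWSLT2019-eval1.constrained.es-en.user1.primary.txt`.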