ASR AM Training Data (English, German): sign this agreement, scan it, and send it to gretter@fbk.eu.
Parallel:
parallel corpora from Wikipedia for en-{cs,de,fr,vi} pairs, provided by Krzysztof Wołk of the Polish-Japanese Academy of Information Technology
Monolingual:
TED
any subtitles of a TED or TEDx talk that is not listed as non-permissible.
LDC
LDC2011T11 Arabic Gigaword Fifth Edition
Miscellaneous:
Cantab Research baseline LM and text corpus, kindly provided by Tony Robinson of Cantab Research (now considered compatible with IWSLT 2015)