The Restricted Translation Task will be held at the Workshop on Asian Translation (WAT2023), co-located with Machine Translation Summit 2023, on September 4-8, 2023.
System submission due on June 16, 2023
ASPEC (scientific papers) [1]
English → Japanese
Japanese → English
Chinese → Japanese
Japanese → Chinese
Dev set: en / ja | For Zh-Ja: zh / ja / scores | For Ja-Zh: zh / ja / scores
Devtest set: en / ja | For Zh-Ja: zh / ja / scores | For Ja-Zh: zh / ja / scores
Test set: en / ja | For Zh-Ja: zh / ja / scores | For Ja-Zh: zh / ja / scores
Each file contains vocabulary lists delimited by empty lines. Here is a sample vocabulary list from the English dev set (en); a small reading sketch follows the sample. Note that the target vocabulary items are given in random order.
l.1 miniature integrated circuit elements
l.2 high-density information record technology
l.3 next generation semiconductors
l.4 (empty line)
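Below is a minimal Python sketch of how these constraint files could be read, assuming (as in the sample above) that each source sentence's vocabulary list is a block of lines closed by a single empty line; the file name in the final comment is a hypothetical placeholder, not an official one.

def read_constraints(path):
    """Return one list of constraint strings per source sentence."""
    blocks, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line == "":            # an empty line closes the block for one sentence
                blocks.append(current)
                current = []
            else:
                current.append(line)
    if current:                       # tolerate a missing trailing empty line
        blocks.append(current)
    return blocks

# Example (hypothetical file name): constraints = read_constraints("dev.en")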
We also provide two direct-assessment scores in the range [0, 100] for each Zh<>Ja dictionary item; a score of 100 indicates a translation pair that is most highly rated by bilingual human annotators. Only the Zh<>Ja dictionary entries whose average score is >= 50 are included in the provided lists.
We calculate two distinct metrics in this task.
Usual translation accuracy following the WAT convention (including BLEU).
A consistency score: the ratio of sentences that contain exact matches of all given constraints to the total number of sentences in the test corpus.
For the "exact match" evaluation, we will conduct the following process:
English: simply lowercase hypotheses and constraints, then judge character level sequence matching (including whitespaces) for each constraint.
Japanese: judge character level sequence matching (including whitespaces) for each constraint without preprocessing.
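The following Python sketch reflects our reading of the rules above; it is not the official evaluation script. English hypotheses and constraints are lowercased, Japanese ones are used as-is, and each constraint must occur in the hypothesis as a contiguous character sequence (whitespace included). The consistency score is then the fraction of sentences whose output contains every constraint.

def satisfies_constraints(hypothesis, constraints, lang):
    """True if every constraint occurs verbatim in the hypothesis."""
    if lang == "en":                  # English: lowercase both sides first
        hypothesis = hypothesis.lower()
        constraints = [c.lower() for c in constraints]
    return all(c in hypothesis for c in constraints)

def consistency_score(hypotheses, constraint_blocks, lang):
    """Ratio of sentences satisfying all of their constraints."""
    hits = sum(
        satisfies_constraints(h, cs, lang)
        for h, cs in zip(hypotheses, constraint_blocks)
    )
    return hits / len(hypotheses)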
For the final ranking, we also calculate a combined score of both metrics by computing BLEU over only the exact-match sentences (see the sketch after this list):
Check the exact match for each translation.
If a translation does not satisfy its constraints, replace it with an empty string (this simulates a case where "the process failed to respond").
Calculate BLEU with the modified translations.
Note: in this scenario the brevity penalty in BLEU loses its usual meaning, but the n-gram precision scores remain consistent.
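A minimal sketch of this combined score, reusing the satisfies_constraints helper from the previous sketch, is shown below. It follows the three steps above but is not the official scoring script, and it relies on sacrebleu's default tokenizer (for Japanese output a language-specific tokenizer such as "ja-mecab" would normally be requested).

import sacrebleu

def combined_score(hypotheses, references, constraint_blocks, lang):
    """BLEU computed after blanking out translations that miss a constraint."""
    modified = [
        h if satisfies_constraints(h, cs, lang) else ""   # "failed to respond"
        for h, cs in zip(hypotheses, constraint_blocks)
    ]
    return sacrebleu.corpus_bleu(modified, [references]).score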
We also plan to run a human evaluation in which bilingual annotators appraise the top-ranked submitted systems.
Each submission file has to be in the format used by the BLEU score calculation script (see sacrebleu). We also expect translation outputs to be re-cased and de-tokenized in both English and Japanese.
Please refer to the WAT'23 official submission page for system submission. After submitting your systems, please fill in the form so that we can keep track of them.
System submission due: June 16, 2023
System description paper submission due: June 30, 2023
Review feedback of system description papers: July 28, 2023
Camera-ready deadline for system description papers: August 11, 2023
Workshop on Asian Translation: September 4, 2023
[1] Nakazawa et al., "ASPEC: Asian Scientific Paper Excerpt Corpus", in Proc. of LREC, 2016.
[2] Cettolo et al., "Overview of the IWSLT 2017 evaluation campaign", in Proc. of IWSLT, 2017.
[3] Sakaguchi and Van Durme, "Efficient Online Scalar Annotation with Bounded Support", in Proc. of ACL, 2018.
[4] Federmann, "Appraise Evaluation Framework for Machine Translation", in Proc. of COLING, 2018. (GitHub)
Akiko Eriguchi, Microsoft, USA
Yusuke Oda, Inspired Cognition and Tohoku University, Japan
For general questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com".