While the original shared task has finished, we welcome new submissions to make this a rolling evaluation.
* Overview paper
* Results Spreadsheet
The efficiency task measures machine translation inference performance on CPUs and GPUs with standardized training data and hardware. Participants provide their own code and models. This is the third edition of the efficiency shared task, building on the 2019 and 2018 tasks, which in turn were based on the WAT 2017 Small-NMT Task.
We follow the constrained condition of the WMT 2019 English-German news translation task. We recommend using the WMT 2018 test set for development. Additional data from WMT 2020 is not allowed.
The input and output formats are raw text files with one sentence per line and UNIX-style line separators. Participants are responsible for their own tokenization and detokenization, which will be included in run time. Unlike previous years, we do not mandate a particular tokenization.
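As an illustration of the expected file format, the following sketch checks that a file uses UNIX-style line endings (no carriage returns) and counts sentences by counting lines; the file name and contents are hypothetical:

```shell
# Hypothetical sanity check on an input/output file:
# plain text, one sentence per line, UNIX-style \n separators.
printf 'First sentence.\nSecond sentence.\n' > sample.txt

# A Windows-style file would contain carriage returns; this one should not.
if grep -q "$(printf '\r')" sample.txt; then
  echo "found CR"
else
  echo "LF only"
fi

wc -l < sample.txt   # number of sentences (lines)
```

Since detokenization counts toward run time, whatever processing produces this format must happen inside the submitted system.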
There are two types of hardware: CPU and GPU.
For CPUs, performance will be measured on an AWS c5.metal instance. This machine has 96 hyperthreads on a Cascade Lake architecture and 192 GB RAM. This year, there are two CPU tracks: use the entire machine or use only one core.
For GPUs, performance will be measured on an AWS g4dn.xlarge instance. This machine has one NVIDIA T4 GPU, while the host has 4 vCPUs and 16 GB RAM. The NVIDIA driver will be version 440.44.
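The exact restriction options the organizers will pass to `docker run` are not specified here; the following is only a plausible sketch of how the tracks could be enforced using standard Docker flags (all values are assumptions, not the official configuration):

```shell
# Hypothetical restriction-options for each track (not the official flags).

# Single-core CPU track: pin the container to one core.
# docker run --cpus=1 --cpuset-cpus=0 ... wnmt2020_<team-name>_<system-name>

# Whole-machine CPU track: no CPU restriction.
# docker run ... wnmt2020_<team-name>_<system-name>

# GPU track: expose the single T4 via the NVIDIA runtime.
# docker run --runtime=nvidia ... wnmt2020_<team-name>_<system-name>
```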
Amazon has donated some AWS credits so participants can test on the exact hardware. These are available on a first-come first-served basis and there is some lead time required to get GPUs approved in your account. Contact wngt-info@googlegroups.com.
Each system will be run with 1 million lines of raw English input, where each line has at most 100 space-separated words (though your tokenizer will probably break that into more). We will measure the following:
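To make the input constraint concrete, this sketch counts space-separated words per line and flags any line over the 100-word limit; the sample file is hypothetical:

```shell
# Hypothetical check that no input line exceeds 100 space-separated words.
printf 'one two three\na b\n' > sample.txt
awk '{ if (NF > 100) bad++ } END { print (bad ? "violations: " bad : "ok") }' sample.txt
# prints "ok" for this sample
```

Note that this limit applies to the raw space-separated input; a subword tokenizer will typically produce more tokens per line than this word count.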
Results will be reported in a table showing all metrics. The presentation will include a series of Pareto frontiers comparing quality with each of the efficiency metrics. We welcome participants optimizing any of the metrics.
Unlike past years, we will not be subtracting loading time from run times. The large input is intended to amortize loading time. Based on last year's results, we anticipate that all submissions will complete in under 2 hours.
We will report model size on disk, which means we need to define a model as distinct from code. The model includes everything derived from data: all model parameters, vocabulary files, BPE configuration if applicable, quantization parameters or lookup tables where applicable, and hyperparameters like embedding sizes. You may compress your model using standard tools (gzip, bzip2, xz, etc.) and the compressed size will be reported.
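One possible workflow for packaging and measuring a compressed model, with placeholder file names (the `model/` directory and its contents are illustrative only):

```shell
# Sketch: bundle model files and measure the compressed size in bytes.
mkdir -p model
printf 'dummy parameters' > model/params.bin   # placeholder model file

tar -cf model.tar model
xz -9 -k model.tar                 # -k keeps model.tar; produces model.tar.xz

# Report the compressed size (GNU stat, falling back to BSD stat).
stat -c %s model.tar.xz 2>/dev/null || stat -f %z model.tar.xz
```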
Code can include simple rule-based tokenizer scripts and hard-coded model structure that could plausibly be used for another language pair. If we suspect that your model is hidden in code, we may ask you to provide another model of comparable size for a surprise language pair with reasonable quality.
This is an open task and participants can use whatever software and models they want subject to the hardware and data constraints.
Using existing constrained WMT 2019 systems, such as Facebook's, is permissible. Microsoft is working to provide the sentence-level components of their English-German submission to WMT 2019 as well. These are not optimized for speed, but might be useful as teachers for a teacher-student setting.
Competitors should submit a Docker image with all of the software and model files necessary to perform translation.
Competitors can assume that the system will be launched with the following commands:
docker load -i <image-file>
docker run [restriction-options (see below)] [--runtime=nvidia (if using GPUs)] --name=translator -itd wnmt2020_<team-name>_<system-name>
docker cp <src> translator:<in-file>
docker exec translator sh run.sh <in-file> <out-file>
docker cp translator:<out-file> <hyp>
docker rm -f translator
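The only fixed contract in the commands above is that `run.sh` receives the input path and the output path as its two arguments. A minimal hypothetical `run.sh` might look like this; the actual translation command is a placeholder that each team replaces with their own system:

```shell
#!/bin/sh
# Hypothetical run.sh, invoked as: sh run.sh <in-file> <out-file>
in_file="$1"
out_file="$2"

# Tokenization, translation, and detokenization all count toward run time,
# so they belong here. The real translate command is a placeholder:
# ./translate --model model.bin < "$in_file" > "$out_file"

cp "$in_file" "$out_file"   # stand-in so this sketch runs end-to-end
```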
Send a submission mail to wngt-info@googlegroups.com with the following information:
Teams are encouraged to make multiple entries to cover more of the Pareto frontier or contrast methods.
All deadlines are 23:59:59 anywhere on earth (UTC-12).
* April 24, 2020: shared task submissions
* May 5, 2020: system description submissions at https://www.softconf.com/acl2020/wngt/ . Despite what the form says, system descriptions do not need to be anonymous; teams can in any case be identified by the number of submissions they made.
* May 25, 2020: camera-ready system descriptions