Update! (5/19): Official evaluation results have been published!
Update! (4/14): Evaluation metrics of the baseline systems have been added.
Update! (4/10): Amazon has graciously offered free computation time to sponsor the evaluation, and has also prepared a system/compute setup for interested participants.
Update! (3/27): We have changed the AWS instance type for CPUs from T2 to M5.
Update! (3/21): We have changed the GPU type from K80 to V100.
Basic Idea
The basic idea of this task (inspired by the small NMT task at the Workshop on Asian Translation) is that for NMT, not only accuracy but also efficiency at test time is important.
Efficiency can include a number of concepts:
Tracks
The goal of the task is to find systems that are both accurate and efficient. In other words, we want to find systems on the Pareto frontier of efficiency and accuracy, i.e., systems for which no other submission is both more accurate and more efficient. Participants can submit any system they like, and any system on this Pareto frontier will be considered of interest. However, we will particularly highlight systems that fall into one of the following two categories:
Corpus
Procedure
Providing a Docker image of the translation system
Competitors should submit a Docker image with all of the software and model files necessary to perform translation.
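As an illustration only, a submission image could be built from a team's own Dockerfile and exported to a single file with standard Docker commands; the team and system names below are hypothetical placeholders that follow the naming convention used in the commands further down:

# Build the image from the team's Dockerfile (hypothetical team/system names)
docker build -t wnmt2018_myteam_mysystem .
# Export the image to a single file that the evaluator can load with "docker load -i"
docker save -o wnmt2018_myteam_mysystem.tar wnmt2018_myteam_mysystem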
Executing the provided system on Amazon EC2 and gathering runtime metrics
Competitors can assume that the system will be launched with the following commands:
# Load the submitted image
docker load -i <image-file>
# Start the container from the submitted image
docker run [restriction-options (see below)] [--runtime=nvidia (if using GPUs)] --name=translator -itd wnmt2018_<team-name>_<system-name>
# Copy the input file into the container
docker cp <src> translator:<in-file>
# Run the team's translation script (a minimal sketch of run.sh follows below)
docker exec translator sh run.sh <in-file> <out-file>
# Copy the translated output back to the host
docker cp translator:<out-file> <hyp>
# Remove the container
docker rm -f translator
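For reference, run.sh only needs to read the source sentences from its first argument and write the translations to its second argument. A minimal sketch, assuming a hypothetical decoder binary and model file (neither is part of the task specification), might look like:

#!/bin/sh
# run.sh <in-file> <out-file>
# "my_decoder" and "model.bin" are hypothetical placeholders for the team's own translator and model.
my_decoder --model model.bin < "$1" > "$2"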
There are some resource constraints on the machines that we will use to run the evaluation:
Metrics
Our evaluator will gather the following metrics for each competitor's system:
Metrics 1 to 3 are measured using at least two files:
If the process (each trial of run.sh) does not finish within the specified time limit, the server will kill all running processes and will not record any results for the submitted system.
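Before submitting, teams may want to time the translation step locally to check that it fits comfortably within the limit, for example with the standard time utility:

# Measure the wall-clock time of the translation step on a local machine
time docker exec translator sh run.sh <in-file> <out-file>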
Baseline Systems
We provide 3 baseline images:
Each Docker image can be downloaded from Google Drive.
The following table shows the metrics of the baseline systems on the evaluation server (these are expected values; please refer to the official results for the final metrics):
(BLEU is calculated without any postprocessing.)
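As one way to sanity-check BLEU locally on raw, un-postprocessed output, the sacreBLEU tool can be used; whether the official evaluation uses the same scorer is an assumption, and the file names below are placeholders:

# Score the raw system output against the reference (placeholder file names)
sacrebleu reference.txt < hypothesis.txt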
In addition, Amazon has released a Sockeye system that competitors can use for the WNMT18 shared task with step-by-step directions and a link for downloading a pre-trained model, which can be found here.
Computation Credits
Amazon has kindly donated computation credits for teams that wish to test their models on AWS. If your team plans to participate in the shared task, please contact the shared task organizers and we will be happy to give you credits.
Also, if you are interested in training systems with Sockeye but do not have the resources to do so on your own, larger amounts of free credits are available for a limited number of teams. Again, get in touch with the task organizers and we will help you out.
Submission Information
Results
In the end, we received 13 systems (6 for CPU and 7 for GPU) from 4 teams. All the metrics we collected are listed in the Google Spreadsheet.
Contact Information
If you have any questions, please contact wnmt2018-shared-task [at] googlegroups.com
Important Dates