Shared Task


INTRODUCTION

We are excited to introduce a new shared task for this year’s CoCo4MT workshop! Our aim is to encourage and facilitate research on corpus construction for low-resource machine translation. 

Corpus creation for machine translation is typically constrained by the cost and availability of human translators. When a new dataset needs to be created for a low-resource language or a specialized domain, the annotation budget should be used efficiently and any sentences chosen for translation should be of high quality.


In this shared task, we ask participants to propose methods for identifying such sentences for a target language that has no existing data. Specifically, given a parallel corpus between high-resource languages, the goal is to choose a subset of that corpus to be manually translated into the low-resource language, so that the resulting translations can be used to train a good machine translation system. The shared task winner will be the team whose selected instances result in the best final system after training.
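To make the selection problem concrete, here is a minimal sketch of one possible heuristic: greedily picking the sentences that add the most previously unseen word bigrams until a translation budget is exhausted. This is an illustration only, not an official baseline or the organizers' method; the function names, the whitespace tokenization, and the sentence-count budget are all assumptions made for this example.

```python
from typing import List, Set, Tuple


def word_ngrams(sentence: str, n: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of lowercased, whitespace-tokenized word n-grams."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def select_by_coverage(corpus: List[str], budget: int, n: int = 2) -> List[int]:
    """Greedily choose up to `budget` sentence indices (hypothetical heuristic).

    At each step, pick the sentence contributing the most n-grams that are
    not yet covered by earlier picks. Ties go to the lowest index.
    """
    ngram_sets = [word_ngrams(s, n) for s in corpus]
    covered: Set[Tuple[str, ...]] = set()
    remaining = set(range(len(corpus)))
    chosen: List[int] = []
    while remaining and len(chosen) < budget:
        # sorted() makes tie-breaking deterministic: max keeps the first maximum.
        best = max(sorted(remaining), key=lambda i: len(ngram_sets[i] - covered))
        chosen.append(best)
        covered |= ngram_sets[best]
        remaining.remove(best)
    return chosen


# Toy high-resource side of a parallel corpus (made-up sentences).
toy_corpus = [
    "the cat sat",
    "the cat sat",
    "a dog ran fast",
    "the cat",
]
print(select_by_coverage(toy_corpus, budget=2))  # [2, 0]
```

The duplicate sentence at index 1 is skipped because it adds no new bigrams once index 0 has been chosen, which is the kind of redundancy a limited translation budget should avoid. A real submission would have to follow the data, budget, and submission format specified by the task.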

TASK SETUP

Data

Baselines

Evaluation

Submission format

Permitted models

SUBMISSION


If you are interested in participating in this shared task, please 


System Description Papers:

IMPORTANT DATES