WNGT 2019

DGT Task: Document-level Generation and Translation Shared Task

Clarifications and Update on Submission Deadline:

We have extended the deadline for system submissions to August 29, 23:59:59 (UTC-12, "anywhere on earth"). The system description deadline remains the same.

We would also like to clarify frequently asked questions regarding the submissions:

  • For all 6 tracks, participants must submit the model output on the test split of RotoWire English-German dataset, consisting of 241 game summaries in total. We do not have a separate blind test set.
  • System outputs should be tokenized. If you have detokenized output, you can use our tokenizer to reproduce the same tokenization as our dataset.

Resource constraints are clarified below. You may still make a submission that does not follow these constraints, but you must notify us when submitting.

  • 1 & 2. NLG (Data -> En, Data -> De)
    • RotoWire English-German dataset (train / development sets)
    • The full RotoWire (English) dataset (train / development sets)
    • Any of the monolingual resources specified below
  • 3 & 4. MT (En <-> De)
    • RotoWire English-German dataset (train / development sets)
    • Any parallel data allowable by the WMT 2019 English-German news task
    • Any of the monolingual resources specified below
  • 5 & 6. MT+NLG ([Data+En] -> De, [Data+De] -> En)
    • RotoWire English-German dataset (train / development sets)
    • The full RotoWire (English) dataset (train / development sets)
    • Any parallel data allowable by the WMT 2019 English-German news task
    • Any of the monolingual resources specified below

The WNGT 2019 DGT shared task on "Document-Level Generation and Translation" considers generating textual documents from structured data, from documents in another language, or from both. We plan for this to be both a task that pushes forward document-level generation technology and a way to compare and contrast methods for generating from different types of inputs.


There will be three types of systems that we will compare in the shared task:

  • NLG Track: Systems that take in structured data and output text in the target language.
  • MT Track: Systems that take in text in a source language and output text in the target language.
  • NLG+MT Track: Systems that take in structured data and text in the source language and output text in the target language.

In addition, since the data (listed below) is English-German, there will be two target languages:

  • English
  • German

This results in a total of 6 tracks.


The dataset that we use is a subset of the RotoWire dataset [1], a dataset of basketball-related articles along with information about the basketball game in structured data. The original RotoWire dataset is an English dataset that has been used for data-to-text natural language generation, and we have had a portion of this dataset manually translated into German. Specifically, the statistics are below:


The RotoWire English-German dataset (v1.5) comes in two formats, both split into identical train / development / test sets:

    • original: JSON format. Contains all the statistics from the original dataset and the German summaries.
    • plaintxt: Parallel texts between English and German, with one file per document ID and one sentence pair per line.

The original format additionally stores the following new fields compared to the original RotoWire dataset:

    • id: ID of a document.
    • summary_*: The tokenized English and German summaries, without sentence boundary markers.
    • sentence_end_index_*: List of indices pointing to the ends of sentences for English and German summaries.

* can either be en or de.
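For instance, sentence boundaries can be reconstructed from these fields along the lines of the sketch below. The record shown is a hypothetical placeholder, and we assume each index in sentence_end_index_* points at the last token of a sentence; verify this convention against the released data itself.

```python
def split_sentences(tokens, end_indices):
    """Split a flat token list into sentences using end indices.

    Assumes each index is the position of the LAST token of a
    sentence (an assumption; check against the actual dataset).
    """
    sentences, start = [], 0
    for end in end_indices:
        sentences.append(tokens[start:end + 1])
        start = end + 1
    return sentences

# Hypothetical record in the shape described above.
record = {
    "id": "example-document",
    "summary_en": ["The", "Hawks", "won", ".", "It", "was", "close", "."],
    "sentence_end_index_en": [3, 7],
}
sentences = split_sentences(record["summary_en"],
                            record["sentence_end_index_en"])
```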

NEW: All texts are tokenized. We provide participants with the tokenizer that was used to process the dataset. This is useful for ensuring that tokenization stays consistent across datasets when incorporating the external resources described below.

Notably, the English-German training dataset is small (much smaller than the full English dataset listed below), reflecting the resource constraints that we will encounter when trying to apply these systems to new languages. Because of this, you are further allowed to use the resources below.

MT+NLG Resources (usable in all tracks):

NLG Resources (usable in the NLG track and MT+NLG track):

MT Resources (usable in the MT and MT+NLG track):

Monolingual Resources (usable in all tracks):

If there are any additional resources you would like to see added, please contact the organizers by the “resource addition cutoff date” listed below.


There will be a baseline system, built with OpenNMT and trained as described in [1].

Systems will be evaluated on the test split of the RotoWire English-German dataset using standard automatic measures: at least BLEU for the MT tracks; ROUGE and BLEU for the NLG and MT+NLG tracks; and content-oriented metrics (Content Selection, Relation Generation, and Content Ordering [1]) for the (monolingual) NLG track. In addition, we hope, but do not guarantee, to perform some degree of human evaluation on the results. Suggestions for evaluation methods are also very welcome.
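As a rough illustration of the BLEU portion of the evaluation, the sketch below computes a simplified corpus-level BLEU with a single reference per hypothesis. Official scoring presumably uses standard tooling; this version omits details such as smoothing and tokenization handling.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_bleu(references, hypotheses, max_n=4):
    """Simplified corpus BLEU: one reference per hypothesis, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        matched, total = 0, 0
        for ref, hyp in zip(references, hypotheses):
            ref_counts = Counter(ngrams(ref, n))
            # Clipped n-gram matches (modified precision).
            for gram, count in Counter(ngrams(hyp, n)).items():
                matched += min(count, ref_counts[gram])
            total += max(len(hyp) - n + 1, 0)
        if matched == 0:
            return 0.0  # one zero precision makes the geometric mean zero
        log_precisions.append(math.log(matched / total))
    ref_len = sum(len(r) for r in references)
    hyp_len = sum(len(h) for h in hypotheses)
    # Brevity penalty discourages overly short output.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)
```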


Helper tools can be downloaded here.

1. Prepare the format

For all the tracks, we ask participants to save the generated results in a format similar to that of the original dataset. Specifically, a submission should be a single JSON file containing a list of records with the following fields:

    • id: ID of each document.
    • summary: Word-tokenized generated summary.

For example, a valid submission file would look like below:

    {"id": "02_24_16-Cavaliers-Hornets-TheEasternConference-leadingClevelandCavaliers",
     "summary": ["Die", "in", "der", ...]},
    {"id": "01_01_16-Knicks-Bulls-TheChicagoBulls(19",
     "summary": ["Die", "Chicago", "Bulls", ...]},
    {"id": "11_07_16-Pelicans-Warriors-AnthonyDaviscontinuestobe",
     "summary": ["Anthony", "Davis", "ist", ...]},
    {"id": "04_01_16-Cavaliers-Hawks-Inwhatwasahistoric",
     "summary": ["In", "einer", "historischen", ...]},
    {"id": "01_07_17-Thunder-Nuggets-RussellWestbrookrecordedyetanother",
     "summary": ["Russell", "Westbrook", "verzeichnete", ...]},

Note that indentation as in the example above is not required in the submission file.
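For instance, such a file can be written with Python's json module. The record below is a hypothetical placeholder; a real summary would be the full token list of a generated document.

```python
import json

# Hypothetical record; a real submission lists one record per test document.
records = [
    {"id": "02_24_16-Cavaliers-Hornets-TheEasternConference-leadingClevelandCavaliers",
     "summary": ["Die", "Cleveland", "Cavaliers", "gewannen"]},
]

# ensure_ascii=False keeps German characters readable; no indent is needed.
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False)
```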

For the MT tracks, we provide a script that converts sentence-by-sentence plain-text outputs into the specified format. Download the helper tools and run the script as follows:

$ python plain2json.py --source-dir /path/to/translations --target-json output.json

where each file in the /path/to/translations directory should contain one target-language sentence per line.
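The conversion the script performs is presumably along these lines. This is a hypothetical re-implementation for illustration only; plain2json.py itself is authoritative, and in particular may derive document IDs from file names differently than assumed here.

```python
import os


def plain_to_records(source_dir):
    """Build one record per file: whitespace-split tokens from all
    sentence lines are concatenated into a single token list.

    Assumes each file name (minus extension) is the document ID.
    """
    records = []
    for fname in sorted(os.listdir(source_dir)):
        doc_id = os.path.splitext(fname)[0]
        with open(os.path.join(source_dir, fname), encoding="utf-8") as f:
            tokens = [tok for line in f for tok in line.split()]
        records.append({"id": doc_id, "summary": tokens})
    return records
```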

2. Validate the submission file

Download the helper tools and run the validator as follows:

$ python validate_outputs.py /path/to/your/submission/file 

Please fix the errors if prompted.
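The exact checks are defined by validate_outputs.py itself, but a minimal sanity check along the same lines might look like the sketch below (the expected_count parameter is hypothetical; the test split contains 241 summaries in total).

```python
import json


def validate_submission(path, expected_count=None):
    """Minimal sanity checks on a submission file.

    The official validator presumably checks more, e.g. the exact
    set of document IDs in the test split.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    assert isinstance(records, list), "top level must be a JSON list"
    for rec in records:
        assert {"id", "summary"} <= set(rec), "record missing 'id' or 'summary'"
        assert isinstance(rec["id"], str), "'id' must be a string"
        assert isinstance(rec["summary"], list), "'summary' must be a token list"
        assert all(isinstance(t, str) for t in rec["summary"]), \
            "all tokens must be strings"
    if expected_count is not None:
        assert len(records) == expected_count, "unexpected number of records"
    return len(records)
```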

3. Submit

Please follow the instructions here to proceed with your submission. You will also need to submit a system description, and you can find more information about this on the call for papers page.


  • April 8, 2019: Task announcement, data release
  • August 12, 2019: Resource addition cutoff
  • August 29, 2019: System results due (extended; 23:59, UTC-12)
  • September 2, 2019: System descriptions due
  • September 16, 2019: System description feedback provided
  • September 30, 2019: Camera-ready system descriptions due
  • November 4, 2019: Presentation at the workshop


The following people are responsible for organizing the task.

Please feel free to contact the organizers at any time at wngt2019-organizers@googlegroups.com.


[1] Sam Wiseman, Stuart Shieber and Alexander Rush. Challenges in Data-to-Document Generation. EMNLP 2017.