CRAFT Structural Annotation Task
Developing dependency parsers for the full text biomedical literature
For the structural annotation task, participants will automatically parse the full-length biomedical journal articles of the CRAFT Corpus into dependency structures. The task organizers hope that participants will take advantage of the integrated nature of the CRAFT syntactic and semantic annotations and, in the case of the CRAFT-SA task, attempt to leverage the semantic annotations to improve dependency parsing.
Note that the dependency structures that serve as development and evaluation data are not themselves manually curated; rather, they are direct translations of the manually curated constituency parse of each sentence. Details on the translation procedure are given below.
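The constituency-to-dependency translation can be illustrated with a toy head-percolation sketch. The actual CRAFT conversion relies on the NLP4J head rules and produces labeled CoNLL-U output; the tree, head-rule table, and sentence below are invented for illustration only.

```python
# Toy illustration of constituency-to-dependency conversion via head
# percolation. Each phrase designates one child as its head; the lexical
# heads of the other children attach to that head's lexical head.

# A tree node is (label, [children]) for phrases or (tag, word) for leaves.
TREE = ("S",
        [("NP", [("NN", "expression")]),
         ("VP", [("VBD", "increased"),
                 ("PP", [("IN", "in"),
                         ("NP", [("NN", "liver")])])])])

# Simplified head rules: which child label heads each phrase (assumption;
# the real NLP4J rule set is far more detailed).
HEAD_RULES = {"S": "VP", "VP": "VBD", "PP": "IN", "NP": "NN"}

def convert(node, deps):
    """Return the lexical head word of `node`, appending
    (dependent, head) arcs for non-head children to `deps`."""
    label, rest = node
    if isinstance(rest, str):          # leaf: (tag, word)
        return rest
    heads = [convert(child, deps) for child in rest]
    head_idx = next((i for i, (child_label, _) in enumerate(rest)
                     if child_label == HEAD_RULES.get(label)), 0)
    for i, h in enumerate(heads):
        if i != head_idx:
            deps.append((h, heads[head_idx]))
    return heads[head_idx]

deps = []
root = convert(TREE, deps)
# root is "increased"; deps contains arcs such as ("expression", "increased")
```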
Description of the guidelines used in the constituency parse, as well as information on part-of-speech tag sets is available here.
The Structural Annotation task requires participants to automatically generate dependency parses for sentences in full-text scientific articles.
The training data for this task consist of the publicly released dependency parses for sentences in the 67 articles of the CRAFT Corpus v3.1.3, in the CoNLL-U format. These parses are automatically generated from the manually curated treebank data in CRAFT using the NLP4J library; a wrapper for this code is made available with the CRAFT distribution. If you are interested, please see the following wiki page.
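For reference, each token line in a CoNLL-U file carries ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with blank lines separating sentences. A minimal reading sketch, using an invented sentence rather than actual CRAFT data:

```python
# Minimal sketch of reading one sentence in CoNLL-U format. Comment lines
# begin with '#'; unused fields are filled with '_'. The sentence below is
# invented for illustration and is not drawn from the CRAFT Corpus.
SAMPLE = """\
# text = BMP6 induces expression.
1\tBMP6\tBMP6\tPROPN\tNNP\t_\t2\tnsubj\t_\t_
2\tinduces\tinduce\tVERB\tVBZ\t_\t0\troot\t_\t_
3\texpression\texpression\tNOUN\tNN\t_\t2\tobj\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_
"""

def parse_sentence(block):
    """Return a list of token dicts for one CoNLL-U sentence block."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        tokens.append({"id": int(cols[0]), "form": cols[1],
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens

tokens = parse_sentence(SAMPLE)
# tokens[1] is the root verb "induces", whose HEAD field is 0
```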
The testing data for this task consist of dependency parses for sentences in 30 articles of the corpus that have not yet been publicly released. These articles have been annotated in exactly the same way as the publicly released document set.
UPDATE: Thanks to help from Manuel Ciosici and Sampo Pyysalo, the SA Task will proceed as originally intended. Properly formatted CoNLL-U files are now available as part of the CRAFT distribution and will be used for the CRAFT-SA Task. As originally planned, the CoNLL-U 2018 evaluation script will be utilized for the CRAFT SA Task evaluation. Again, apologies for any inconvenience these changes have caused.
Evaluation of the CRAFT-SA task will use the CoNLL-U 2018 evaluation script. Dependency parser performance will be evaluated on three metrics: labeled attachment score (LAS), morphology-aware labeled attachment score (MLAS), and bi-lexical dependency score (BLEX).
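The core of the attachment-score metrics can be sketched as follows. The official CoNLL 2018 script (conll18_ud_eval.py) additionally handles tokenization mismatches and computes MLAS and BLEX; this simplified version assumes gold and system tokens are already aligned, and the example arcs are invented.

```python
# Sketch of unlabeled and labeled attachment scores (UAS/LAS) over
# pre-aligned gold and system tokens, each represented as (head, deprel).

def attachment_scores(gold, system):
    """Return (UAS, LAS): the fraction of tokens with the correct head,
    and with the correct head *and* dependency label, respectively."""
    correct_head = sum(g[0] == s[0] for g, s in zip(gold, system))
    correct_both = sum(g == s for g, s in zip(gold, system))
    n = len(gold)
    return correct_head / n, correct_both / n

gold   = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
system = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "punct")]
uas, las = attachment_scores(gold, system)
# uas == 1.0 (all heads correct), las == 0.75 (one mislabeled arc)
```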
The evaluation platform is now available in the CRAFT Shared Task GitHub repository. See the wiki for installation and usage instructions.
For final submissions, participants will be provided with a plain-text version of each document. Submissions of results for the CRAFT-SA task must use the CoNLL-U file format, with one file per document.
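Serializing parser output for submission can be sketched as below: one CoNLL-U file per document, one ten-column line per token, with unused columns filled with '_'. The token tuple layout and the example sentence are assumptions for illustration, not a prescribed internal format.

```python
# Sketch of writing parser output as CoNLL-U text for one document.

def to_conllu(sentences):
    """sentences: list of sentences; each token is a tuple
    (form, lemma, upos, xpos, head, deprel). Returns CoNLL-U text."""
    lines = []
    for sent in sentences:
        for i, (form, lemma, upos, xpos, head, deprel) in enumerate(sent, 1):
            lines.append("\t".join([str(i), form, lemma, upos, xpos, "_",
                                    str(head), deprel, "_", "_"]))
        lines.append("")                 # blank line terminates the sentence
    return "\n".join(lines)

doc = [[("Mice", "mouse", "NOUN", "NNS", 2, "nsubj"),
        ("die", "die", "VERB", "VBP", 0, "root")]]
text = to_conllu(doc)    # write this string to one .conllu file per document
```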