Entity Relations Supporting Task (REL)

The Entity Relations (REL) task is a supporting task in the BioNLP Shared Task 2011.

The task concerns the detection of relations stated to hold between a gene or gene product and a related entity such as a protein domain or protein complex.

Task Results

The REL supporting task has been completed. Final submissions were received from four teams; the results are summarized in the following table (under approximate entity boundary matching criteria):

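------------------------------------------------------------------
 Team                                Recall   Precision   F-score
------------------------------------------------------------------
 University of Turku                   50.1        68.0      57.7
 VIB - Ghent University                47.5        37.0      41.6
 Concordia University                  24.4        46.9      32.0
 University Of Science, VNU, HCMC      15.7        23.3      18.7
------------------------------------------------------------------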

The primary performance metric is overall F-score (the last column of the table above).

Detailed Results

University of Turku

------------------------------------------------------------------------------------
    Relation Class        gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
   Protein-Component       334 (  170)      245 (  168)    50.90    68.57    58.43
    Subunit-Complex        163 (   79)      118 (   79)    48.47    66.95    56.23
     ===[TOTAL]===         497 (  249)      363 (  247)    50.10    68.04    57.71
------------------------------------------------------------------------------------

VIB - Ghent University

------------------------------------------------------------------------------------
    Relation Class        gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
   Protein-Component       334 (  158)      427 (  156)    47.31    36.53    41.23
    Subunit-Complex        163 (   78)      202 (   77)    47.85    38.12    42.43
     ===[TOTAL]===         497 (  236)      629 (  233)    47.48    37.04    41.62
------------------------------------------------------------------------------------

Concordia University

------------------------------------------------------------------------------------
    Relation Class        gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
   Protein-Component       334 (   78)      146 (   76)    23.35    52.05    32.24
    Subunit-Complex        163 (   43)      108 (   43)    26.38    39.81    31.73
     ===[TOTAL]===         497 (  121)      254 (  119)    24.35    46.85    32.04
------------------------------------------------------------------------------------

University Of Science, VNU, HCMC

------------------------------------------------------------------------------------
    Relation Class        gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
   Protein-Component       334 (   70)      319 (   69)    20.96    21.63    21.29
    Subunit-Complex        163 (    8)       12 (    8)     4.91    66.67     9.14
     ===[TOTAL]===         497 (   78)      331 (   77)    15.69    23.26    18.74
------------------------------------------------------------------------------------
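
The recall, precision and F-score columns in the tables above can be reproduced from the count columns. The following is a minimal sketch, assuming that recall is the matched gold count divided by the gold count, and precision is the matched answer count divided by the answer count; this reading is consistent with the TOTAL rows shown. The function name `scores` and its parameter names are illustrative and not part of the official evaluation software.

```python
def scores(gold, gold_match, answer, answer_match):
    """Return (recall, precision, f-score) as percentages.

    gold / gold_match:     number of gold relations, and how many were matched
    answer / answer_match: number of system relations, and how many matched
    """
    recall = 100.0 * gold_match / gold if gold else 0.0
    precision = 100.0 * answer_match / answer if answer else 0.0
    total = recall + precision
    # F-score is the harmonic mean of recall and precision.
    fscore = 2.0 * recall * precision / total if total else 0.0
    return recall, precision, fscore


# TOTAL row for University of Turku, taken from the table above.
recall, precision, fscore = scores(gold=497, gold_match=249,
                                   answer=363, answer_match=247)
print(f"{recall:.2f} {precision:.2f} {fscore:.2f}")  # 50.10 68.04 57.71
```

The same calculation applied to the other TOTAL rows gives the values reported for the remaining teams.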

Task Definition

Entities

As in many of the main tasks of the shared task, the supporting task provides human-annotated gene and gene product entities, annotated as "Protein", as a starting point. The correct annotations for these entities are provided for both the training and test data.

Human-created gold annotation for the related entities will only be provided for the training data, and systems will need to detect the related entities as part of addressing the supporting task. However, the type of these entities does not need to be resolved; all non-Protein entities are annotated using an unspecified class "Entity".

Relations

Relations are binary and represented as typed, ordered entity pairs. All relations considered in the task involve exactly one Protein entity (given) and one other entity (detected by participating systems). The arguments and relation types are fixed so that the first argument (Arg1) is always a Protein and the second argument (Arg2) is always an Entity.

In contrast to the annotation of the primary tasks, the entity relations supporting task only involves relations holding between entities co-occurring within a single sentence.

The following table shows the relations considered in the supporting task.

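--------------------------------------------------
 Relation type        Arguments
--------------------------------------------------
 Subunit-Complex      Arg1:Protein, Arg2:Entity
 Protein-Component    Arg1:Protein, Arg2:Entity
--------------------------------------------------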

Subunit-Complex is a Component-Object relation that holds between a protein complex and its subunits, which are individual proteins. Protein-Component is a less specific Object-Component relation that holds between a gene or protein and one of its components, such as a protein domain or the promoter of a gene.
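
The relation representation described in this section (typed, ordered pairs with fixed argument types) could be modelled as follows. This is only an illustrative sketch: the class `Relation`, the helper `is_valid`, and the identifiers `T1` and `T2` are hypothetical and do not reflect the official file format, which is described on the file formats page.

```python
from dataclasses import dataclass

# The two relation types defined by the task.
RELATION_TYPES = {"Subunit-Complex", "Protein-Component"}


@dataclass(frozen=True)
class Relation:
    rel_type: str  # one of RELATION_TYPES
    arg1: str      # identifier of the given Protein (hypothetical ID scheme)
    arg2: str      # identifier of the detected Entity (hypothetical ID scheme)


def is_valid(relation, entity_types):
    """Check the relation type and the fixed argument types (Arg1 Protein, Arg2 Entity)."""
    return (
        relation.rel_type in RELATION_TYPES
        and entity_types.get(relation.arg1) == "Protein"
        and entity_types.get(relation.arg2) == "Entity"
    )


# Hypothetical entity annotations: identifier -> annotated class.
entity_types = {"T1": "Protein", "T2": "Entity"}

print(is_valid(Relation("Subunit-Complex", "T1", "T2"), entity_types))  # True
print(is_valid(Relation("Subunit-Complex", "T2", "T1"), entity_types))  # False
```

Because the pairs are ordered, swapping the two arguments yields a different relation, and here an invalid one.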

Data Format

The data format of the supporting task files is described on the file formats page.

Evaluation

Evaluation is relation-oriented and based on the standard precision, recall and F-score metrics. A relation output by a participating system is counted as correct if its Entity matches an Entity in the gold annotation (under approximate boundary matching criteria) and the gold relation data contains a relation of the same type between the corresponding gold entities.

Note that only relations are evaluated; the accuracy with which systems detect (candidate) related entities will not be separately considered in the results.
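
The relation-oriented evaluation described above might be sketched as follows. Note the assumptions: the exact approximate boundary matching criteria are not specified on this page, so `spans_match` uses simple character-offset overlap purely as a stand-in. The names `Rel`, `spans_match`, `rel_match` and `evaluate`, and all example identifiers and offsets, are hypothetical and are not the official evaluation tool.

```python
from typing import NamedTuple, Tuple


class Rel(NamedTuple):
    rel_type: str                 # "Subunit-Complex" or "Protein-Component"
    protein: str                  # identifier of the given Protein (Arg1)
    entity_span: Tuple[int, int]  # (start, end) offsets of the Entity (Arg2)


def spans_match(a, b):
    # Stand-in criterion (an assumption): two spans match if they overlap.
    return a[0] < b[1] and b[0] < a[1]


def rel_match(answer, gold):
    # Same relation type, same Protein, and matching Entity spans.
    return (answer.rel_type == gold.rel_type
            and answer.protein == gold.protein
            and spans_match(answer.entity_span, gold.entity_span))


def evaluate(gold, answer):
    # Gold relations found by at least one answer, and answers matching some gold.
    gold_match = sum(any(rel_match(a, g) for a in answer) for g in gold)
    answer_match = sum(any(rel_match(a, g) for g in gold) for a in answer)
    recall = gold_match / len(gold) if gold else 0.0
    precision = answer_match / len(answer) if answer else 0.0
    total = recall + precision
    fscore = 2 * recall * precision / total if total else 0.0
    return {"gold_match": gold_match, "answer_match": answer_match,
            "recall": recall, "precision": precision, "fscore": fscore}


gold = [Rel("Protein-Component", "T1", (10, 25)),
        Rel("Subunit-Complex", "T2", (40, 52))]
answer = [Rel("Protein-Component", "T1", (12, 25)),  # boundary differs, still matches
          Rel("Protein-Component", "T2", (40, 52)),  # wrong relation type
          Rel("Subunit-Complex", "T3", (60, 70))]    # no such gold relation
result = evaluate(gold, answer)
print(result["gold_match"], result["answer_match"])  # 1 1
```

Only relations are scored in this sketch; a correctly detected Entity earns nothing unless it takes part in a matching relation, in line with the note above.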

Datasets

A small sample of annotations for the task is available below (attachment). The full training and development test data are available from the download page.
BioNLP-ST_2011_Entity_Relations_sample_data.tar.gz (5k), Sampo Pyysalo, Sep 14, 2010, 4:40 AM