GENIA Event Extraction (GENIA)

The GENIA event extraction (GENIA) task is a main task in BioNLP Shared Task 2011 (BioNLP-ST '11).

For the GENIA task, the task definition remains the same as in BioNLP Shared Task 2009 (BioNLP-ST'09). Because the task definition is unchanged, the purpose of running this task is to measure the community's progress on it. To avoid over-fitting to the evaluation data, additional training and evaluation data sets will be provided together with those from the 2009 Shared Task. As the additional data sets come from full-text articles, the task also tests how well the technology generalizes from abstracts to full-text articles.

Task Definition

The task definition remains the same as that for BioNLP-ST'09. For details, please refer to the homepage of BioNLP-ST'09. Here, we provide a summary of the task definition.

Entities

The GENIA task aims at extracting events occurring upon genes or gene products, which are typed as "Protein" without differentiating genes from gene products. Other types of physical entities, e.g. cells, cell components, are not differentiated from each other, and their type is given as "Entity".

Events

The following table summarizes the events targeted in the task.

Event Type            Core arguments                                     Additional arguments
Gene expression       Theme(Protein)
Transcription         Theme(Protein)
Protein catabolism    Theme(Protein)
Phosphorylation       Theme(Protein)                                     Site(Entity)
Localization          Theme(Protein)                                     AtLoc(Entity), ToLoc(Entity)
Binding               Theme(Protein)+                                    Site(Entity)+
Regulation            Theme(Protein / Event), Cause(Protein / Event)     Site(Entity), CSite(Entity)
Positive regulation   Theme(Protein / Event), Cause(Protein / Event)     Site(Entity), CSite(Entity)
Negative regulation   Theme(Protein / Event), Cause(Protein / Event)     Site(Entity), CSite(Entity)

The format "Arg(Type)" indicates that an event takes an argument "Arg" which should identify an entity of type "Type": for example, Localization takes one Theme of type Protein.
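The "Arg(Type)" notation can be read as a small schema that maps each event type to the argument roles it accepts and the types allowed to fill them. The following Python sketch illustrates that reading; it is not part of the task's official tooling. The names EVENT_ARGS, REPEATABLE, and check_event are hypothetical, the "+" on the Binding arguments is assumed to mean that the argument may occur more than once, and the sketch does not check whether an argument is required.

```python
# Illustrative schema only: names and structure are assumptions, not the task's tools.
PROTEIN_OR_EVENT = {"Protein", "Event"}
REGULATION_ARGS = {
    "Theme": PROTEIN_OR_EVENT,
    "Cause": PROTEIN_OR_EVENT,
    "Site": {"Entity"},
    "CSite": {"Entity"},
}

# Event type -> {argument role -> allowed filler types}, transcribed from the table.
EVENT_ARGS = {
    "Gene expression": {"Theme": {"Protein"}},
    "Transcription": {"Theme": {"Protein"}},
    "Protein catabolism": {"Theme": {"Protein"}},
    "Phosphorylation": {"Theme": {"Protein"}, "Site": {"Entity"}},
    "Localization": {"Theme": {"Protein"}, "AtLoc": {"Entity"}, "ToLoc": {"Entity"}},
    "Binding": {"Theme": {"Protein"}, "Site": {"Entity"}},
    "Regulation": REGULATION_ARGS,
    "Positive regulation": REGULATION_ARGS,
    "Negative regulation": REGULATION_ARGS,
}

# Roles marked "+" in the table; assumed here to mean "may occur more than once".
REPEATABLE = {("Binding", "Theme"), ("Binding", "Site")}


def check_event(event_type, args):
    """Return a list of problems; an empty list means the event fits the table.

    args is a list of (role, filler_type) pairs, e.g. [("Theme", "Protein")].
    """
    allowed = EVENT_ARGS.get(event_type)
    if allowed is None:
        return [f"unknown event type: {event_type}"]

    problems = []
    seen = set()
    for role, filler_type in args:
        if role not in allowed:
            problems.append(f"{event_type} does not take {role}")
            continue
        if filler_type not in allowed[role]:
            problems.append(f"{role} of {event_type} cannot be {filler_type}")
        if role in seen and (event_type, role) not in REPEATABLE:
            problems.append(f"{role} given more than once for {event_type}")
        seen.add(role)
    return problems
```

For instance, check_event("Localization", [("Theme", "Protein"), ("ToLoc", "Entity")]) returns an empty list, while check_event("Phosphorylation", [("Theme", "Event")]) reports one problem, because only the regulation event types accept an Event as Theme.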

Evaluation

The evaluation method for this task is described here.

Corpus

In addition to the PubMed abstracts from the BioNLP-ST'09 data sets, five newly annotated PMC full-paper articles are included in each of the training, development, and test sets (15 articles in total). Evaluation will be provided separately for the PubMed and PMC subsets. Note that five PMC full-paper articles amount to roughly 150 PubMed abstracts, which may not be sufficient to train on separately. We expect adaptation techniques that use the PubMed portion to help with the PMC portion to be useful.

Online Evaluation

Online evaluation is available from the shared task homepage.

Final Evaluation Results

Final evaluation results are available: [whole][abstracts][full-papers]