Bacteria Gene Renaming [RENAME]

Online submission closed. Thank you very much for your participation!

The Bacteria Gene Renaming (RENAME) task is a supporting task in the BioNLP Shared Task 2011 (BioNLP-ST'11).

The task consists of extracting gene renaming acts and gene synonymy reminders from scientific texts about bacteria. The history of bacterial gene naming has led to a large number of homonyms and synonyms, which are often missing from databases (or worse, recorded erroneously).

A correct and complete gene synonym table is crucial to systems biology studies because it makes it possible to build comprehensive syntheses of gene functions and of their participation in different metabolic pathways. This information can save considerable bibliographic research time as well as experimental resources, by avoiding re-running experiments on genes that were already known under a different name.

The renaming corpus is a set of 1836 PubMed references of bacterial genetic and genomic studies, including title and abstract. The corpus was annotated by a joint effort of the MIG Laboratory at the Institut National de Recherche Agronomique (INRA) and the Institut de l'Information Scientifique et Technique (INIST).

Task Definition


All gene and protein names have been annotated as text-bound entities of type Gene. Genes and proteins have not been distinguished because of the high frequency of metonymy in renaming events.


The only type of event is Renaming, where both arguments are of type Gene. However, the event is directed, since the former and the new names are distinguished. The task consists of predicting Renaming events for texts in which the genes are given as input.


All files are UTF-8 encoded with LF line terminators (Unix).


The evaluation of the RENAME task will be given in terms of recall, precision and F-score of renaming relations. Two sets of scores will be given: the first set is computed by enforcing the strict direction of renaming relations, the second set is computed with relaxed direction. Since the relaxed score takes renaming relations into account even if the arguments are inverted, it will necessarily be greater than or equal to the strict score.
The final participant score will be the relaxed score; the strict score is given for information only. Note that if the output contains the same relation several times, the precision will be penalized.
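The scoring scheme described above can be illustrated with a minimal sketch. This is not the official evaluation software. The tuple layout `(doc_id, former, new)`, the function names, and the way duplicates are counted (every predicted relation adds to the precision denominator, but a given reference relation is credited only once) are assumptions made for illustration.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted, reference, relaxed=False):
    """Score predicted renaming relations against reference relations.

    Each relation is a hypothetical (doc_id, former, new) tuple.
    With relaxed=True, the order of the two arguments is ignored.
    """
    def key(relation):
        doc_id, former, new = relation
        if relaxed:
            # Unordered pair: an inverted prediction still matches.
            return (doc_id, frozenset((former, new)))
        return (doc_id, former, new)

    reference_keys = {key(r) for r in reference}
    matched = set()
    true_positives = 0
    for relation in predicted:
        k = key(relation)
        # Assumption: a repeated prediction is not credited twice,
        # but still counts in the precision denominator below.
        if k in reference_keys and k not in matched:
            matched.add(k)
            true_positives += 1

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference_keys) if reference_keys else 0.0
    return precision, recall, f_score(precision, recall)


# Toy data: one inverted prediction, one correct prediction given twice.
reference = [("doc1", "T1", "T2"), ("doc1", "T3", "T4")]
predicted = [("doc1", "T2", "T1"), ("doc1", "T3", "T4"), ("doc1", "T3", "T4")]

strict = evaluate(predicted, reference, relaxed=False)
relaxed = evaluate(predicted, reference, relaxed=True)
```

On this toy data the inverted relation is counted only in the relaxed setting, and the duplicated relation lowers precision in both settings, so the relaxed F-score cannot fall below the strict one.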

There are two main reasons for keeping the relaxed scores instead of the strict scores:
  1. The motivation of this task is to keep bacterial gene synonym tables up to date. The synonymy relation is symmetric, and the canonical name is not necessarily the oldest or the newest name: it is a choice made by the bacteriology community.
  2. During the annotation of the reference corpus, it appeared that the direction could not always be decided with certainty, even by a human reader. It would therefore have been unfair to evaluate systems on the basis of uncertain information.
The evaluation software is available for download in the attachments section below.


The REN task is completed. Final submissions were received from three teams, and the results are summarized in the following table:

 Team                   Recall  Precision  F-score
 University of Turku    79.6    95.9       87.0
 Concordia University   65.9    74.4       69.9
 INRA                   73.9    57.0       64.4

The primary performance metric is the overall F1-score for relations with relaxed argument order, shown in the F-score column of the table above. Detailed results will be announced shortly.

Robert Bossy,
Nov 5, 2010, 3:51 AM