The task consists of extracting gene renaming acts and gene synonymy reminders from scientific texts about bacteria. The history of bacterial gene naming has produced a large number of homonyms and synonyms, which are often missing from databases or, worse, recorded incorrectly.
A correct and complete gene synonym table is crucial to systems biology studies because it makes it possible to build comprehensive syntheses of gene functions and of their participation in different metabolic pathways. This information can save a lot of bibliographic research time, as well as experimental resources, by avoiding the need to re-run experiments on genes that were already known under a different name.
The renaming corpus is a set of 1836 PubMed references to bacterial genetic and genomic studies, each including a title and an abstract. The corpus was annotated in a joint effort by the MIG Laboratory at the Institut National de la Recherche Agronomique (INRA) and the Institut de l'Information Scientifique et Technique (INIST).
All gene and protein names have been annotated as text-bound entities of type Gene. Genes and proteins have not been distinguished because of the high frequency of metonymy in renaming events.
The only event type is Renaming, in which both arguments are of type Gene. However, the event is directed, since the former name and the new name are distinguished. The task consists of predicting Renaming events in texts where the genes are given as input.
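As an illustration of the annotation model described above, here is a minimal Python sketch of a Gene entity and a directed Renaming event. The class names, field names, identifiers (T1, T2), the example sentence and the gene names abcX and defY are all hypothetical and chosen for illustration; they are not taken from the corpus or from the official file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A text-bound entity of type Gene (covers gene and protein names)."""
    id: str
    start: int  # character offset of the first character
    end: int    # character offset just past the last character
    text: str


@dataclass(frozen=True)
class Renaming:
    """A directed event: `former` is the old name, `new` is the new name."""
    former: Gene
    new: Gene


# Hypothetical sentence and gene names, invented for illustration only.
sentence = "The abcX gene, renamed defY, is required for sporulation."


def make_gene(gene_id: str, name: str) -> Gene:
    """Build a Gene from the first occurrence of `name` in the sentence."""
    start = sentence.index(name)
    return Gene(gene_id, start, start + len(name), name)


abcX = make_gene("T1", "abcX")
defY = make_gene("T2", "defY")

event = Renaming(former=abcX, new=defY)
reverse = Renaming(former=defY, new=abcX)

# The two events share the same arguments but differ in direction.
print(event != reverse)
```

Keeping the two arguments in named fields rather than in an unordered pair preserves the direction of the renaming, which is what separates a strict comparison from one that ignores argument order.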
Files are UTF-8 encoded with LF (Unix) line terminators.
The final participant score will be the relaxed score; the strict score is given for information only. Note that if the output contains the same relation several times, precision will be penalized.
There are two main reasons for keeping the relaxed scores instead of the strict scores:
The primary performance metric is overall F1-score for relations with relaxed argument order, shown in bold in the table above. Detailed results will be announced shortly.
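The difference between the strict and relaxed scores can be sketched in Python as follows. This is not the official evaluation software: the function name, the representation of a relation as a (former, new) pair of entity identifiers, and the way duplicates are handled (every predicted relation counts in the precision denominator, but a gold relation can be matched only once) are assumptions made for illustration.

```python
def score(predicted, gold, relaxed=True):
    """Return (precision, recall, F1) for lists of (former, new) pairs.

    With relaxed=True the argument order is ignored; with relaxed=False
    the former and new names must match in the right order.
    """
    def key(pair):
        # An unordered key for relaxed matching, an ordered one otherwise.
        return frozenset(pair) if relaxed else tuple(pair)

    gold_keys = {key(pair) for pair in gold}
    matched = set()
    true_positives = 0
    for pair in predicted:
        k = key(pair)
        # Assumption: a repeated prediction is not credited a second time,
        # but it still counts in len(predicted), which lowers precision.
        if k in gold_keys and k not in matched:
            matched.add(k)
            true_positives += 1

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_keys) if gold_keys else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical identifiers: one reversed relation and one duplicate.
gold = [("T1", "T2"), ("T3", "T4")]
predicted = [("T2", "T1"), ("T3", "T4"), ("T3", "T4")]

print(score(predicted, gold, relaxed=True))
print(score(predicted, gold, relaxed=False))
```

In this example the reversed relation is accepted only by the relaxed score, and the duplicated relation lowers precision under both scores.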