Task Description

Information representation scheme

The representation scheme of the BB task contains four entity types:

  • Microorganism
  • Habitat
  • Geographical
  • Phenotype

and two relation types:

  • Lives_In relations, which link a Microorganism entity to a location (either a Habitat or a Geographical entity)
  • Exhibits relations, which link a Microorganism entity to a Phenotype entity.

In addition, Microorganism entities are normalized to taxa from the NCBI taxonomy, while Habitat and Phenotype entities are normalized to concepts from the OntoBiotope ontology.
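
As a concrete illustration, this scheme can be mirrored in a small data model. The sketch below is ours, not part of the task definition; all class and field names are illustrative only.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Entity:
        id: str                  # annotation id, e.g. "T1"
        etype: str               # Microorganism | Habitat | Geographical | Phenotype
        text: str                # surface form in the document
        concepts: List[str] = field(default_factory=list)  # NCBI taxon or OntoBiotope concept ids

    @dataclass
    class Relation:
        rtype: str               # Lives_In | Exhibits
        microorganism: str       # id of the Microorganism entity
        target: str              # Habitat/Geographical id (Lives_In) or Phenotype id (Exhibits)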

Corpus

The task corpus includes two types of documents:

  • PubMed references (titles and abstracts) related to microorganisms
  • Extracts from full-text articles related to microorganisms living in food products

The annotated corpus will be provided in the BioNLP-ST standoff annotation format.
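
As a rough sketch, assuming the usual BioNLP-ST line conventions (entities on T lines, relations on R lines, normalizations on N lines), such files can be read as follows; the exact argument labels used in the released data may differ:

    def parse_standoff(lines):
        """Parse BioNLP-ST standoff lines into entities, relations and normalizations."""
        entities, relations, norms = {}, [], []
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue
            ann_id, body = line.split("\t", 1)
            if ann_id.startswith("T"):    # entity, e.g. "T1<TAB>Habitat 12 20<TAB>patients"
                meta, text = body.split("\t")
                etype, offsets = meta.split(" ", 1)
                entities[ann_id] = (etype, offsets, text)
            elif ann_id.startswith("R"):  # relation, e.g. "R1<TAB>Lives_In Microorganism:T2 Location:T1"
                rtype, *args = body.split(" ")
                relations.append((rtype, dict(a.split(":", 1) for a in args)))
            elif ann_id.startswith("N"):  # normalization line; format left unparsed here
                norms.append(body)
        return entities, relations, norms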

Annotation guidelines are provided with the task materials.

BB Tasks and Evaluation

The BB task is composed of three subtasks. Each subtask has two modalities: one where entity annotations are given as input, and one where they are not. Teams are free to participate in the subtask(s) of their choice.

1. Entity detection and normalization subtask (BB-norm and BB-norm+ner)

  • BB-norm: Normalization of Microorganism, Habitat and Phenotype entities, with NCBI Taxonomy taxa for the first type and OntoBiotope concepts for the latter two. Entity annotations are provided.
  • BB-norm+ner: Recognition of Microorganism, Habitat and Phenotype entities, and their normalization with NCBI Taxonomy taxa and OntoBiotope concepts (see the output sketch after this list).
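
For illustration only, system output for the two modalities might be organized as below; all identifiers and concept ids are made up:

    # BB-norm: entity spans and types are given; only concept ids are predicted.
    norm_output = {
        "T1": ["NCBI:562"],      # a Microorganism normalized to an NCBI taxon (made-up id)
        "T2": ["OBT:000001"],    # a Habitat normalized to an OntoBiotope concept (made-up id)
    }

    # BB-norm+ner: spans and types must be predicted as well.
    norm_ner_output = [
        ("Microorganism", (0, 16), ["NCBI:562"]),
        ("Habitat", (30, 38), ["OBT:000001"]),
    ]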

The evaluation will focus on the accuracy of the predicted categories against the gold reference. A concept distance measure has been designed in order to penalize over-generalization and over-specialization fairly. Note that if an entity is annotated with several categories, the annotation is a conjunction: all categories must be predicted.
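
The official distance measure is not reproduced here, but the idea can be sketched on a toy is-a hierarchy given as child-to-parent links: each is-a step between the predicted and the reference concept adds to the penalty.

    def concept_distance(pred, gold, parent):
        """Toy is-a distance: steps from pred and gold up to their lowest common ancestor."""
        def chain(c):
            out = [c]
            while c in parent:
                c = parent[c]
                out.append(c)
            return out
        pred_chain, gold_chain = chain(pred), chain(gold)
        gold_set = set(gold_chain)
        lca = next(c for c in pred_chain if c in gold_set)
        # pred side counts over-specialization, gold side counts over-generalization
        return pred_chain.index(lca) + gold_chain.index(lca)

    # Predicting the parent concept of the gold one costs 1 (toy hierarchy, made-up labels).
    parent = {"hospital": "medical facility", "medical facility": "habitat"}
    assert concept_distance("medical facility", "hospital", parent) == 1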

For BB-norm+ner, boundary accuracy will be factored into the evaluation, since the inclusion or exclusion of modifiers can change the meaning and the normalization of phrases.

2. Entity and relation extraction subtask (BB-rel and BB-rel+ner)

  • BB-rel: Extraction of Lives_In relations between Microorganism entities and Habitat or Geographical entities, and of Exhibits relations between Microorganism and Phenotype entities. Entity annotations are provided.
  • BB-rel+ner: Recognition of Microorganism, Habitat, Geographical and Phenotype entities, and extraction of Lives_In and Exhibits relations.

The evaluation measures will be the recall and precision of predicted relations against the gold relations.
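
Concretely, treating both predictions and gold annotations as sets of relation tuples, the two scores can be computed as in this simplified exact-match sketch (the tuple layout is our own choice):

    def precision_recall(predicted, gold):
        """Precision and recall over sets of relation tuples, e.g. ('Lives_In', 'T2', 'T1')."""
        predicted, gold = set(predicted), set(gold)
        tp = len(predicted & gold)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        return precision, recall

    p, r = precision_recall({("Lives_In", "T2", "T1")},
                            {("Lives_In", "T2", "T1"), ("Exhibits", "T2", "T4")})
    assert (p, r) == (1.0, 0.5)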

For BB-rel+ner, boundary accuracy will be factored into the evaluation.

3. Knowledge base extraction subtask (BB-kb and BB-kb+ner)

Participant systems are evaluated on their capacity to build a knowledge base from the corpus. The knowledge base is the set of Lives_In and Exhibits relations together with the concepts of their Microorganism, Habitat and Phenotype arguments. The goal of the task is to measure how much of the information content of the corpus can be extracted automatically. It can be viewed as a combination of the first two subtasks, with results aggregated at the corpus level (i.e., not all occurrences need to be predicted); a sketch of this aggregation follows the list below.

  • BB-kb: Extraction of Lives_In and Exhibits relations between Microorganism, Habitat and Phenotype concepts at the corpus level. Entity annotations are provided.
  • BB-kb+ner: Extraction of Lives_In and Exhibits relations between Microorganism, Habitat and Phenotype concepts at the corpus level. Entity annotations are not provided.
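
Assuming document-level relations have already been normalized to concept identifiers, corpus-level aggregation reduces to deduplication into a set, as in this sketch (all ids made up):

    def build_kb(doc_relations):
        """Aggregate (relation_type, taxon, concept) triples over all documents."""
        kb = set()
        for doc in doc_relations:        # each doc is an iterable of triples
            for rtype, taxon, concept in doc:
                kb.add((rtype, taxon, concept))
        return kb

    docs = [
        [("Lives_In", "NCBI:562", "OBT:000001")],
        [("Lives_In", "NCBI:562", "OBT:000001"),   # duplicate across documents counts once
         ("Exhibits", "NCBI:562", "OBT:000002")],
    ]
    assert len(build_kb(docs)) == 2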

The evaluation measures will be the recall and precision of predicted relations against the gold relations.

For BB-kb+ner, boundary accuracy will be factored into the evaluation.