Dataset

BB-norm: Entity normalization

BB-rel: Relation extraction

BB-kb: Entity normalization and relation extraction

BB-norm+ner: Entity recognition and normalization

BB-rel+ner: Entity recognition and relation extraction

BB-kb+ner: Entity recognition, normalization, and relation extraction

Corpus statistics

Detailed statistics for each subtask can be found on the corpus statistics page.

Normalization resources

NCBI Taxonomy

The Microorganism entities were assigned taxon identifiers from the NCBI Taxonomy as available on 2 February 2019. We provide the downloadable archive as the NCBI distributed it on that date, and a list of valid identifiers for Microorganism entities.

Note that the NCBI does not provide old versions of the taxonomy. The currently available version may contain additional identifiers that are not valid in the BB corpus.
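
Because a newer taxonomy dump may contain identifiers that are not valid in the BB corpus, it can help to check predicted taxon identifiers against the provided list before submitting. The following is a minimal sketch only. It assumes the list of valid identifiers is a plain-text file with one identifier per line, which may not match the actual file layout. The function names and the identifiers in the usage example are hypothetical placeholders.

```python
from typing import Iterable, List, Set, Tuple


def load_valid_ids(lines: Iterable[str]) -> Set[str]:
    """Collect identifiers from lines of text.

    Assumption: one identifier per line. Blank lines and lines
    starting with '#' are skipped.
    """
    valid: Set[str] = set()
    for line in lines:
        token = line.strip()
        if token and not token.startswith("#"):
            valid.add(token)
    return valid


def split_by_validity(
    predicted: Iterable[str], valid: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split predicted identifiers into (kept, rejected), preserving order."""
    kept: List[str] = []
    rejected: List[str] = []
    for taxid in predicted:
        if taxid in valid:
            kept.append(taxid)
        else:
            rejected.append(taxid)
    return kept, rejected


if __name__ == "__main__":
    # Placeholder data standing in for the real list of valid identifiers.
    valid_ids = load_valid_ids(["562\n", "1280\n"])
    kept, rejected = split_by_validity(["562", "99999999", "1280"], valid_ids)
    print("kept:", kept)
    print("rejected:", rejected)
```

In practice, the lines would come from an open file handle on the downloaded identifier list rather than from an in-memory list.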

OntoBiotope ontology

The Habitat and Phenotype entities were assigned concept identifiers from the OntoBiotope ontology. We provide the version that was used to annotate the BB corpus.

Evaluation software

Get the evaluation software from our GitHub repository.

Annotation guidelines

The annotation guidelines for the BB task can be found here.

Supporting resources

We provide here some resources that may be helpful to participants.

  • BB corpus processed with NLP tools.

  • Word embeddings trained on 2.8 million PubMed abstracts related to microorganisms.