Dataset
BB-norm: Entity normalization
BB-rel: Relation extraction
BB-kb: Entity normalization and relation extraction
BB-norm+ner: Entity recognition and normalization
BB-rel+ner: Entity recognition and relation extraction
BB-kb+ner: Entity recognition, normalization, and relation extraction
Corpus statistics
Detailed statistics for each subtask can be found on the corpus statistics page.
Normalization resources
NCBI Taxonomy
The Microorganism entities were assigned taxon identifiers from the NCBI Taxonomy as available on 2 February 2019. We provide the downloadable archive as it was provided by the NCBI on that date, and a list of valid identifiers for Microorganism entities.
Note that the NCBI does not provide old versions of the taxonomy. The currently available version may contain additional identifiers that are not valid in the BB corpus.
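Because a newer taxonomy release may contain identifiers that are not valid in the BB corpus, it can be useful to check predicted taxon identifiers against the provided list before submitting. The following is a minimal sketch, assuming the list can be read as one identifier per line; that layout, the function names, and the sample identifiers are illustrative assumptions, not a description of the distributed file.

```python
from typing import Iterable, List, Set


def load_valid_ids(lines: Iterable[str]) -> Set[str]:
    """Collect identifiers, assuming one per line; blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}


def find_invalid_ids(predicted: Iterable[str], valid: Set[str]) -> List[str]:
    """Return the predicted identifiers that are absent from the valid set, in input order."""
    return [taxid for taxid in predicted if taxid not in valid]


# Stand-in for the downloaded list of valid identifiers (sample values only).
valid_ids = load_valid_ids(["562", "1280", "", "1773\n"])

# "9999999" is a made-up identifier used to show a rejected prediction.
print(find_invalid_ids(["562", "9999999", "1773"], valid_ids))
```

With a real file, the same function could be called on an open file handle, for example `load_valid_ids(open(path, encoding="utf-8"))`, where `path` points to the downloaded list.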
OntoBiotope ontology
The Habitat and Phenotype entities were assigned concept identifiers from the OntoBiotope ontology. We provide the version that was used to annotate the BB corpus.
Evaluation software
Get the evaluation software from our GitHub repository.
Annotation guidelines
The annotation guidelines for the BB task can be found here.
Supporting resources
We provide some resources that may be helpful to participants.
BB corpus processed with NLP tools.
Word embeddings trained on 2.8 million PubMed abstracts related to microorganisms.