Detailed statistics for each subtask can be found on the corpus statistics page.
The Microorganism entities were assigned taxon identifiers from the NCBI Taxonomy as available on 2 February 2019. We provide the downloadable archive as it was distributed by the NCBI on that date, and a list of valid identifiers for Microorganism entities.
Note that the NCBI does not provide old versions of the taxonomy. The currently available version may contain additional identifiers that are not valid in the BB corpus.
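Because a newer taxonomy may contain identifiers that are not valid in the BB corpus, it can be useful to check predicted taxon identifiers against the list of valid identifiers before submitting. The following is a minimal sketch, not part of the official tooling. It assumes the list holds one identifier per line, which may not match the actual file layout; the function names and sample values are hypothetical and for illustration only.

```python
from typing import Iterable, List, Set


def load_valid_ids(lines: Iterable[str]) -> Set[str]:
    """Collect identifiers, assuming one per line (an assumption about the format).

    Blank lines and lines starting with '#' are skipped.
    """
    valid: Set[str] = set()
    for line in lines:
        token = line.strip()
        if token and not token.startswith("#"):
            valid.add(token)
    return valid


def invalid_predictions(predicted: Iterable[str], valid: Set[str]) -> List[str]:
    """Return the predicted identifiers absent from the valid set, in input order."""
    return [p for p in predicted if p not in valid]


# Toy data standing in for the real list of valid identifiers.
sample_list = ["562", "1280", "", "# comment", "1423"]
valid_ids = load_valid_ids(sample_list)

# "9999999" is a made-up identifier used to show a rejected prediction.
unknown = invalid_predictions(["562", "9999999", "1423"], valid_ids)
print(unknown)
```

With a real file, the lines could be read by passing an open file object to load_valid_ids, since a file object iterates over its lines.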
The Habitat and Phenotype entities were assigned concept identifiers from the OntoBiotope ontology. We provide the version that was used to annotate the BB corpus.
Get the evaluation software from our GitHub repository.
The annotation guidelines for the BB task can be found here.
We also provide some resources that may be helpful to participants:
BB corpus processed with NLP tools.
Word embeddings trained on 2.8 million PubMed abstracts related to microorganisms.