CRAFT-CA Additional Resources

Ontology Files

Packaged with the annotation sets for each ontology are the two OBO-format ontology files (.obo files) that were used for concept annotation of the corpus. (Documentation for the latest version of the OBO format, 1.4, is available at https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html.) One is the ontology as distributed by its developers, for use with the core annotation set (e.g., CHEBI.obo for the core CHEBI annotation set). The other is this ontology plus the OBO stanzas for the extension classes we created for it, for use with the core+extensions annotation set (e.g., CHEBI+extensions.obo for the CHEBI+extensions annotation set). (The ontologies provided for the GO_MF annotations are somewhat different; see the GO_MF-specific comments at the link in the Ontology-Specific Notes and Recommendations section below.) Both are valid OBO-format files. Note that obsolete classes were not used for any of the concept annotations, so these classes should be ignored when parsing an ontology file to extract information for dictionary construction; they can be identified by “is_obsolete: true” lines within their OBO stanzas. Additional filtering is recommended when parsing the ontology files for the GO_BP, GO_CC, MOP, NCBITaxon, and PR annotation sets, as detailed in their respective ontology-specific comment sections.
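
As an illustration of skipping obsolete classes during dictionary construction, the following is a minimal sketch, not a complete OBO 1.4 parser. It reads only [Term] stanzas, collects each term's name and the quoted text of its synonym lines, and drops any stanza containing "is_obsolete: true". The function name load_dictionary_terms is hypothetical; escape sequences, trailing "!" comments, and the ontology-specific filtering mentioned above are not handled.

```python
import re
from typing import Dict, List

# Captures the quoted text of an OBO synonym line, e.g.
#   synonym: "aspirin" EXACT []
SYNONYM_RE = re.compile(r'^synonym:\s*"((?:[^"\\]|\\.)*)"')


def load_dictionary_terms(obo_text: str) -> Dict[str, List[str]]:
    """Map each non-obsolete [Term] ID to its name followed by its synonyms."""
    terms: Dict[str, List[str]] = {}
    stanza_type = None
    current: dict = {}

    def flush() -> None:
        # Keep only [Term] stanzas that have an ID and are not obsolete.
        if stanza_type == "Term" and "id" in current and not current.get("obsolete"):
            terms[current["id"]] = current.get("labels", [])

    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            flush()  # a new stanza header ends the previous stanza
            stanza_type = line[1:-1]
            current = {}
            continue
        # Skip header lines, non-Term stanzas, blank lines, and comment lines.
        if stanza_type != "Term" or not line or line.startswith("!"):
            continue
        if line.startswith("id:"):
            current["id"] = line[len("id:"):].strip()
        elif line.startswith("name:"):
            current.setdefault("labels", []).insert(0, line[len("name:"):].strip())
        elif line.startswith("is_obsolete:"):
            if line[len("is_obsolete:"):].strip() == "true":
                current["obsolete"] = True
        else:
            match = SYNONYM_RE.match(line)
            if match:
                current.setdefault("labels", []).append(match.group(1))
    flush()  # the last stanza has no following header
    return terms
```

For example, load_dictionary_terms(open("CHEBI.obo").read()) would return a mapping from CHEBI IDs to candidate dictionary strings, with obsolete classes already excluded.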

Unused Class Files

For each core annotation set, there is a plain text file named unused_classes_for_X_annotations.txt, where X is the namespace prefix for the corresponding ontology. This file simply lists the IDs of certain classes of the given ontology that were not used at all in the corresponding core annotation set. It is not an exhaustive listing of every class of the ontology that was not used in these annotations; rather, it lists certain classes that were not used either because we judged them difficult to use reliably for concept annotation or because extension classes were created and used in their place (or both). For this task, since none of the classes in these files were used for annotations within their corresponding core annotation sets, any annotations created with these classes (and kept) will count as false positives when compared to the gold-standard set. If using the corresponding ontology for dictionary construction, the participant should either not make any annotations with this file’s classes (e.g., by ignoring these classes within the ontology when constructing the dictionary) or delete any such annotations prior to submitting results for comparison to the gold-standard set. (If the participant is instead learning from the training annotation set, these classes never appear in it, so she will not need to use this file for this purpose.)
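
The two options described above, pruning the dictionary or deleting annotations before submission, can be sketched as follows. This assumes the unused-classes file holds one class ID per line, and it represents an annotation as a hypothetical (start_offset, end_offset, class_id) tuple; all function names here are illustrative, not part of the distribution.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical annotation representation: (start_offset, end_offset, class_id).
Annotation = Tuple[int, int, str]


def read_unused_ids(lines: Iterable[str]) -> Set[str]:
    """Read class IDs, ASSUMED to be one per line; blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}


def prune_dictionary(terms: Dict[str, List[str]], unused: Set[str]) -> Dict[str, List[str]]:
    """Option 1: drop unused classes before the dictionary is built."""
    return {cid: labels for cid, labels in terms.items() if cid not in unused}


def drop_unused_annotations(annotations: Iterable[Annotation],
                            unused: Set[str]) -> List[Annotation]:
    """Option 2: delete annotations made with unused classes before submission."""
    return [a for a in annotations if a[2] not in unused]
```

Typical use would be to open unused_classes_for_CHEBI_annotations.txt, pass the file object to read_unused_ids, and then apply either function. Pruning the dictionary is usually simpler, since no unwanted annotations are ever produced.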

The corresponding file to use when working with a core+extensions annotation set is named unused_classes_and_substitute_extension_classes_for_X+extensions_annotations.txt, where X is again one of the ontology namespace prefixes. This tab-delimited file contains all of the classes of the corresponding unused_classes_for_X_annotations.txt file and, for the CHEBI, GO_MF, PR, and SO annotations, many more automatically generated extension classes as well (see the comments in their corresponding subsections). Like the corresponding files for core annotation sets, each of these files lists certain OBO classes that were not used at all in the core+extensions annotations; it is not an exhaustive listing of every OBO class that was not used. Beyond listing these unused classes, each file maps the overwhelming majority of them to the extension classes that were used instead. The very small number of listed classes with no mapping are simply OBO classes that were not used at all and for which no extension classes were used instead; the participant can either not make any annotations with such classes or delete any such annotations prior to submission. Each of the small number of OBO classes mapped to more than one extension class was itself not used, but several different extension classes were used instead in different contexts. For such a class, the participant can attempt to determine which of the mapped extension classes should replace a given annotation made with the unused class, or can simply delete any such annotation, which at least guarantees the removal of a false positive. However, the overwhelming majority of classes listed in these files are mapped to one and only one extension class; for such a class, the participant should replace any annotations made with the class with annotations made with its mapped extension class.
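
A minimal sketch of reading one of these mapping files and sorting its classes into the three cases described above (no mapping, exactly one mapping, several mappings). It assumes that the first tab-separated column holds the unused OBO class ID and that any further non-empty columns hold mapped extension class IDs; the actual column layout should be checked against the files, and the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def read_substitutions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each unused OBO class ID to its (possibly empty) list of extension classes.

    ASSUMPTION: column 1 is the unused OBO class ID and any remaining
    non-empty columns are mapped extension class IDs.
    """
    mapping: Dict[str, List[str]] = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:
            mapping[cells[0]] = cells[1:]
    return mapping


def classify(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group unused classes by how annotations made with them should be handled."""
    groups: Dict[str, List[str]] = {"delete": [], "replace": [], "ambiguous": []}
    for obo_id, ext_ids in mapping.items():
        if not ext_ids:
            groups["delete"].append(obo_id)     # no substitute: class was never used
        elif len(ext_ids) == 1:
            groups["replace"].append(obo_id)    # one-to-one substitute
        else:
            groups["ambiguous"].append(obo_id)  # context decides, or simply delete
    return groups
```
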
Note that in some cases, an automatic annotator may match a piece of text both to an OBO class and to an extension class to which that OBO class is mapped. In such cases, the participant should take care not to duplicate the extension class annotation: the OBO class annotation should be deleted rather than replaced with an annotation with the extension class, since replacing it would produce a duplicate extension class annotation.
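
The post-processing step, including the guard against duplicate extension class annotations, might look like the sketch below. It takes the mapping as a dictionary from each unused OBO class ID to its list of extension class IDs, uses a hypothetical (start_offset, end_offset, class_id) tuple for annotations, and takes the simple option of deleting annotations with unmapped or multiply mapped classes. The function name apply_substitutions is illustrative only.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical annotation representation: (start_offset, end_offset, class_id).
Annotation = Tuple[int, int, str]


def apply_substitutions(annotations: Iterable[Annotation],
                        mapping: Dict[str, List[str]]) -> List[Annotation]:
    """Post-process annotations for a core+extensions submission.

    - classes not listed in the mapping are kept unchanged;
    - classes with exactly one mapped extension class are replaced by it;
    - unmapped or multiply mapped classes are deleted;
    - an annotation identical to one already emitted is dropped, so a
      replacement never duplicates an existing extension class annotation.
    """
    result: List[Annotation] = []
    emitted: Set[Annotation] = set()
    for start, end, cid in annotations:
        ext_ids = mapping.get(cid)
        if ext_ids is None:
            new = (start, end, cid)          # not an unused class: keep as-is
        elif len(ext_ids) == 1:
            new = (start, end, ext_ids[0])   # one-to-one substitute
        else:
            continue                         # unmapped or ambiguous: delete
        if new not in emitted:
            emitted.add(new)
            result.append(new)
    return result
```

Because duplicates are checked against everything already emitted, the result is the same whether the extension class annotation appears before or after the OBO class annotation it would otherwise duplicate.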

It should be emphasized that in the CHEBI, PR, and SO core+extensions annotation sets, the programmatically generated extension classes are used for annotation rather than the corresponding classes in the ontologies provided within the same directories. For example, the programmatically generated CHEBI_EXT:23888 class is used to annotate drug mentions rather than the CHEBI:23888 class within the ontology provided in the same directory; thus, CHEBI_EXT:23888 does not appear in the accompanying ontology, although CHEBI:23888 does. This was done to allow the participant to use the original ontology for dictionary construction (except for the ontology provided for the GO_MF core+extensions annotation set, where we judged that editing the labels and synonyms of the ontology's classes was worthwhile). Accordingly, if the ontology is used for dictionary construction, all the participant needs to do is apply the corresponding unused_classes_and_substitute_extension_classes_for_X+extensions_annotations.txt file in a post-processing step, as described in the preceding paragraphs.

Ontology-Specific Notes and Recommendations

The participant is advised to read through the ontology-specific comments and recommendations compiled for the v3 release of the CRAFT concept annotations at https://github.com/UCDenver-ccp/CRAFT/tree/master/ontology-concepts#ontology-specific-notes-and-recommendations.