Project Software Deliverables


CharaParser is fine-grained semantic parsing software for morphological descriptions of any organism, provided the descriptions are composed in a semi-structured manner. It has been used to parse descriptions of plants, ants, fish, and invertebrate fossils. It does less well with "diagnosis" paragraphs, where multiple organisms are compared.

The algorithms used in CharaParser are described in:

  1. Cui, H., Boufford, D., & Selden, P. (2010). Semantic Annotation of Biosystematics Literature without Training Examples. Journal of the American Society for Information Science and Technology, 61(3): 522-542.
  2. Cui, H. (2012). CharaParser for fine-grained semantic annotation of organism morphological descriptions. Journal of the American Society for Information Science and Technology, 63(4). DOI: 10.1002/asi.22618

The input to CharaParser is a set of morphological descriptions in a simple XML format, with the to-be-parsed text enclosed in the <description> tag and the taxon name enclosed in the <taxon_name> tag, as in the example shown below. CharaParser also accepts TaxonX files (with <tax:div type="description"> elements) as input.

 input (XML)
    <taxon_name>Eriogonum caespitosum Nutt.</taxon_name>
    <description>Plants perennial, matted.</description>
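
The input format above can be produced with any XML library. The following is a minimal Python sketch, not part of CharaParser itself. The wrapper element name `treatment` and the helper `build_input` are assumptions made for illustration, since the example above shows only the two inner elements and not the enclosing root.

```python
import xml.etree.ElementTree as ET


def build_input(taxon_name: str, description: str) -> str:
    """Serialize one taxon description in the simple XML input format."""
    # "treatment" is a hypothetical root element; the source example
    # shows only <taxon_name> and <description>.
    root = ET.Element("treatment")
    ET.SubElement(root, "taxon_name").text = taxon_name
    ET.SubElement(root, "description").text = description
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


xml_text = build_input("Eriogonum caespitosum Nutt.", "Plants perennial, matted.")
print(xml_text)
```

Using a library rather than string concatenation ensures that closing tags are well formed and that characters such as `&` and `<` in the description text are escaped.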

 output (XML)
    <taxon_name>Eriogonum caespitosum Nutt.</taxon_name>
    <statement id="s1">
        <structure id="o1" name="whole_organism">
            <character name="life_style" value="plant" />
            <character name="life_style" value="perennial" />
            <character name="habit" value="matted" />
        </structure>
    </statement>
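
The annotated output can be flattened into rows of statement, structure, character, and value for downstream use. The following is a minimal Python sketch under stated assumptions: the sample is a trimmed copy of the output example, the wrapper element `result` is hypothetical (added only so the fragment has a single root), and `extract_characters` is an illustrative helper, not a CharaParser function.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the output example; <result> is a hypothetical wrapper
# so that the fragment parses as a single XML document.
SAMPLE_OUTPUT = """
<result>
  <taxon_name>Eriogonum caespitosum Nutt.</taxon_name>
  <statement id="s1">
    <structure id="o1" name="whole_organism">
      <character name="life_style" value="perennial" />
      <character name="habit" value="matted" />
    </structure>
  </statement>
</result>
"""


def extract_characters(xml_text: str) -> list[tuple[str, str, str, str]]:
    """Return (statement id, structure name, character name, value) rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for statement in root.iter("statement"):
        for structure in statement.iter("structure"):
            # Only direct <character> children of this structure.
            for character in structure.findall("character"):
                rows.append((
                    statement.get("id"),
                    structure.get("name"),
                    character.get("name"),
                    character.get("value"),
                ))
    return rows


rows = extract_characters(SAMPLE_OUTPUT)
for row in rows:
    print(row)
```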

CharaParser Installation Instructions

CharaParser Installation Instructions can be found here.

CharaParser Demo 

A video introduction to CharaParser can be found at

CharaParser Test Run

Follow the instructions on the installation page to run CharaParser using the sample datasets provided there. CharaParser can run independently or work with OTO (described below). If running with OTO, contact hongcui at email dot arizona dot edu to obtain access to the OTO server.

CharaParser Source Code

The CharaParser source code is at . Check out both the parsing-gui and unsupervised projects.

OTO: Ontology Term Organizer

OTO is a web-based application that allows domain experts to group and sort terms. CharaParser can optionally output terms to OTO and use expert-reviewed terms from OTO in its parsing process.


OTO Demos

A video introduction to OTO can be found at . Short demos that show how to use OTO can be found in the Instruction section of the OTO website after logging in.

OTO Source Code

The OTO source code is at . Check out Web-based-CV-Interface.