NHX - New Hampshire eXtended [version 2.0]

Copyright (C) 2014 by Christian M. Zmasek. Written by Christian M. Zmasek. Permission is granted to copy this document provided that this copyright notice is not removed and that the contents of this document are not altered in any way.

This document is available at: https://sites.google.com/site/cmzmasek/home/software/forester/nhx

Notice. This (version 2.0) is expected to be the final version of NHX.
It is recommended to use phyloXML (see: www.phyloxml.org) instead of NHX.

NHX is a format for describing annotated phylogenies. NHX is based on the New Hampshire (NH) standard (also called "Newick tree format"). It has the following extensions (compared to NH as used in the PHYLIP package):

  • it introduces tags to associate various data fields with a node of a phylogenetic tree
  • both internal and external nodes can be tagged
  • arbitrary number of children per node
  • the tree is assumed to be rooted if the deepest node is a bifurcation
  • the order of the tags does not matter
  • the length of all character string based data is unlimited (e.g. name, species)
  • comments between '[' and ']' are removed (unless the opening bracket is followed by "&&NHX")
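
The comment rule in the last item can be sketched in Python as follows. This is a minimal illustration, not part of the NHX definition: the function name strip_comments is hypothetical, and the sketch assumes that comments are not nested and do not themselves contain a ']' character.

```python
import re

# Matches "[...]" only when the opening bracket is NOT followed by "&&NHX".
# Assumption: comments are not nested and contain no ']' character.
_PLAIN_COMMENT = re.compile(r"\[(?!&&NHX)[^\]]*\]")


def strip_comments(text: str) -> str:
    """Remove plain comments but keep [&&NHX...] blocks untouched."""
    return _PLAIN_COMMENT.sub("", text)


print(strip_comments("(A:0.1[a comment],B:0.2[&&NHX:S=human])[root];"))
# prints (A:0.1,B:0.2[&&NHX:S=human]);
```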

In order to remain compatible with the NEXUS format, all fields except name and branch length (in other words, all fields eXtending NH) must be wrapped by:

  • [&&NHX
  • ]
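
As an illustration of this wrapping, the following Python sketch writes a single node as text: name first, branch length second, then all extended fields inside one [&&NHX ... ] block. The helper names nhx_block and nhx_node are hypothetical, and values are written as given, without any escaping.

```python
def nhx_block(tags: dict) -> str:
    """Render tags as "[&&NHX:KEY=value:...]"; empty string if there are no tags."""
    if not tags:
        return ""
    fields = "".join(f":{key}={value}" for key, value in tags.items())
    return f"[&&NHX{fields}]"


def nhx_node(name: str = "", branch_length=None, tags=None) -> str:
    """Write the name, then the branch length, then the wrapped NHX fields."""
    out = name
    if branch_length is not None:
        out += f":{branch_length}"
    return out + nhx_block(tags or {})


print(nhx_node("ADH2", 0.1, {"S": "human"}))
# prints ADH2:0.1[&&NHX:S=human]
```

Because only the bracketed block carries the extended fields, a reader that treats '[' ... ']' as a comment still sees a plain NH node.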

The following characters cannot be part of names: ( ) [ ] , : ; as well as white space.
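
A name check along these lines could look like the Python sketch below. The function name is_valid_name is hypothetical, and "white space" is taken to mean any character for which Python's str.isspace() is true, which is an assumption.

```python
# The seven characters listed above; white space is checked separately.
FORBIDDEN = set("()[],:;")


def is_valid_name(name: str) -> bool:
    """True if the name contains none of the forbidden characters or white space."""
    return not any(ch in FORBIDDEN or ch.isspace() for ch in name)


print(is_valid_name("ADH2"))        # prints True
print(is_valid_name("Human ADH1"))  # prints False (contains a space)
```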

An example of a (rooted) phylogeny in NHX

(((ADH2:0.1[&&NHX:S=human], ADH1:0.11[&&NHX:S=human]):0.05[&&NHX:S=primates:D=Y:B=100], ADHY:0.1[&&NHX:S=nematode],ADHX:0.12[&&NHX:S=insect]):0.1[&&NHX:S=metazoa:D=N], (ADH4:0.09[&&NHX:S=yeast],ADH3:0.13[&&NHX:S=yeast], ADH2:0.12[&&NHX:S=yeast],ADH1:0.11[&&NHX:S=yeast]):0.1 [&&NHX:S=Fungi])[&&NHX:D=N];
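
To show how such a string breaks down into nodes, here is a small recursive-descent reader in Python. It is a sketch under stated assumptions rather than a reference parser: the names Node and parse_nhx are hypothetical, plain comments are assumed to have been removed already, tag values are assumed not to contain ':' or ']', and all white space is dropped before parsing (names cannot contain it). The final check reads "deepest node" as the outermost node of the tree, which is an interpretation.

```python
from dataclasses import dataclass, field
from typing import Optional

DELIMITERS = "()[],:;"


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    tags: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def parse_nhx(text: str) -> Node:
    s = "".join(text.split())  # drop all white space
    pos = 0

    def read_token() -> str:
        """Read characters up to the next delimiter."""
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in DELIMITERS:
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos:pos + 1] == "(":          # internal node: read its children
            pos += 1
            while True:
                node.children.append(parse_node())
                if s[pos:pos + 1] == ",":
                    pos += 1
                elif s[pos:pos + 1] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"expected ',' or ')' at position {pos}")
        node.name = read_token()           # name comes first, if present
        if s[pos:pos + 1] == ":":          # branch length comes second
            pos += 1
            node.length = float(read_token())
        if s.startswith("[&&NHX", pos):    # extended fields
            end = s.index("]", pos)
            for item in s[pos + len("[&&NHX"):end].split(":"):
                if item:
                    key, _, value = item.partition("=")
                    node.tags[key] = value
            pos = end + 1
        return node

    root = parse_node()
    if s[pos:pos + 1] != ";":
        raise ValueError("tree must end with ';'")
    return root


EXAMPLE = (
    "(((ADH2:0.1[&&NHX:S=human], ADH1:0.11[&&NHX:S=human]):0.05"
    "[&&NHX:S=primates:D=Y:B=100], ADHY:0.1[&&NHX:S=nematode],"
    "ADHX:0.12[&&NHX:S=insect]):0.1[&&NHX:S=metazoa:D=N], "
    "(ADH4:0.09[&&NHX:S=yeast],ADH3:0.13[&&NHX:S=yeast], "
    "ADH2:0.12[&&NHX:S=yeast],ADH1:0.11[&&NHX:S=yeast]):0.1 "
    "[&&NHX:S=Fungi])[&&NHX:D=N];"
)

root = parse_nhx(EXAMPLE)
print(len(root.children) == 2)      # prints True: the outermost node is a bifurcation
print(root.children[0].tags["S"])   # prints metazoa
```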

NHX version 2.0 elements

Element | Type | Description | Corresponding phyloXML element (parent element in parentheses) | phyloXML example
no tag | string | name of this node/clade (MUST BE FIRST, IF ASSIGNED) | <name> (<clade>) | <name>Human ADH1</name>
: | decimal | branch length to parent node (MUST BE SECOND, IF ASSIGNED) | <branch_length> (<clade>) | <branch_length>0.102</branch_length>
:GN= | string | gene name | <name> (<sequence>) | <name>alcohol dehydrogenase</name>
:AC= | string | sequence accession | <accession> (<sequence>) | <accession source="ncbi">AAB80874</accession>
:B= | decimal | confidence value for parent branch | <confidence> (<clade>) | <confidence type="bootstrap">100</confidence>
:D= | 'T', 'F', or '?' | 'T' if this node represents a duplication event, 'F' if this node represents a speciation event, '?' if this node represents an unknown event | <events> (<clade>) | <events><duplications>1</duplications></events>
:S= | string | species name of the species/phylum at this node | <scientific_name> (<taxonomy>) | <scientific_name>Bacillus subtilis</scientific_name>
:T= | integer | taxonomy ID of the species/phylum at this node | <id> (<taxonomy>) | <id provider="ncbi">1423</id>
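
The correspondence listed above can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration only: the function name clade_to_phyloxml is hypothetical, the attribute values (type="bootstrap", provider="ncbi", source="ncbi") are simply copied from the examples above, only D=T is converted because it is the only event example given, and the order of child elements is an assumption that should be checked against the phyloXML schema.

```python
import xml.etree.ElementTree as ET


def clade_to_phyloxml(name="", branch_length=None, tags=None) -> ET.Element:
    """Build a <clade> element from a node's name, branch length and NHX tags."""
    tags = tags or {}
    clade = ET.Element("clade")
    if name:
        ET.SubElement(clade, "name").text = name
    if branch_length is not None:
        ET.SubElement(clade, "branch_length").text = str(branch_length)
    if "B" in tags:
        ET.SubElement(clade, "confidence", type="bootstrap").text = str(tags["B"])
    if "S" in tags or "T" in tags:
        taxonomy = ET.SubElement(clade, "taxonomy")
        if "T" in tags:
            ET.SubElement(taxonomy, "id", provider="ncbi").text = str(tags["T"])
        if "S" in tags:
            ET.SubElement(taxonomy, "scientific_name").text = tags["S"]
    if "GN" in tags or "AC" in tags:
        sequence = ET.SubElement(clade, "sequence")
        if "AC" in tags:
            ET.SubElement(sequence, "accession", source="ncbi").text = tags["AC"]
        if "GN" in tags:
            ET.SubElement(sequence, "name").text = tags["GN"]
    if tags.get("D") == "T":  # only the duplication case is shown in the table
        events = ET.SubElement(clade, "events")
        ET.SubElement(events, "duplications").text = "1"
    return clade


clade = clade_to_phyloxml("ADH1", 0.11, {"S": "Bacillus subtilis", "T": "1423", "D": "T"})
print(ET.tostring(clade, encoding="unicode"))
```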

References

phyloXML website: http://www.phyloxml.org/

phyloXML reference: Han MV and Zmasek CM (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics 10:356

New Hampshire format: http://evolution.genetics.washington.edu/phylip/newicktree.html

PHYLIP: http://evolution.genetics.washington.edu/phylip.html

NEXUS format: Maddison DR, Swofford DL and Maddison WP (1997). NEXUS: an extensible file format for systematic information. Systematic Biology 46:590-621