phyloXML converter
Overview
phyloxml_converter is a simple Java command-line tool that converts various phylogenetic tree formats to phyloXML ("Newick to phyloXML").
It can read trees in the following formats:
New Hampshire/Newick: produced by most major phylogenetic analysis software (e.g. PHYLIP, RAxML)
NHX - New Hampshire Extended (deprecated): our extension of the New Hampshire format
ToL Response XML Format: the format currently used by the Tree of Life project webservices
It is implemented in Java as part of the forester libraries.
A similar but more flexible tool is the phylogeny decorator (decorator).
Download
Most current version (might be unstable): forester.jar
Source code is available at GitHub: https://github.com/cmzmasek/forester
Usage
java -cp path\to\forester.jar org.forester.application.phyloxml_converter -f=<field option> [options] <infile> <outfile>
field options:
nn: transfer name to node/clade name
tc: transfer name to taxonomy code
sn: transfer name to taxonomy scientific name
cn: transfer name to taxonomy common name
gn: transfer name to sequence name
sy: transfer name to sequence symbol
dummy: convert NHX-formatted trees to phyloXML
i1: transfer/split name to taxonomy UniProt identifier (split at underscore if the name follows the "id_name" pattern, e.g. "817_SusD")
i2: transfer/split name to taxonomy UniProt identifier (split at underscore if the name follows the "name_id" pattern, e.g. "SusD_817")
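The split described for i1 and i2 can be pictured with a small shell sketch. It only illustrates the documented "split at underscore" behaviour using the two example names above and is not forester's actual code. The function names split_i1 and split_i2 are hypothetical, and the sketch assumes each name contains exactly one underscore. Where the converter stores the non-identifier part is not stated here, so the sketch simply prints both parts.

```shell
#!/bin/sh
# Illustration only: mimics the documented split, not forester's implementation.
# Assumption: the name contains exactly one underscore.

# "id_name" pattern (field option i1), e.g. "817_SusD"
split_i1() {
  name=$1
  # text before the underscore is the identifier, text after it is the name
  echo "id=${name%%_*} name=${name#*_}"
}

# "name_id" pattern (field option i2), e.g. "SusD_817"
split_i2() {
  name=$1
  # text after the underscore is the identifier, text before it is the name
  echo "id=${name##*_} name=${name%_*}"
}

split_i1 "817_SusD"   # prints: id=817 name=SusD
split_i2 "SusD_817"   # prints: id=817 name=SusD
```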
options:
-i : internal node names in NH or NHX tree are confidence values
-c=<conf>: confidence type (e.g. "bootstrap", default is "unknown")
-ru : replace all underscores with spaces
-m : midpoint reroot
-o : order subtrees
-xt : extract taxonomy to taxonomy code from "seqname_TAXON"-style names (cannot be used with the following field options: tc, cn, sn)
-xp : extract taxonomy to taxonomy code from Pfam ("seqname_TAXON/x-y") style names only (cannot be used with the following field options: tc, cn, sn)
-ni : no tree level indentation in phyloXML output
-iqs : ignore quotes and whitespace (e.g. "a b" becomes ab)
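As a worked example, the following shell sketch assembles one possible invocation from the usage line and the options listed above. The file names my_tree.nh and my_tree.xml, the jar location, and the function name convert_cmd are placeholders chosen for illustration. The function prints the command rather than running it, so it can be checked without forester.jar; remove the echo to run the conversion.

```shell
#!/bin/sh
# Placeholder paths: adjust to your own setup.
FORESTER_JAR="path/to/forester.jar"
INFILE="my_tree.nh"     # hypothetical Newick input file
OUTFILE="my_tree.xml"   # hypothetical phyloXML output file

# Dry run: print the command instead of executing it.
#   -f=nn         transfer name to node/clade name
#   -i            internal node names are confidence values
#   -c=bootstrap  label those confidence values as "bootstrap"
#   -m            midpoint reroot
#   -o            order subtrees (this is not an output-file flag)
convert_cmd() {
  echo java -cp "$FORESTER_JAR" org.forester.application.phyloxml_converter \
    -f=nn -i -c=bootstrap -m -o "$INFILE" "$OUTFILE"
}

convert_cmd
```

The infile and outfile come last, after the field option and any other options, as in the usage line. On Windows, write the jar path with backslashes as shown under Usage.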