decorator
Overview
decorator is a simple Java command-line tool that decorates phylogenetic trees with various data fields (such as sequence accessions and taxonomic scientific names) and writes them in phyloXML format.
It can read trees in the following formats:
New Hampshire/Newick: produced by most major phylogenetic analysis software (e.g. PHYLIP)
NHX - New Hampshire Extended (deprecated): our extension of the New Hampshire format
ToL Response XML Format: the format currently used by the Tree of Life project webservices
It is implemented in Java as part of the forester libraries.
A similar but more limited tool is phyloxml_converter, which converts Newick to phyloXML.
decorator can be used together with rid.
Download
Most current version (might be unstable): forester.jar
Source code is available at GitHub: https://github.com/cmzmasek/forester
Usage
java -cp path\to\forester.jar org.forester.application.decorator -table | -f=<c> <phylogenies infile> <mapping table file> <phylogenies outfile>
Options:
-table : use a mapping table with TAG:value columns instead of a one-to-one map (-f=<c>)
-p : picky; fail if a node name is not found in the mapping table
-pn=<s>: name for the phylogeny
-pi=<s>: identifier for the phylogeny (in the form provider:value)
-pd=<s>: description for the phylogeny
advanced options, only available if -table is not used:
-f=<c> : field to be replaced:
         n : node name
         a : sequence annotation description
         d : domain structure
         c : taxonomy code
         sn: taxonomy scientific name
         s : sequence name
         m : molecular sequence
-k=<n> : key column in mapping table (0-based),
         names of the nodes to be decorated; default is 0
-v=<n> : value column in mapping table (0-based),
         data with which to decorate; default is 1
-sn : to extract bracketed scientific names, e.g. [Nematostella vectensis]
-tc : to extract bracketed taxonomic codes, e.g. [NEMVE]
-s=<c> : column separator in mapping file, default is tab
-c : cut name after first space (only for -f=n)
-t : trim node name to be replaced after tilde
-mp : to midpoint-root the tree
-or : to order tree branches
-ve : verbose
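The one-to-one map options (-k, -v, -s, -c, -sn/-tc and -p) can be pictured with the following minimal Java sketch. It is not forester's implementation: the class and method names are hypothetical, and two details are assumptions rather than documented behavior: that a name missing from the table is left unchanged when -p is not given, and exactly which name -c is applied to.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the one-to-one map mode (-f=<c>); not forester's actual code.
public class OneToOneMapSketch {

    // Matches "[...]" and captures the text between the brackets.
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]+)\\]");

    // Builds key -> value from mapping table lines, using 0-based columns
    // (like -k and -v) and a column separator (like -s, default tab).
    public static Map<String, String> parseMap(List<String> lines, int keyCol, int valueCol, String sep) {
        Map<String, String> map = new HashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // ignore empty lines
            }
            String[] cols = line.split(Pattern.quote(sep), -1);
            if (cols.length > Math.max(keyCol, valueCol)) {
                map.put(cols[keyCol].trim(), cols[valueCol].trim());
            }
        }
        return map;
    }

    // Idea behind -c: keep only the part of a name before the first space.
    public static String cutAfterFirstSpace(String name) {
        int i = name.indexOf(' ');
        return i < 0 ? name : name.substring(0, i);
    }

    // Idea behind -sn / -tc: return the first bracketed substring, or null if there is none.
    public static String extractBracketed(String text) {
        Matcher m = BRACKETED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Looks up a node name; with picky = true (like -p) a missing name is an error.
    // Assumption: without -p, an unmapped node keeps its original name.
    public static String lookup(Map<String, String> map, String nodeName, boolean picky) {
        String value = map.get(nodeName);
        if (value == null && picky) {
            throw new IllegalArgumentException("node name not found in mapping table: " + nodeName);
        }
        return value == null ? nodeName : value;
    }

    public static void main(String[] args) {
        Map<String, String> map = parseMap(List.of("seq1\tSusD", "seq2\tSusC"), 0, 1, "\t");
        System.out.println(lookup(map, "seq1", true));                          // SusD
        System.out.println(cutAfterFirstSpace("SusD outer membrane protein"));  // SusD
        System.out.println(extractBracketed("actin [Nematostella vectensis]")); // Nematostella vectensis
        System.out.println(extractBracketed("actin [NEMVE]"));                  // NEMVE
    }
}
```

Pattern.quote is used so that a separator given with -s (for example "|") is treated literally rather than as a regular expression.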
Example:
"java -cp \soft\forester.jar org.forester.application.decorator -table my_simple_tree.nh my_map.txt decorated_tree.xml"
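To run decorator from a Java program (for example, as one step in a larger pipeline), the same command line can be assembled and launched with the standard ProcessBuilder API. This is a minimal sketch: the class and method names are hypothetical, and the jar path and file names are placeholders taken from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for launching decorator as an external process.
public class RunDecoratorSketch {

    // Assembles: java -cp <jar> org.forester.application.decorator <options...> <in> <map> <out>
    public static List<String> decoratorCommand(String jarPath, List<String> options,
                                                String treeIn, String mapFile, String treeOut) {
        List<String> cmd = new ArrayList<>(List.of("java", "-cp", jarPath,
                "org.forester.application.decorator"));
        cmd.addAll(options);
        cmd.add(treeIn);
        cmd.add(mapFile);
        cmd.add(treeOut);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = decoratorCommand("forester.jar", List.of("-table"),
                "my_simple_tree.nh", "my_map.txt", "decorated_tree.xml");
        System.out.println(String.join(" ", cmd));
        // Only launch the tool when asked to, so the sketch also runs without forester.jar present.
        if (args.length > 0 && args[0].equals("--run")) {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            System.out.println("decorator exit code: " + p.waitFor());
        }
    }
}
```

For the one-to-one mode, options such as List.of("-f=n", "-k=0", "-v=1") would be passed instead of List.of("-table"). Passing the arguments as a list rather than as one shell string avoids quoting problems with paths that contain spaces.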
Tags for the mapping table:
"NODE_NAME:"
"TAXONOMY_CODE:"
"TAXONOMY_ID:"
"TAXONOMY_ID_PROVIDER:"
"TAXONOMY_SN:" (scientific name)
"TAXONOMY_CN:" (common name)
"TAXONOMY_SYN:" (synonym)
"SEQ_SYMBOL:"
"SEQ_ACCESSION:"
"SEQ_ACCESSION_SOURCE:"
"SEQ_MOL_SEQ:" (the actual sequence)
"SEQ_NAME:"
"SEQ_ANNOTATION_DESC:" (annotations for a sequence, free text)
"SEQ_ANNOTATION_REF:" (annotation references for a sequence, e.g. "GO:123456")
TAG:value pairs have to be separated by tabs.
The content of the first column has to be a node name matching a node name in the phylogeny to be decorated.
These tags correspond to phyloXML elements (see the phyloXML documentation).
Example mapping table row (tab-separated):
"1 TAXONOMY_CODE:BACTN TAXONOMY_ID:226186 TAXONOMY_ID_PROVIDER:ncbi TAXONOMY_SN:Bacteroides thetaiotaomicron SEQ_ACCESSION:29341016 SEQ_ACCESSION_SOURCE:gi SEQ_SYMBOL:BT3701 SEQ_NAME:SusD"
"1" is the node name matching a node name in the phylogeny to be decorated.
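A row of such a table can be read by splitting on tabs, taking the first column as the node name, and splitting every other column at its first colon only, so that a value like the "GO:123456" in SEQ_ANNOTATION_REF:GO:123456 stays intact. The Java sketch below is hypothetical (it is not forester's parser); rejecting unknown tags and collecting repeated tags into a list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical parser for one row of a -table mapping file; not forester's actual code.
public class TaggedRowSketch {

    // Tags listed in the decorator documentation.
    static final Set<String> KNOWN_TAGS = Set.of(
            "NODE_NAME", "TAXONOMY_CODE", "TAXONOMY_ID", "TAXONOMY_ID_PROVIDER",
            "TAXONOMY_SN", "TAXONOMY_CN", "TAXONOMY_SYN", "SEQ_SYMBOL",
            "SEQ_ACCESSION", "SEQ_ACCESSION_SOURCE", "SEQ_MOL_SEQ", "SEQ_NAME",
            "SEQ_ANNOTATION_DESC", "SEQ_ANNOTATION_REF");

    public static final class Row {
        public final String nodeName;                 // first column: node name in the tree
        public final Map<String, List<String>> tags;  // tag -> value(s), in input order

        Row(String nodeName, Map<String, List<String>> tags) {
            this.nodeName = nodeName;
            this.tags = tags;
        }
    }

    public static Row parse(String line) {
        String[] cols = line.split("\t");
        if (cols.length == 0 || cols[0].isEmpty()) {
            throw new IllegalArgumentException("first column must be a node name");
        }
        Map<String, List<String>> tags = new LinkedHashMap<>();
        for (int i = 1; i < cols.length; i++) {
            String col = cols[i];
            if (col.isEmpty()) {
                continue; // tolerate doubled tabs
            }
            int colon = col.indexOf(':'); // first colon only, so "GO:123456" stays whole
            if (colon <= 0) {
                throw new IllegalArgumentException("not a TAG:value pair: " + col);
            }
            String tag = col.substring(0, colon);
            if (!KNOWN_TAGS.contains(tag)) {
                throw new IllegalArgumentException("unknown tag: " + tag);
            }
            tags.computeIfAbsent(tag, k -> new ArrayList<>()).add(col.substring(colon + 1));
        }
        return new Row(cols[0], tags);
    }

    public static void main(String[] args) {
        String line = String.join("\t", "1", "TAXONOMY_CODE:BACTN", "TAXONOMY_ID:226186",
                "TAXONOMY_ID_PROVIDER:ncbi", "TAXONOMY_SN:Bacteroides thetaiotaomicron",
                "SEQ_ACCESSION:29341016", "SEQ_ACCESSION_SOURCE:gi", "SEQ_SYMBOL:BT3701", "SEQ_NAME:SusD");
        Row row = parse(line);
        System.out.println(row.nodeName + " -> " + row.tags);
    }
}
```

Note that spaces inside a value (as in the scientific name above) are kept, because only tabs separate columns.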
last updated: 2017-09-08