decorator

Overview

decorator is a simple Java command-line tool that decorates phylogenetic trees with various data fields (such as sequence accessions and taxonomic scientific names) and writes them in phyloXML format.
It can read trees in the following formats:

It is implemented in Java as part of the forester libraries.

A similar but more limited tool is the phyloXML converter (Newick to phyloXML): phyloxml_converter

decorator can be used together with rid.


Download


Usage

java -cp path\to\forester.jar org.forester.application.decorator -table | -f=<c> <phylogenies infile> <mapping table file> <phylogenies outfile>

Options:

 -table : table instead of one to one map (-f=<c>)

 -p     : picky, fails if node name not found in mapping table

 -pn=<s>: name for the phylogeny

 -pi=<s>: identifier for the phylogeny (in the form provider:value)

 -pd=<s>: description for phylogenies



advanced options, only available if -table is not used:


 -f=<c> : field to be replaced: n : node name

                                a : sequence annotation description

                                d : domain structure

                                c : taxonomy code

                                sn: taxonomy scientific name

                                s : sequence name

                                m : molecular sequence

 -k=<n> : key column in mapping table (0 based),

          names of the node to be decorated - default is 0

 -v=<n> : value column in mapping table (0 based),

          data with which to decorate - default is 1

 -sn    : to extract bracketed scientific names, e.g. [Nematostella vectensis]

 -tc    : to extract bracketed taxonomic codes, e.g. [NEMVE]

 -s=<c> : column separator in mapping file, default is tab

 -c     : cut name after first space (only for -f=n)

 -t     : trim node name to be replaced after tilde

 -mp    : to midpoint-root the tree

 -or    : to order tree branches

 -ve    : verbose
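
To illustrate how the -k, -v and -s options pick columns from a simple (non -table) mapping file, here is a minimal Python sketch. It is not decorator's own code: the function name read_mapping and the sample rows are made up for this example, and it only mimics 0-based column selection as described in the option list above.

```python
def read_mapping(text, key_col=0, value_col=1, sep="\t"):
    """Return {node name: value} from delimited text (columns are 0-based)."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split(sep)
        if len(fields) <= max(key_col, value_col):
            continue  # skip blank or short lines
        mapping[fields[key_col].strip()] = fields[value_col].strip()
    return mapping


# Hypothetical three-column table: node name, taxonomy code, scientific name.
sample = (
    "seq1\tNEMVE\tNematostella vectensis\n"
    "seq2\tBACTN\tBacteroides thetaiotaomicron\n"
)

codes = read_mapping(sample)               # comparable to -k=0 -v=1
names = read_mapping(sample, value_col=2)  # comparable to -k=0 -v=2
```

With a file laid out like this, taxonomy codes would be selected with -f=c -k=0 -v=1 and scientific names with -f=sn -k=0 -v=2.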




Example:
"java -cp \soft\forester.jar org.forester.application.decorator -table my_simple_tree.nh my_map.txt decorated_tree.xml"
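
When decorator is called from a script, the same command can be assembled as an argument list. The following Python sketch assumes that java is on the PATH and that forester.jar is in the working directory; the helper name decorator_command is hypothetical. It only builds the list, following the order given under Usage (options first, then the three file names).

```python
def decorator_command(jar, infile, mapping, outfile, options=("-table",)):
    """Build the argument list for one decorator run (nothing is executed)."""
    return [
        "java", "-cp", jar, "org.forester.application.decorator",
        *options, infile, mapping, outfile,
    ]


# Same run as the example above, with an assumed jar location.
cmd = decorator_command(
    "forester.jar", "my_simple_tree.nh", "my_map.txt", "decorated_tree.xml"
)
```

The list can then be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with file names that contain spaces.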

tags for mapping table:

  • "NODE_NAME:"
  • "TAXONOMY_CODE:"
  • "TAXONOMY_ID:"
  • "TAXONOMY_ID_PROVIDER:"
  • "TAXONOMY_SN:" (scientific name)
  • "TAXONOMY_CN:" (common name)
  • "TAXONOMY_SYN:" (synonym)
  • "SEQ_SYMBOL:"
  • "SEQ_ACCESSION:"
  • "SEQ_ACCESSION_SOURCE:"
  • "SEQ_MOL_SEQ:" (the actual sequence)
  • "SEQ_NAME:"
  • "SEQ_ANNOTATION_DESC:" (annotations for a sequence, free text)
  • "SEQ_ANNOTATION_REF:" (annotation references for a sequence, e.g. "GO:123456")

TAG:value pairs have to be separated by tabs.
The content of the first column has to be a node name matching a node name in the phylogeny to be decorated.
These tags correspond to phyloXML elements (see the phyloXML documentation).

mapping table example row (tab separated):

"1 TAXONOMY_CODE:BACTN TAXONOMY_ID:226186 TAXONOMY_ID_PROVIDER:ncbi TAXONOMY_SN:Bacteroides thetaiotaomicron SEQ_ACCESSION:29341016 SEQ_ACCESSION_SOURCE:gi SEQ_SYMBOL:BT3701 SEQ_NAME:SusD"

"1" is the node name matching a node name in the phylogeny to be decorated.
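
Rows for a -table mapping file are easy to generate programmatically. The Python sketch below joins a node name and TAG:value pairs with tabs and rejects tags that are not in the list above. The helper name table_row is made up for this example, and the check against the tag list is an assumption about what is useful, not a description of decorator's own validation.

```python
KNOWN_TAGS = {
    "NODE_NAME", "TAXONOMY_CODE", "TAXONOMY_ID", "TAXONOMY_ID_PROVIDER",
    "TAXONOMY_SN", "TAXONOMY_CN", "TAXONOMY_SYN", "SEQ_SYMBOL",
    "SEQ_ACCESSION", "SEQ_ACCESSION_SOURCE", "SEQ_MOL_SEQ", "SEQ_NAME",
    "SEQ_ANNOTATION_DESC", "SEQ_ANNOTATION_REF",
}


def table_row(node_name, fields):
    """Return one mapping-table row: node name, then tab-separated TAG:value pairs."""
    parts = [node_name]
    for tag, value in fields:
        if tag not in KNOWN_TAGS:
            raise ValueError(f"unknown tag: {tag}")
        if "\t" in value:
            raise ValueError("values must not contain tabs")
        parts.append(f"{tag}:{value}")
    return "\t".join(parts)


# Rebuilds the first part of the example row shown above.
row = table_row("1", [
    ("TAXONOMY_CODE", "BACTN"),
    ("TAXONOMY_ID", "226186"),
    ("TAXONOMY_ID_PROVIDER", "ncbi"),
    ("TAXONOMY_SN", "Bacteroides thetaiotaomicron"),
])
```

Writing one such row per node, each on its own line, gives a file that can be passed as the mapping table together with the -table option.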



last updated: 2017-09-08