decorator

Overview

decorator is a simple Java command-line tool for decorating phylogenetic trees with various data fields (such as sequence accessions and taxonomic scientific names) and writing them in phyloXML format.
It can read trees in the following formats:

It is implemented in Java as part of the forester libraries.
A similar but more limited tool is the phyloXML converter (Newick to phyloXML): phyloxml_converter

Download

» forester.jar

Source code is available through Google Code at: http://code.google.com/p/forester/

Usage

java -cp path\to\forester.jar org.forester.application.decorator -table | -f=<c> <phylogenies infile> <mapping table file> <phylogenies outfile>

Options:

-table : use a mapping table instead of a one-to-one map (-f=), see below for tags
-r=<n> : if a name from the phylogenies infile is not found in the map, retry after removing up to n characters from the end of the name
-p     : for picky, fails if node name not found in mapping table, default is off
-pn=<s>: name for the phylogeny
-pi=<s>: identifier for the phylogeny (in the form provider:value)
-pd=<s>: description for phylogenies
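
The fallback described for -r=<n> can be pictured with a small sketch. This is not forester's implementation: the class TrimmedLookup and its lookup method are hypothetical names, and the order in which shortened names are tried (shortest cut first) is an assumption made for illustration only.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper illustrating the idea behind -r=<n>; not part of forester.
final class TrimmedLookup {

    // Looks up name in map. If the full name is not a key, retries with
    // 1, 2, ... up to maxTrim characters removed from the end of the name.
    // The name is never shortened to the empty string.
    static Optional<String> lookup(Map<String, String> map, String name, int maxTrim) {
        for (int cut = 0; cut <= maxTrim && cut < name.length(); cut++) {
            String candidate = name.substring(0, name.length() - cut);
            String value = map.get(candidate);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> map = Map.of("BT3701", "SusD");
        // "BT3701_1" is not a key, but removing two characters yields "BT3701".
        System.out.println(lookup(map, "BT3701_1", 2).isPresent());
        // With only one character allowed, no key matches.
        System.out.println(lookup(map, "BT3701_1", 1).isPresent());
    }
}
```

With -p (picky) in effect, a name that is still unmatched after this kind of fallback would presumably cause a failure rather than being skipped.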


Advanced options, only available if -table is not used:

-f=<c>: field to be replaced:
        n : node name
        a : sequence annotation description
        d : domain structure
        c : taxonomy code
        sn: taxonomy scientific name
        s : sequence name
-k=<n>: key column in mapping table (0 based), names of the node to be decorated - default is 0
-v=<n>: value column in mapping table (0 based), data with which to decorate - default is 1
-sn   : to extract bracketed scientific names
-s=<c>: column separator in mapping file, default is ":"
-x    : process name "intelligently" (only for -f=n)
-xs   : process name "intelligently" and process information after "similar to" (only for -f=n)
-c    : cut name after first space (only for -f=n)
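
To make the roles of -k, -v and -s concrete, here is a minimal sketch of how a one-to-one mapping file could be read into a map from node name to decoration value. It is an illustration under stated assumptions, not forester's parser: the class SimpleMapSketch, its parse method, and the handling of blank lines and surrounding whitespace are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical reader for a one-to-one mapping file; not part of forester.
final class SimpleMapSketch {

    // separator corresponds to -s, keyCol to -k and valueCol to -v (both 0 based).
    static Map<String, String> parse(List<String> lines, String separator,
                                     int keyCol, int valueCol) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            // Quote the separator so that characters such as "|" are taken literally.
            String[] cols = line.split(Pattern.quote(separator), -1);
            if (keyCol >= cols.length || valueCol >= cols.length) {
                throw new IllegalArgumentException("too few columns in line: " + line);
            }
            map.put(cols[keyCol].trim(), cols[valueCol].trim());
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1:Bacteroides thetaiotaomicron",
                                     "2:Escherichia coli");
        // Defaults from the option list: separator ":", key column 0, value column 1.
        System.out.println(parse(lines, ":", 0, 1));
    }
}
```

Each value in the resulting map would then replace the field selected with -f=<c> on the node whose name equals the key.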


Example:
"java -cp \soft\forester.jar org.forester.application.decorator -table my_simple_tree.nh my_map.txt decorated_tree.xml"

Tags for mapping table:

  • "NODE_NAME:"
  • "TAXONOMY_CODE:"
  • "TAXONOMY_ID:"
  • "TAXONOMY_ID_PROVIDER:"
  • "TAXONOMY_SN:" (scientific name)
  • "TAXONOMY_CN:" (common name)
  • "TAXONOMY_SYN:" (synonym)
  • "SEQ_SYMBOL:"
  • "SEQ_ACCESSION:"
  • "SEQ_ACCESSION_SOURCE:"
  • "SEQ_MOL_SEQ:" (the actual sequence)
  • "SEQ_NAME:"
  • "SEQ_ANNOTATION_DESC:" (annotations for a sequence, free text)
  • "SEQ_ANNOTATION_REF:" (annotation references for a sequence, e.g. "GO:123456")

TAG:value pairs must be separated by tabs.
The first column must contain a node name that matches a node name in the phylogeny to be decorated.
These tags correspond to phyloXML elements (see the phyloXML documentation).

Mapping table example row (tab separated):

"1 TAXONOMY_CODE:BACTN TAXONOMY_ID:226186 TAXONOMY_ID_PROVIDER:ncbi TAXONOMY_SN:Bacteroides thetaiotaomicron SEQ_ACCESSION:29341016 SEQ_ACCESSION_SOURCE:gi SEQ_SYMBOL:BT3701 SEQ_NAME:SusD"

"1" is the node name matching a node name in the phylogeny to be decorated.
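
The row layout above can be sketched in code as follows. This is only an illustration of the format as described here, not forester's own parser: the class TableRowSketch and its methods are hypothetical, and splitting each column at its first colon is an assumption chosen so that values such as "GO:123456" stay intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for one tab-separated mapping table row; not part of forester.
final class TableRowSketch {

    // The first column is the node name.
    static String nodeName(String row) {
        return row.split("\t", -1)[0].trim();
    }

    // The remaining columns are TAG:value pairs. Keys are stored without the
    // trailing colon, e.g. "TAXONOMY_CODE".
    static Map<String, String> tags(String row) {
        String[] cols = row.split("\t", -1);
        Map<String, String> tags = new LinkedHashMap<>();
        for (int i = 1; i < cols.length; i++) {
            if (cols[i].isEmpty()) {
                continue; // assumption: empty columns are ignored
            }
            int colon = cols[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("not a TAG:value pair: " + cols[i]);
            }
            tags.put(cols[i].substring(0, colon), cols[i].substring(colon + 1).trim());
        }
        return tags;
    }

    public static void main(String[] args) {
        String row = "1\tTAXONOMY_CODE:BACTN\tTAXONOMY_ID:226186"
                + "\tTAXONOMY_SN:Bacteroides thetaiotaomicron\tSEQ_NAME:SusD";
        System.out.println(nodeName(row));
        System.out.println(tags(row));
    }
}
```

Because columns are split on tabs only, a value containing spaces (such as a scientific name) stays in one piece.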