cladinator

cladinator: clades within clades of annotated labels -- analysis of pplacer-type outputs

Purpose

To analyze pplacer-type output trees with hierarchically annotated nodes.
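Here is a small sketch of what "hierarchically annotated" labels imply. For illustration only, it assumes that the consensus of several labels is their longest shared leading path, split on the annotation-separator ("." by default), with "?" when nothing is shared. This is not taken from the forester source, and consensus_annotation is a hypothetical helper.

```python
def consensus_annotation(labels, separator="."):
    """Return the longest shared leading path of hierarchical labels, or "?".

    Hypothetical helper: illustrates hierarchical annotations such as
    A.1.1.1 / A.1.1.2; cladinator's own consensus rule may differ.
    """
    if not labels:
        return "?"
    split_labels = [label.split(separator) for label in labels]
    shared = []
    # zip(*...) walks the labels level by level, stopping at the shortest one
    for parts in zip(*split_labels):
        if all(part == parts[0] for part in parts):
            shared.append(parts[0])
        else:
            break
    return separator.join(shared) if shared else "?"

print(consensus_annotation(["A.1.1.1", "A.1.1.2"]))
print(consensus_annotation(["A.1.1.1", "A.2"]))
print(consensus_annotation(["A.1", "B.1"]))
```

Under this assumption, the three calls yield "A.1.1", "A" and "?".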

Download

Most current version (might be unstable): forester.jar
Source code is available at GitHub: https://github.com/cmzmasek/forester

Usage

% java -Xmx1024m -cp path/to/forester.jar org.forester.application.cladinator [options] <input tree(s) file> [output table file]
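If cladinator is run from a script, the command line can be assembled as an argument list. The Python sketch below uses only options documented on this page. cladinator_command is a hypothetical helper, and the jar and file paths are placeholders. When a list is passed to subprocess.run, no shell is involved, so the -S pattern does not need shell quoting.

```python
import shlex

def cladinator_command(jar, trees, out_table=None, min_conf=None,
                       mapping=None, special_pattern=None, heap="1024m"):
    """Build (but do not run) a cladinator argument list.

    Hypothetical helper: it only uses options documented for cladinator.
    """
    cmd = ["java", f"-Xmx{heap}", "-cp", jar,
           "org.forester.application.cladinator"]
    if min_conf is not None:
        cmd.append(f"-c={min_conf}")         # minimal confidence for specific-hits
    if mapping is not None:
        cmd.append(f"-m={mapping}")          # tab-separated mapping table
    if special_pattern is not None:
        cmd.append(f"-S={special_pattern}")  # no shell quoting needed in a list
    cmd.append(trees)                        # input tree(s) file
    if out_table is not None:
        cmd.append(out_table)                # optional output table file
    return cmd

cmd = cladinator_command("path/to/forester.jar", "pplacer_res_1.sing.tre",
                         "cladinator_out_1.tsv", min_conf=0.7,
                         mapping="mapping.tsv",
                         special_pattern=r"(\d+)([a-z?]*)_.+")
print(shlex.join(cmd))  # shell-quoted form, e.g. for logging
```

To actually run the command, pass the list to subprocess.run(cmd, check=True). This requires Java and forester.jar.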

Options

-c=<double>: the minimal confidence value for "specific-hits" to be reported (default: 0.7)
-s=<separator>: the annotation-separator to be used (default: ".")
-m=<mapping table>: to map node names to appropriate annotations (tab-separated, two columns) (default: no mapping)
-x: to enable extra processing of annotations (e.g. "Q16611|A.1.1" becomes "A.1.1")
-xs=<separator>: the separator for extra annotations (default: "|")
-xk: to keep extra annotations (e.g. "Q16611|A.1.1" becomes "A.1.1.Q16611")
-S=<pattern>: special processing of labels with a regular expression pattern (e.g. "(\d+)([a-z]+)_.+" for changing "6q_EF42" to "6.q")
-rs: to remove the annotation-separator in the output (e.g. the ".")
-v: verbose
-Q: quiet (no output to console, for when used in a pipeline)
--q=<query pattern>: expert option: the regular expression pattern for the query (default: "_#\d+_M=(.+)" for pplacer output)
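The label transformations described for -x, -xk, -S and -rs can be mimicked with the following Python sketch. The function names are hypothetical. The sketch reproduces the documented examples but makes several assumptions that are flagged in the comments: the label is split at the first extra separator, non-matching labels are left unchanged, and non-empty capture groups are joined with the annotation-separator. forester's own implementation may handle edge cases differently.

```python
import re

# Hypothetical helpers mimicking the documented effects of -x, -xk, -S and -rs.

def extra_processing(label, extra_sep="|", keep=False, sep="."):
    """-x: 'Q16611|A.1.1' -> 'A.1.1'; with -xk: 'A.1.1.Q16611'."""
    if extra_sep not in label:
        return label
    extra, annotation = label.split(extra_sep, 1)  # assumption: first separator
    return annotation + sep + extra if keep else annotation

def special_processing(label, pattern, sep="."):
    """-S: '6q_EF42' with pattern '(\\d+)([a-z]+)_.+' -> '6.q'."""
    match = re.fullmatch(pattern, label)
    if match is None:
        return label  # assumption: non-matching labels are left unchanged
    # assumption: non-empty capture groups are joined with the annotation-separator
    return sep.join(group for group in match.groups() if group)

def remove_separator(annotation, sep="."):
    """-rs: drop the annotation-separator from an output name."""
    return annotation.replace(sep, "")

print(extra_processing("Q16611|A.1.1"))
print(extra_processing("Q16611|A.1.1", keep=True))
print(special_processing("6q_EF42", r"(\d+)([a-z]+)_.+"))
print(remove_separator("A.1.1"))
```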

Input tree(s)

Input trees must be in New Hampshire (Newick) or phyloXML format. Reference nodes must carry hierarchical annotations as node/clade names (e.g. A.1.1.1, A.1.1.2, ...), unless a mapping table is used. Query placements must carry pplacer-type labels of the form "Q_#0_M=0.4". Typically, the input trees are the ".sing.tre" files produced by pplacer/guppy.
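This Python sketch shows how the default query pattern "_#\d+_M=(.+)" separates a pplacer-type label into a query name and the numeric value after "M=". That value is presumably the placement's weight; this is an assumption. parse_query_label and group_placements are hypothetical helpers, not forester functions.

```python
import re
from collections import defaultdict

# Default query pattern documented for --q; group 1 captures the M= value.
QUERY_PATTERN = re.compile(r"_#\d+_M=(.+)")

def parse_query_label(label):
    """Split a label such as 'Q_#0_M=0.4' into ('Q', 0.4).

    Returns None for labels that do not match, e.g. reference labels
    like 'A.1.1.1'.
    """
    match = QUERY_PATTERN.search(label)
    if match is None:
        return None
    return label[:match.start()], float(match.group(1))

def group_placements(labels):
    """Collect the M= values of all placements per query name."""
    groups = defaultdict(list)
    for label in labels:
        parsed = parse_query_label(label)
        if parsed is not None:
            name, value = parsed
            groups[name].append(value)
    return dict(groups)

print(group_placements(["Q_#0_M=0.4", "Q_#1_M=0.6", "A.1.1.1"]))
```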

In the case of phyloXML formatted trees, other data fields (such as sequence names, taxonomic names) are ignored.

Output table file

The output is a tab-separated file, suitable for machine parsing.

The tab-separated columns are listed below (the first line starts with "#" and contains status information):

1. Query name
2. Match type: "Matching Clades", "Matching Down-tree Bracketing Clades", or "Matching Up-tree Bracketing Clades"
3. Consensus match name ("?" if no consensus)
4. Score of consensus match (usually a probability between 0.0 and 1.0)
5. Total number of different query placements on the reference tree (as a simple rule of thumb, the larger this number, the greater the uncertainty)
6. Number of external nodes in reference tree
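The output table can be read back with Python's csv module, as in the sketch below. The dictionary keys are names chosen for this sketch and follow the column order above. The sketch assumes that the "#"-prefixed status line is the only non-data line and that columns 5 and 6 are written as integers.

```python
import csv

# Keys chosen for this sketch; they follow the documented column order.
COLUMNS = ("query", "match_type", "consensus", "score",
           "placements", "external_nodes")

def read_cladinator_table(lines):
    """Parse cladinator's tab-separated output into a list of dicts.

    Assumes the only non-data line is the '#'-prefixed status line and
    that columns 5 and 6 are integers.
    """
    records = []
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the status line
        record = dict(zip(COLUMNS, fields))
        record["score"] = float(record["score"])  # '?' consensus keeps its score
        record["placements"] = int(record["placements"])
        record["external_nodes"] = int(record["external_nodes"])
        records.append(record)
    return records
```

A typical use is to open the table with open(path, newline="") and pass the file handle to read_cladinator_table.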

Mapping table

The mapping table must contain two tab-separated columns: the first column is the identifier (node name) in the tree, and the second column is the desired annotation (e.g. A.1.1.1, A.1.1.2, ...). Examples: example mapping table with corresponding tree.
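A two-column mapping table of this kind can be loaded into a dictionary and applied to node names, as in the Python sketch below. The helper names and the node names in the example (seq1, seq2) are hypothetical. The sketch assumes that blank lines are ignored and that names missing from the table are left unchanged.

```python
def read_mapping(lines):
    """Read a two-column, tab-separated table: node name -> annotation."""
    mapping = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        columns = line.split("\t")
        if len(columns) != 2:
            raise ValueError(
                f"line {number}: expected 2 tab-separated columns, got {len(columns)}")
        node_name, annotation = columns
        mapping[node_name] = annotation
    return mapping

def apply_mapping(node_names, mapping):
    """Replace node names by annotations; unmapped names are kept (assumption)."""
    return [mapping.get(name, name) for name in node_names]

table = read_mapping(["seq1\tA.1.1.1\n", "seq2\tA.1.1.2\n"])
print(apply_mapping(["seq2", "seq9"], table))
```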

Examples

Example input files:

pplacer/guppy output 1: pplacer_res_1.sing.tre
pplacer/guppy output 2: pplacer_res_2.sing.tre
mapping table: mapping.tsv

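Example command lines (here "cladinator" is shorthand for the full "java -Xmx1024m -cp path/to/forester.jar org.forester.application.cladinator" invocation shown under Usage):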

% cladinator -c=0.7 -m=mapping.tsv -S='(\d+)([a-z?]*)_.+' pplacer_res_1.sing.tre cladinator_out_1.tsv

% cladinator -c=0.7 -m=mapping.tsv -S='(\d+)([a-z?]*)_.+' pplacer_res_2.sing.tre cladinator_out_2.tsv

More Examples

% cladinator pp_out_tree.sing.tre result.tsv

% cladinator -c=0.5 -s=. pp_out_tree.sing.tre result.tsv

% cladinator -c=0.9 -s=_ -m=map.tsv pp_out_trees.sing.tre result.tsv

% cladinator -x -xs='&' -xk pp_out_trees.sing.tre result.tsv

% cladinator -x -xs="|" pp_out_trees.sing.tre result.tsv

% cladinator -x -xk -m=map.tsv pp_out_trees.sing.tre result.tsv

% cladinator -m=map.tsv -S='(\d+)([a-z?]*)_.+' pp_out_trees.sing.tre result.tsv

Output Example

Matching Clade(s):

A: 0.83

B.1: 0.15

?: 0.01

C: 0.01

Specific-hit(s):

A.1.1: 0.53

Matching Clade(s) with Specific-hit(s):

A: 0.83

A.1.1: 0.53

B.1: 0.15

?: 0.01

C: 0.01

Matching Down-tree Bracketing Clade(s):

A: 0.83

B.1.1: 0.15

?: 0.01

C: 0.01

Matching Up-tree Bracketing Clade(s):

A: 0.83

B.1: 0.15

C: 0.02
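In each list above, the scores sum to 1.0. This is consistent with adding up a query's placement weights per consensus annotation. The Python sketch below shows that kind of aggregation. It is an assumption about how such a summary could be produced, not a description of cladinator's internals. The input weights are hypothetical values chosen so that the totals reproduce the Matching Clade(s) list above.

```python
from collections import defaultdict

def summarize_matches(placements):
    """Sum placement weights per consensus annotation, highest total first.

    placements: iterable of (annotation, weight) pairs for a single query.
    Ties keep first-seen order because Python's sort is stable.
    """
    totals = defaultdict(float)
    for annotation, weight in placements:
        totals[annotation] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical placements of one query ("?" marks a placement without consensus).
example = [("A", 0.50), ("A", 0.33), ("B.1", 0.15), ("?", 0.01), ("C", 0.01)]
for annotation, total in summarize_matches(example):
    print(f"{annotation}: {total:.2f}")
```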

last updated: 2017-11-15