RIO

RIO: Resampled Inference of Orthologs

Purpose

RIO (Resampled Inference of Orthologs) is a method for automated phylogenomics based on explicit phylogenetic inference. RIO analyses are performed over resampled phylogenetic trees to estimate the reliability of orthology assignments.

Download

Usage

% java -Xmx2048m -cp forester.jar org.forester.application.rio [options] <gene trees> <species tree> <outfile> [logfile]

Options

  • -f=<first> : first gene tree to analyze (0-based index) (default: analyze all gene trees)

  • -l=<last> : last gene tree to analyze (0-based index) (default: analyze all gene trees)

  • -r=<re-rooting> : re-rooting method for gene trees; possible values are 'none', 'midpoint', or 'outgroup' (default: by minimizing duplications)

  • -o=<outgroup> : for rooting by outgroup, name of outgroup (external gene tree node)

  • -b : to use SDIR instead of GSDIR (faster, but non-binary species trees are disallowed, as are all other options)
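The usage line and options above can be combined programmatically. The following is a minimal sketch, not part of forester: the helper name `build_rio_command` and its parameter names are hypothetical, and it only assembles the flags listed above in the order given by the usage line (options first, then files). Requiring an outgroup name when `-r=outgroup` is chosen is an assumption made here for illustration.

```python
from typing import List, Optional

# Re-rooting values listed in the options above.
REROOTING_METHODS = ("none", "midpoint", "outgroup")


def build_rio_command(
    gene_trees: str,
    species_tree: str,
    outfile: str,
    logfile: Optional[str] = None,
    first: Optional[int] = None,
    last: Optional[int] = None,
    rerooting: Optional[str] = None,
    outgroup: Optional[str] = None,
    use_sdir: bool = False,
    jar: str = "forester.jar",
    max_heap: str = "2048m",
) -> List[str]:
    """Assemble the argument list for a RIO run (hypothetical helper)."""
    if rerooting is not None and rerooting not in REROOTING_METHODS:
        raise ValueError(f"unknown re-rooting method: {rerooting!r}")
    # Assumption: rooting by outgroup is only meaningful with an outgroup name.
    if rerooting == "outgroup" and not outgroup:
        raise ValueError("re-rooting by outgroup needs an outgroup name (-o)")

    cmd = ["java", f"-Xmx{max_heap}", "-cp", jar, "org.forester.application.rio"]
    if first is not None:
        cmd.append(f"-f={first}")
    if last is not None:
        cmd.append(f"-l={last}")
    if rerooting is not None:
        cmd.append(f"-r={rerooting}")
    if outgroup:
        cmd.append(f"-o={outgroup}")
    if use_sdir:
        cmd.append("-b")

    # Positional arguments follow the options, as in the usage line.
    cmd += [gene_trees, species_tree, outfile]
    if logfile:
        cmd.append(logfile)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the analysis, which avoids shell quoting issues with file names.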

Gene trees

The gene trees are ideally in phyloXML format, with taxonomy and sequence data in the appropriate fields, but can also be in New Hampshire (Newick) or Nexus format, as long as species information can be extracted from the gene names (e.g. "HUMAN" from "BCL2_HUMAN"). All gene trees must be completely binary.
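To illustrate the naming convention, here is a minimal sketch that takes the text after the last underscore as the species code. The function name `species_code` is hypothetical, and the rule it applies is an assumption based only on the "BCL2_HUMAN" example; the extraction rules actually used by forester may be more elaborate.

```python
def species_code(gene_name: str) -> str:
    """Return the part after the last underscore, e.g. 'HUMAN' from 'BCL2_HUMAN'.

    Assumption: the species code is the final underscore-separated field.
    """
    _, separator, suffix = gene_name.rpartition("_")
    if not separator or not suffix:
        raise ValueError(f"no species code found in gene name: {gene_name!r}")
    return suffix
```

Running such a check over all external node names before starting an analysis can reveal gene names that carry no recognizable species information.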

Species tree

The species tree is ideally in phyloXML format, but can also be in New Hampshire (Newick) or Nexus format. The species tree is allowed to have nodes with more than two descendants (polytomies), as long as the (slower) GSDIR (GSDI re-rooting) algorithm is used, i.e. the -b option is not given.
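Whether a Newick tree contains polytomies can be checked before choosing between GSDIR and SDIR. The sketch below is a simplified check, not forester code: the function name `max_children` is hypothetical, and it assumes the Newick string has no quoted labels or bracketed comments containing parentheses or commas.

```python
def max_children(newick: str) -> int:
    """Return the largest number of direct descendants of any internal node.

    Simplifying assumption: labels and comments contain no '(', ')' or ','.
    """
    open_nodes = []  # child count for each currently open '(' group
    largest = 0
    for char in newick:
        if char == "(":
            open_nodes.append(1)  # an opened group has at least one child
        elif char == "," and open_nodes:
            open_nodes[-1] += 1   # each comma adds a sibling at this level
        elif char == ")":
            if not open_nodes:
                raise ValueError("unbalanced parentheses in Newick string")
            largest = max(largest, open_nodes.pop())
    if open_nodes:
        raise ValueError("unbalanced parentheses in Newick string")
    return largest
```

A result greater than 2 means the tree has at least one polytomy. Note that unrooted Newick trees are often written with three descendants at the top level, which this check also reports as a polytomy.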

Note about memory

Because Java's default memory allocation is too small for even moderately large data sets, it is necessary to increase it, for example to 2048 MB with the -Xmx2048m command-line option shown in the usage line above.

Examples

% rio gene_trees.nh species.xml outtable.tsv log.txt

% rio gene_trees.nh species.xml outtable.tsv log.txt -r=outgroup -o=XVL1_ECOLI

% rio gene_trees.nh species.xml outtable.tsv log.txt -f=0 -l=49

Example files

References

Last updated: 2017-12-06