RIO
RIO: Resampled Inference of Orthologs
Purpose
RIO (Resampled Inference of Orthologs) is a method for automated phylogenomics based on explicit phylogenetic inference. RIO analyses are performed over resampled phylogenetic trees to estimate the reliability of orthology assignments.
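The idea of estimating reliability from resampled trees can be illustrated with a minimal sketch. This is not RIO's implementation: it assumes that orthologs of a query gene have already been inferred separately on each resampled gene tree, and it only shows how such per-tree results might be combined into a support value. The function name `orthology_support` and the example gene names are hypothetical.

```python
from collections import Counter


def orthology_support(orthologs_per_tree):
    """Return, for each gene, the fraction of resampled trees in which it
    was inferred to be an ortholog of the query (hypothetical helper)."""
    n_trees = len(orthologs_per_tree)
    if n_trees == 0:
        return {}
    counts = Counter()
    for orthologs in orthologs_per_tree:
        # Count each gene at most once per resampled tree.
        counts.update(set(orthologs))
    return {gene: counts[gene] / n_trees for gene in counts}


# Made-up ortholog sets for one query gene over four resampled gene trees.
per_tree = [
    {"BCL2_MOUSE", "BCL2_CHICK"},
    {"BCL2_MOUSE"},
    {"BCL2_MOUSE", "BCL2_CHICK"},
    {"BCL2_MOUSE", "BCLX_MOUSE"},
]
support = orthology_support(per_tree)
```

A gene that is called an ortholog in every resampled tree gets a support of 1.0, while one that appears in only a few trees gets a correspondingly low value.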
Download
Most current version (might be unstable): forester.jar
Source code is available at GitHub: https://github.com/cmzmasek/forester
Usage
% java -Xmx2048m -cp forester.jar org.forester.application.rio [options] <gene trees> <species tree> <outfile> [logfile]
Options
-f=<first> : first gene tree to analyze (0-based index) (default: analyze all gene trees)
-l=<last> : last gene tree to analyze (0-based index) (default: analyze all gene trees)
-r=<re-rooting> : re-rooting method for gene trees, possible values are 'none', 'midpoint', or 'outgroup' (default: by minimizing duplications)
-o=<outgroup> : for rooting by outgroup, name of outgroup (external gene tree node)
-b : to use SDIR instead of GSDIR (faster, but non-binary species trees are disallowed, as are all options)
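When RIO is driven from a script, the command line can be assembled programmatically. The following is a sketch only: `build_rio_command` is a hypothetical helper, not part of forester, and it uses only the flags listed above together with the invocation shown under Usage. It assumes `forester.jar` is in the current directory.

```python
def build_rio_command(gene_trees, species_tree, outfile, logfile=None,
                      first=None, last=None, rerooting=None, outgroup=None,
                      use_sdir=False, jar="forester.jar", max_heap="2048m"):
    """Assemble the RIO command line as an argument list (hypothetical helper)."""
    if rerooting not in (None, "none", "midpoint", "outgroup"):
        raise ValueError("re-rooting must be 'none', 'midpoint', or 'outgroup'")
    if rerooting == "outgroup" and outgroup is None:
        raise ValueError("rooting by outgroup requires an outgroup name")

    cmd = ["java", f"-Xmx{max_heap}", "-cp", jar,
           "org.forester.application.rio"]
    if first is not None:
        cmd.append(f"-f={first}")      # first gene tree, 0-based index
    if last is not None:
        cmd.append(f"-l={last}")       # last gene tree, 0-based index
    if rerooting is not None:
        cmd.append(f"-r={rerooting}")
    if outgroup is not None:
        cmd.append(f"-o={outgroup}")
    if use_sdir:
        cmd.append("-b")               # SDIR instead of GSDIR
    cmd += [gene_trees, species_tree, outfile]
    if logfile is not None:
        cmd.append(logfile)
    return cmd


cmd = build_rio_command("gene_trees.nh", "species.xml", "outtable.tsv",
                        "log.txt", first=0, last=49)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the analysis.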
Gene trees
Gene trees should ideally be in phyloXML format, with taxonomy and sequence data in the appropriate fields. New Hampshire (Newick) and Nexus formats are also accepted, as long as species information can be extracted from the gene names (e.g. "HUMAN" from "BCL2_HUMAN"). All gene trees must be completely binary (example).
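To check beforehand whether the gene names in a Newick or Nexus file carry usable species information, a small test like the one below may help. It assumes that the species code is whatever follows the last underscore, as in the "BCL2_HUMAN" example; the rules forester actually applies may be more permissive or stricter, and `species_from_gene_name` is a hypothetical name.

```python
def species_from_gene_name(name):
    """Return the species code after the last underscore,
    e.g. 'HUMAN' for 'BCL2_HUMAN' (assumed naming convention)."""
    _, sep, species = name.rpartition("_")
    if not sep or not species:
        raise ValueError(f"no species code found in gene name: {name!r}")
    return species


species = species_from_gene_name("BCL2_HUMAN")
```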
Species tree
The species tree should ideally be in phyloXML format, but can also be in New Hampshire (Newick) or Nexus format. It may contain nodes with more than two descendants (polytomies), as long as the slower GSDIR (GSDI re-rooting) algorithm is used (example).
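Whether a Newick tree contains polytomies can be checked before choosing between GSDIR and the faster SDIR (-b). The sketch below counts the children of every internal node by scanning the Newick string. It is a simplified check, not forester code: it assumes that labels contain no quoted parentheses or commas and that the file has no bracketed comments. The function names are hypothetical.

```python
def child_counts(newick):
    """Return the number of children of each internal node in a Newick string."""
    counts = []
    stack = []  # one running child count per currently open parenthesis
    for ch in newick:
        if ch == "(":
            stack.append(1)        # a new internal node has at least one child
        elif ch == "," and stack:
            stack[-1] += 1         # a comma separates siblings
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced parentheses in Newick string")
            counts.append(stack.pop())
    if stack:
        raise ValueError("unbalanced parentheses in Newick string")
    return counts


def has_polytomy(newick):
    """True if any internal node has more than two children."""
    return any(n > 2 for n in child_counts(newick))


binary = has_polytomy("((HUMAN,MOUSE),CHICK);")
non_binary = has_polytomy("(HUMAN,MOUSE,CHICK);")
```

If `has_polytomy` returns True for the species tree, the -b option cannot be used. The same check can be applied to gene trees, which must be completely binary.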
Note about memory
Java's default memory allocation is too small for even moderately large data sets, so it must be increased with the -Xmx2048m command-line option.
Examples
% rio gene_trees.nh species.xml outtable.tsv log.txt
% rio gene_trees.nh species.xml outtable.tsv log.txt -r=outgroup -o=XVL1_ECOLI
% rio gene_trees.nh species.xml outtable.tsv log.txt -f=0 -l=49
Example files
References
Zmasek CM and Eddy SR "RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs" BMC Bioinformatics 2002, 3:14
Zmasek CM and Eddy SR "A simple algorithm to infer gene duplication and speciation events on a gene tree" Bioinformatics 2001, 17:821-828
Han M and Zmasek CM "phyloXML: XML for evolutionary biology and comparative genomics" BMC Bioinformatics 2009, 10:356
Last updated: 2017-12-06