SuperFine, DACTAL, and BeeTLe

Please direct correspondence regarding SuperFine and DACTAL to Tandy Warnow.

SuperFine

The original paper appeared in Systematic Biology (2012) 61(2):214-227, doi:10.1093/sysbio/syr092 (first published online September 20, 2011). (PDF).
Nguyen, N., S. Mirarab, and T. Warnow. "MRL and SuperFine+MRL: new supertree methods." Algorithms for Molecular Biology 7:3, 2012. (PDF).
Neves, D. T., T. Warnow, J. L. Sobral, and K. Pingali. "Parallelizing SuperFine." 27th Symposium on Applied Computing (ACM-SAC), Bioinformatics, 2012, pages 1361-1367, doi:10.1145/2231936.2231992. (PDF).

SuperFine is software for constructing supertrees from source phylogenies.

Software

This section details steps for installing and running SuperFine. We ran SuperFine on a machine running Linux release 2.6.24.6+desktop10+25.63. If you experience difficulty installing or running the software, please contact us at the e-mail address below.

1. Install PAUP*.

SuperFine requires that you first have PAUP* installed and executable via a "paup -n" system command (i.e. the PAUP* binary must be in your Unix executable search path, settable via the PATH environment variable). We used PAUP* version 4.0b10 for Unix; other versions may or may not work with SuperFine.
For instructions on obtaining and installing PAUP*, please visit the PAUP* home page.
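
Before going further, it can help to confirm that the "paup" command really is on your executable search path. The following is a minimal sketch, assuming a Python 3 interpreter (not the Python 2.6 used for SuperFine itself); the helper name find_paup is hypothetical and is not part of SuperFine or PAUP*.

```python
import shutil


def find_paup(executable="paup"):
    """Return the full path of the PAUP* binary, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_paup()
    if location is None:
        print("paup was not found on PATH; add its directory to PATH first")
    else:
        print("paup found at", location)
```

This only checks that the binary can be found; it does not check the PAUP* version.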

2. Install Python packages.

The software packages listed below are Python source distributions. To use them, you must first have Python installed on your system; for details on obtaining and installing Python, please visit the Python home page. We used Python version 2.6.

To unpack each distribution file, run "tar -xzf <package>.tar.gz". To install each package, run "python setup.py install" from inside the unpacked package directory; this step requires root access to the system.
If you do not have root access, invoke the setup script as follows: "python setup.py install --prefix=/some/path/on/your/system", where "/some/path/on/your/system" is the path to a directory on your system to which you do have read and write access. If you use the "--prefix" option, you must ensure that the "lib/python2.x/site-packages" subdirectory (where "x" denotes the minor version number of your Python install) of the directory you specify following "--prefix=" is on Python's search path. To add a directory to Python's search path, modify your PYTHONPATH environment variable.
More instructions on installing Python packages can be found on this Python page.
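
When installing with "--prefix", the directory that must end up on Python's search path can be worked out from the prefix and the interpreter version. The sketch below illustrates this under the "lib/python2.x/site-packages" layout described above; the function names are hypothetical, and the code is written for Python 3 only so that it is easy to try out.

```python
import os
import sys


def site_packages_under(prefix, version_info=sys.version_info):
    """Return <prefix>/lib/python<major>.<minor>/site-packages."""
    version_dir = "python%d.%d" % (version_info[0], version_info[1])
    return os.path.join(prefix, "lib", version_dir, "site-packages")


def is_on_search_path(directory, search_path=None):
    """Return True if directory is one of the entries of the search path.

    By default the running interpreter's sys.path is inspected, which
    already includes any directories listed in PYTHONPATH.
    """
    if search_path is None:
        search_path = sys.path
    target = os.path.realpath(directory)
    return any(os.path.realpath(entry) == target for entry in search_path if entry)
```

For example, site_packages_under("/some/path/on/your/system", (2, 6)) gives the directory to add to PYTHONPATH for a Python 2.6 install, and is_on_search_path can then confirm that the interpreter sees it.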

1. Install the newick_modified package.

2. Install the spruce package: spruce-1.0.tar.gz

3. Install the dendropy package: DendroPy-v2.4.0.tar.gz (shown below)

4. Install the reup package: reup-1.0.tar.gz (shown below)

3. Run SuperFine.

Copy the "runReup.py" script from the "bin" subdirectory of the location in which you installed the Python packages to your working directory. Run the script using the command "python runReup.py -r gmrp <source_trees_file>", where "<source_trees_file>" is the name of a file containing source trees in Newick format.
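
If you want to run SuperFine on many source-tree files, the command above can be wrapped in a small driver. This is a sketch only: the function names are hypothetical, the wrapper assumes "runReup.py" is in the current directory and that "python" on your PATH is the interpreter the packages were installed for, and it makes no assumption about where SuperFine writes its output.

```python
import subprocess


def superfine_command(source_trees_file, script="runReup.py", python="python"):
    """Build the argument list for the SuperFine command quoted above."""
    return [python, script, "-r", "gmrp", source_trees_file]


def run_superfine(source_trees_file):
    """Run SuperFine on one source-tree file and return the exit status."""
    # subprocess.call waits for the command to finish and returns its exit code.
    return subprocess.call(superfine_command(source_trees_file))
```

Passing the arguments as a list avoids shell quoting problems when a file name contains spaces.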

Datasets

Data

DACTAL

Nelesen, S., K. Liu, L.-S. Wang, C. R. Linder, and T. Warnow. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics Vol 28, ISMB 2012, pages i274-i282. (PDF). The DACTAL method uses SuperFine within a divide-and-conquer framework.

DACTAL (Divide-And-Conquer Trees (almost) without ALignments) is a software package for estimating phylogenies from ultra-large datasets of up to tens of thousands of unaligned nucleotide sequences, with sequence lengths of many kb.

Acquiring

An alpha-release of the DACTAL software package used in the submitted manuscript can be obtained below through the file dactal-ship.tar.v.1.1.bz2. After downloading, decompress the file using the following command:

tar xjf dactal-ship.tar.v.1.1.bz2

Then, read the instructions in ./dactal-ship/README to install DACTAL.

Trees from published analyses

Trees from analyses in the publication can be obtained below through the file dactal-trees.tar.bz2.

BeeTLe

K. Liu and T. Warnow, "Treelength Optimization for Phylogeny Estimation," PLoS ONE, vol. 7, no. 3:e33104, 2012, doi: 10.1371/journal.pone.0033104.

All files are in compressed .tar.bz2 format. To extract a compressed <file>, use the command: tar xjf <file>
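
On systems without a "tar" command, the same extraction can be done with Python's standard tarfile module. The following is a minimal sketch for Python 3; the function name extract_bz2_tarball is hypothetical.

```python
import tarfile


def extract_bz2_tarball(archive_path, dest_dir="."):
    """Extract a .tar.bz2 archive into dest_dir and return the member names."""
    # Mode "r:bz2" opens the archive for reading with bzip2 decompression,
    # which corresponds to "tar xjf" on the command line.
    with tarfile.open(archive_path, "r:bz2") as archive:
        names = archive.getnames()
        # Only extract archives obtained from a source you trust.
        archive.extractall(dest_dir)
    return names
```

For example, extract_bz2_tarball("sim_100.tar.bz2") unpacks that archive into the current directory.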

Simulated nucleotide datasets

There are 30 model conditions, each with 100 taxa and a long, medium, or short gap length type. Each model condition is referred to with the string "<number of taxa><gap length type: L for long, M for medium, S for short><id number>". Each model condition has 20 replicate datasets.
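
The naming scheme can be parsed mechanically. The sketch below assumes that the three parts are written back to back with no separator (for example "100M3", which is a made-up name used only for illustration); check this against the actual directory names after extraction. The names ModelCondition and parse_model_condition are hypothetical.

```python
import re
from collections import namedtuple

ModelCondition = namedtuple("ModelCondition", ["num_taxa", "gap_length", "condition_id"])

GAP_LENGTH_NAMES = {"L": "long", "M": "medium", "S": "short"}

# Digits for the number of taxa, one letter for the gap length type,
# then digits for the id number (assumed layout, with no separators).
_PATTERN = re.compile(r"^(\d+)([LMS])(\d+)$")


def parse_model_condition(name):
    """Split a model condition string into its three components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError("not a model condition string: %r" % name)
    taxa, gap, condition_id = match.groups()
    return ModelCondition(int(taxa), GAP_LENGTH_NAMES[gap], int(condition_id))
```

Because the gap length letter sits between the two numbers, the split is unambiguous even when both numbers have several digits.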

After extracting the compressed file sim_100.tar.bz2, each directory <model condition parameter string>/R<replicate number>/ contains the following files:

rose.tt - Potentially Inferrable Model Tree, or PIMT (the model tree with zero-event edges collapsed), in Newick format
rose.mt - binary model tree in Newick format
rose.aln.true.fasta - true alignment in FASTA format
rose.mt.internal - binary model tree in Newick format, with all nodes named
rose.aln.true.internal.fasta - true alignment of the sequences at all nodes, in FASTA format
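
After extraction, a quick check that a replicate directory is complete can save time before a long analysis. This sketch assumes the layout and file names listed above; the root directory name and the replicate numbering (whether it starts at 0 or 1) are not stated here, so both are left as parameters. The function names are hypothetical.

```python
import os

# File names taken from the list above.
EXPECTED_FILES = (
    "rose.tt",
    "rose.mt",
    "rose.aln.true.fasta",
    "rose.mt.internal",
    "rose.aln.true.internal.fasta",
)


def replicate_dir(root, model_condition, replicate):
    """Return <root>/<model condition>/R<replicate number>."""
    return os.path.join(root, model_condition, "R%d" % replicate)


def missing_files(directory):
    """Return the expected file names that are not present in directory."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

An empty list from missing_files means all five files are present in that replicate directory.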

Biological datasets

There are 4 biological datasets. For each dataset, an associated link to a tarball is listed below. The tarball contains a single directory <dataset name>/R0/ that contains the following files:

cleaned.alignment.fasta - cleaned curated alignment in FASTA format
cleaned.alignment.fasta.name_map - space-delimited two-column data file listing the bijective mapping between taxon names in the cleaned curated alignment file cleaned.alignment.fasta and the original curated alignment file from the Gutell lab CRW
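
The name map can be loaded and checked for bijectivity in a few lines. This is a sketch under stated assumptions: each non-empty line holds exactly two whitespace-separated names, and since the column order (cleaned name first or original name first) is not specified here, the code simply maps the first column to the second. The function name read_name_map is hypothetical.

```python
def read_name_map(lines):
    """Parse two-column name-map lines into a dict, checking bijectivity.

    lines may be an open file or any iterable of strings.
    """
    forward = {}
    seen_second = set()
    for line_number, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("line %d: expected 2 columns, got %d" % (line_number, len(fields)))
        first, second = fields
        # A bijection allows each name to appear only once in its column.
        if first in forward or second in seen_second:
            raise ValueError("line %d: mapping is not one-to-one" % line_number)
        forward[first] = second
        seen_second.add(second)
    return forward
```

Typical use is to open cleaned.alignment.fasta.name_map and pass the file object to read_name_map.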

rRNA datasets:

23S.M
23S.M.aa_ag
23S.E.aa_ag
23S.E

For reference purposes, the original Gutell lab CRW datasets can be accessed at the following links. Please note that these original datasets differ from the cleaned datasets above, which were the datasets actually used in the experiments. Furthermore, these original uncleaned curated alignments are not in FASTA format, and are instead in a more verbose GenBank format explained here.

Original uncleaned datasets:

23S.M
23S.M.aa_ag
23S.E.aa_ag
23S.E

Software

BeeTLe (version of May 17, 2013) is given below under the directory beetle.
