This webpage provides datasets used in the following papers:
J. Yang and T. Warnow (Fast and accurate methods for phylogenomic analyses, BMC Bioinformatics 2011)
M. S. Bayzid and T. Warnow (Naive binning can improve phylogenomic analysis, Bioinformatics, to appear).
M. S. Bayzid, T. Hunt and T. Warnow (Disk covering methods improve phylogenomic analyses, BMC Genomics)
Please cite the relevant papers if you use these datasets.
500 replicates, at 8 and 32 genes: 17-taxon.tar.bz2:
file contents:
species tree: 17-taxon/st/RepXstree - where X is the replicate number
gene tree: 17-taxon/Yloci/RepXgtrees - where X is the replicate number and Y is the number of gene trees
gene sequences: 17-taxon/Yloci/seq/RepXgseqs - where X is the replicate number and Y is the number of gene trees
Please cite:
Yun Yu, Tandy Warnow, and Luay Nakhleh. Algorithms for MDC-based multi-locus phylogeny inference, Proc. RECOMB 2011.
Yun Yu, Tandy Warnow, and Luay Nakhleh. Algorithms for MDC-based multi-locus phylogeny inference: Beyond rooted binary gene trees on single alleles. J Comp Biol, 18(11): 1543-1559.
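The 17-taxon layout above can be turned into concrete paths with a small helper. This is only a sketch: the function name `seventeen_taxon_paths` is hypothetical, and it assumes that X and Y are substituted as plain decimal integers without zero padding (for example `8loci` and `Rep3gtrees`), which the listing does not state explicitly.

```python
def seventeen_taxon_paths(replicate, n_genes, root="17-taxon"):
    """Build the per-replicate paths from the file listing.

    Assumption: replicate (X) and gene count (Y) appear in the names
    as unpadded integers.
    """
    if n_genes not in (8, 32):
        raise ValueError("the listing only mentions 8 and 32 genes")
    loci_dir = f"{root}/{n_genes}loci"
    return {
        "species_tree": f"{root}/st/Rep{replicate}stree",
        "gene_trees": f"{loci_dir}/Rep{replicate}gtrees",
        "gene_seqs": f"{loci_dir}/seq/Rep{replicate}gseqs",
    }


if __name__ == "__main__":
    for kind, path in seventeen_taxon_paths(3, 8).items():
        print(kind, path)
```

Check the extracted archive before relying on these names, since the numbering of the first replicate is not given here.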
10 replicates, 25 genes: 100-taxon-ILS.tar.bz2
file contents:
species tree: model_tree - the species tree
gene tree: 100-taxon-ILS/seq/RepX.gtY.rose.tree.t - where X is the replicate number and Y is the gene
gene alignment: 100-taxon-ILS/seq/RepX.gtY.rose.true.aln - where X is the replicate number and Y is the gene
Please cite Jimmy Yang and Tandy Warnow, Fast and accurate methods for phylogenomic analyses, BMC Bioinformatics 2011, 12(Suppl 9):S4.
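When walking the extracted 100-taxon-ILS archive, it may be easier to recover the replicate and gene numbers from each file name than to generate names up front. The sketch below does that with a regular expression. The function name `parse_ils_name` is hypothetical, and the pattern assumes that X and Y are decimal integers.

```python
import re

# Matches names such as Rep3.gt12.rose.tree.t or Rep3.gt12.rose.true.aln.
# Assumption: replicate (X) and gene (Y) are written as decimal integers.
_ILS_NAME = re.compile(
    r"Rep(?P<rep>\d+)\.gt(?P<gene>\d+)\.rose\.(?P<kind>tree\.t|true\.aln)$"
)


def parse_ils_name(path):
    """Return (replicate, gene, kind) for a gene file, or None otherwise."""
    match = _ILS_NAME.search(path)
    if match is None:
        return None
    kind = "gene_tree" if match.group("kind") == "tree.t" else "gene_alignment"
    return int(match.group("rep")), int(match.group("gene")), kind
```

Files that do not fit the pattern, such as `model_tree`, yield `None` and can be handled separately.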
6 model conditions, 10 replicates, 25 and 50 genes
100L2-nonILS.tar.bz2
100L3-nonILS.tar.bz2
100S2-nonILS.tar.bz2
100L2-vbr1-nonILS.tar.bz2
100L3-vbr1-nonILS.tar.bz2
100S2-vbr1-nonILS.tar.bz2
file contents (replace 100L2 with another model condition for the corresponding file):
species tree: 100-taxon-nonILS/rose/100L2/rose.internal.model.100L2.reference_tree
gene tree: 100-taxon-nonILS/rose/100L2/X/rose.internal.model.100L2.Y.tree.t - where X is the replicate number and Y is the gene
gene alignment: 100-taxon-nonILS/rose/100L2/X/aln/rose.internal.model.100L2.Y.true_aln - where X is the replicate number and Y is the gene
Please cite Jimmy Yang and Tandy Warnow, Fast and accurate methods for phylogenomic analyses, BMC Bioinformatics 2011, 12(Suppl 9):S4.
6 model conditions, 10 replicates, 25 and 50 genes
500L5-nonILS.tar.bz2
500S3-nonILS.tar.bz2
500M3-nonILS.tar.bz2
500L5-vbr1-nonILS.tar.bz2
500S3-vbr1-nonILS.tar.bz2
500M3-vbr1-nonILS.tar.bz2
file contents (replace 500L5 with another model condition for the corresponding file):
species tree: 500-taxon-nonILS/rose/500L5/rose.internal.model.500L5.reference_tree
gene tree: 500-taxon-nonILS/rose/500L5/X/rose.internal.model.500L5.Y.tree.t - where X is the replicate number and Y is the gene
gene alignment: 500-taxon-nonILS/rose/500L5/X/aln/rose.internal.model.500L5.Y.true_aln - where X is the replicate number and Y is the gene
Please cite Jimmy Yang and Tandy Warnow, Fast and accurate methods for phylogenomic analyses, BMC Bioinformatics 2011, 12(Suppl 9):S4.
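The 100-taxon and 500-taxon non-ILS listings follow the same pattern, with the model condition substituted into both the directory and the file stem. A minimal sketch of that substitution is shown below. The function name `non_ils_paths` is hypothetical, and the code assumes that the taxon count is the leading digits of the condition name and that the condition string is used verbatim (how the `vbr1` variants are named inside their archives is not stated here, so verify after extraction).

```python
import re


def non_ils_paths(condition, replicate, gene):
    """Build non-ILS paths for one model condition, replicate, and gene.

    Assumption: the leading digits of `condition` (e.g. "500" in "500L5")
    give the taxon count used in the top-level directory name.
    """
    match = re.match(r"\d+", condition)
    if match is None:
        raise ValueError(f"no taxon count at start of {condition!r}")
    root = f"{match.group(0)}-taxon-nonILS/rose/{condition}"
    stem = f"rose.internal.model.{condition}"
    return {
        "species_tree": f"{root}/{stem}.reference_tree",
        "gene_tree": f"{root}/{replicate}/{stem}.{gene}.tree.t",
        "gene_alignment": f"{root}/{replicate}/aln/{stem}.{gene}.true_aln",
    }
```

For example, `non_ils_paths("100L2", 1, 4)["species_tree"]` reproduces the species tree path given in the 100-taxon listing.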
In addition to the 17-taxon datasets listed above, the Bayzid and Warnow paper (Bioinformatics) also studied the following data:
11-taxon_Bioinformatics.zip
100 replicates, 100 genes
File contents:
species tree: 11-taxon_Bioinformatics/model_tree - the model species tree.
gene tree (strongILS): 11-taxon_Bioinformatics/simTree_11taxa_100genes_noHGT_strongILS/simtrees_X.nex - where X is the replicate number. Each file contains 100 true gene trees.
gene tree (weakILS): 11-taxon_Bioinformatics/simTree_11taxa_100genes_noHGT_weakILS/simtrees_X.nex - where X is the replicate number. Each file contains 100 true gene trees.
gene alignment (strongILS): 11-taxon_Bioinformatics/modi_simSeq_strongILS/modi_sequence_X.nex - where X is the replicate number. Each file contains 100 gene sequence alignments.
gene alignment (weakILS): 11-taxon_Bioinformatics/modi_simSeq_weakILS/modi_sequence_X.nex - where X is the replicate number. Each file contains 100 gene sequence alignments.
Estimated gene tree (strongILS): 11-taxon_Bioinformatics/estimated_genetrees_strongILS/RepX.raxml - where X is the replicate number. Each file contains 100 estimated gene trees.
Estimated gene tree (weakILS): 11-taxon_Bioinformatics/estimated_genetrees_weakILS/RepX.raxml - where X is the replicate number. Each file contains 100 estimated gene trees.
Please cite Y. Chung and C. Ane (2011). Comparing two Bayesian methods for gene tree/ species tree reconstruction: a simulation with incomplete lineage sorting and horizontal gene transfer. Syst Biol 60(3): 261-275.
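The 11-taxon archive has three kinds of per-replicate files, each in a strong-ILS and a weak-ILS variant, so a small template table may be the simplest way to address them. The sketch below is illustrative only: the names `_TEMPLATES` and `eleven_taxon_path` and the keys of the table are hypothetical, and X is assumed to be an unpadded integer.

```python
# Path templates taken from the file listing; {ils} is "strong" or "weak",
# {rep} is the replicate number (assumed to be an unpadded integer).
_TEMPLATES = {
    "true_gene_trees": "simTree_11taxa_100genes_noHGT_{ils}ILS/simtrees_{rep}.nex",
    "alignments": "modi_simSeq_{ils}ILS/modi_sequence_{rep}.nex",
    "estimated_gene_trees": "estimated_genetrees_{ils}ILS/Rep{rep}.raxml",
}


def eleven_taxon_path(kind, ils, replicate, root="11-taxon_Bioinformatics"):
    """Return the path of one 11-taxon file for the given ILS level."""
    if ils not in ("strong", "weak"):
        raise ValueError("ils must be 'strong' or 'weak'")
    if kind not in _TEMPLATES:
        raise ValueError(f"unknown file kind: {kind!r}")
    return f"{root}/" + _TEMPLATES[kind].format(ils=ils, rep=replicate)
```

The model species tree is a single file, `11-taxon_Bioinformatics/model_tree`, and needs no template.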
17-taxon ILS (estimated gene trees):
17-taxon_Bioinformatics.zip
100 replicates, at 8 and 32 genes
File contents:
Estimated gene trees (8-gene): 17-taxon_Bioinformatics/estimated_genetrees_8-gene/RepX.raxml - where X is the replicate number.
Estimated gene trees (32-gene): 17-taxon_Bioinformatics/estimated_genetrees_32-gene/RepX.raxml - where X is the replicate number.