ASTRAL-related papers
weighted ASTRAL:
Zhang, C., & Mirarab, S. (2022). Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Molecular Biology and Evolution, 39(12), msac215. https://doi.org/10.1093/molbev/msac215.
Refer to this page for all the data: https://github.com/chaoszhang/Weighted-ASTRAL_data
ASTRAL-Pro:
Zhang, C., Scornavacca, C., Molloy, E. K., & Mirarab, S. (2020). ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy. Molecular Biology and Evolution, 37(11), 3292–3307. https://doi.org/10.1093/molbev/msaa139
Please refer to this GitHub repository for the data: https://github.com/chaoszhang/A-pro_data/
Zhang, C., & Mirarab, S. (2022). ASTRAL-Pro 2: Ultrafast species tree reconstruction from multi-copy gene family trees. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac620
Please refer to this GitHub repository for the data: https://github.com/chaoszhang/A-Pro2_data
ASTRAL-MP:
Yin, J., Zhang, C., & Mirarab, S. (2019). ASTRAL-MP: Scaling ASTRAL to very large datasets using randomization and parallelization. Bioinformatics, 35(20). https://doi.org/10.1093/bioinformatics/btz211.
We have used the following datasets:
SV 10-1000: These gene trees can be found in the ASTRAL-II subsection below.
Avian: This is the dataset with 48 species and 14446 gene trees with varying contractions. The gene trees, the species trees generated by ASTRAL, and the ASTRAL output can be found here: https://datadryad.org/stash/dataset/doi:10.6076/D16W2H
Insects: This is the dataset with 144 species and 149278 gene trees. The gene trees can be found at https://doi.org/10.6076/D14599
Simulated 10K: This is the dataset with 10K species, simulated using SimPhy (20 replicates).
All the data can be found here: https://datadryad.org/stash/dataset/doi:10.6076/D16W2H
Scripts to generate the data and to analyze it are also found here: https://github.com/smirarab/astralmp-simulations
ASTRAL-Multiind:
Rabiee, Maryam, Erfan Sayyari, and Siavash Mirarab. “Multi-Allele Species Reconstruction Using ASTRAL.” Molecular Phylogenetics and Evolution 130 (2019): 286–96. https://doi.org/10.1016/j.ympev.2018.10.033.
Refer to this GitLab repository for all the data: https://gitlab.com/mrabiee/ASTRAL-multiind/
INSTRAL:
Rabiee, Maryam, and Siavash Mirarab. “INSTRAL: Discordance-Aware Phylogenetic Placement Using Quartet Scores.” Systematic Biology, July 10, 2019. https://doi.org/10.1093/sysbio/syz045.
Data are available on Dryad: https://doi.org/10.5061/dryad.cs59t13
Polytomy test:
Sayyari, E., & Mirarab, S. (2018). Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies. Genes, 9(3), 132. https://doi.org/10.3390/genes9030132
ASTRAL-III:
Zhang, Chao, Maryam Rabiee, Erfan Sayyari, and Siavash Mirarab. “ASTRAL-III: Polynomial Time Species Tree Reconstruction from Partially Resolved Gene Trees.” BMC Bioinformatics 19, no. S6 (2018): 153. https://doi.org/10.1186/s12859-018-2129-y.
For data, please refer to https://gitlab.com/esayyari/ASTRALIII
LocalPP:
Sayyari, E., & Mirarab, S. (2016). Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies. Molecular Biology and Evolution, 33(7), 1654–1668. https://doi.org/10.1093/molbev/msw079
ASTRAL-HGT study:
Davidson, Ruth; Vachaspati, Pranjal; Mirarab, Siavash; Warnow, Tandy (2023): Data from: Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer.
https://databank.illinois.edu/datasets/IDB-6670066
ASTRAL-II:
Mirarab, S., & Warnow, T. (2015). ASTRAL-II: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44–i52. https://doi.org/10.1093/bioinformatics/btv234
Download the simulation scripts from GitHub: https://github.com/smirarab/astral2sims
All the simulation and real data files are available on Dryad: https://doi.org/10.6076/D10C7C.
A full README.md on Dryad explains the content of the data files.
ASTRAL-I:
Mirarab, Siavash, Rezwana Reaz, Md. Shamsuzzoha Bayzid, Théo Zimmermann, M. S. Swenson, and Tandy Warnow. “ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation.” Bioinformatics 30, no. 17 (2014): i541–48. https://doi.org/10.1093/bioinformatics/btu462.
Datasets
The following datasets are used in the ASTRAL paper shown above. All these archive files include README files that describe their content. We acknowledge the help of Bastien Boussau, who performed these simulations for another study and made them available to us for this paper.
biological.zip: This file includes 1) our estimated gene trees on alignments provided to us by the authors of Song et al., 2012, PNAS, and 2) our estimated species trees on the same dataset.
truetrees.zip: The model species tree and the true gene trees simulated based on the mammalian dataset of Song et al., 2012, PNAS.
sequencedata.zip: Sequence data simulated on the true gene trees (mammalian dataset).
estimatedgenetrees.zip: Gene trees estimated using RAxML on alignments of length 1000 and 500 (mammalian dataset).