PASTA (Practical Alignment using SATé and TrAnsitivity) is an improvement to SATé: it uses some of the algorithmic design of SATé but is faster, produces more accurate alignments and trees, and scales to much larger datasets. PASTA computes alignments on very large datasets using a divide-and-conquer technique. It divides the dataset into smaller, evolutionarily less diverged subsets, aligns each subset, merges some pairs of these subset alignments to get a set of overlapping and compatible alignments, and finally uses transitivity to merge all these overlapping alignments into a final alignment. The novel transitivity-based merge technique makes PASTA highly scalable and also improves its accuracy compared to its predecessor, SATé.
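The transitivity step can be illustrated with a small Python sketch. This is not PASTA's actual implementation. It assumes alignments are given as dictionaries mapping a sequence name to a gapped string, that the overlapping alignments are mutually compatible, and it uses hypothetical helper names (`columns`, `transitivity_merge`). Residues that share a column in any input alignment are placed in the same class with a union-find structure, and the classes are then ordered so that each sequence's residues stay in their original order.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def columns(alignment):
    """Yield each column as a list of (sequence name, residue index) pairs, skipping gaps."""
    counters = {name: 0 for name in alignment}
    length = len(next(iter(alignment.values())))
    for col in range(length):
        cells = []
        for name, row in alignment.items():
            if row[col] != "-":
                cells.append((name, counters[name]))
                counters[name] += 1
        yield cells


def transitivity_merge(alignments, sequences):
    """Merge overlapping, compatible alignments by transitive closure of shared columns."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Residues in the same column of any input alignment join the same class.
    for aln in alignments:
        for cells in columns(aln):
            for cell in cells[1:]:
                parent[find(cells[0])] = find(cell)

    # Each class becomes one output column; record its predecessors and members.
    predecessors = defaultdict(set)
    members = defaultdict(dict)
    for name, seq in sequences.items():
        for i, residue in enumerate(seq):
            cls = find((name, i))
            members[cls][name] = residue
            predecessors[cls]  # make sure the class is a node in the graph
            if i > 0:
                predecessors[cls].add(find((name, i - 1)))

    # Order the columns so that every sequence reads left to right.
    order = list(TopologicalSorter(predecessors).static_order())
    return {
        name: "".join(members[cls].get(name, "-") for cls in order)
        for name in sequences
    }


# Toy example: two alignments that overlap on sequence "b".
sequences = {"a": "ACG", "b": "ACTG", "c": "ATG"}
aln_ab = {"a": "AC-G", "b": "ACTG"}
aln_bc = {"b": "ACTG", "c": "A-TG"}
merged = transitivity_merge([aln_ab, aln_bc], sequences)
for name, row in merged.items():
    print(name, row)
# a AC-G
# b ACTG
# c A-TG
```

In this toy case, sequences "a" and "c" are never aligned to each other directly; their columns line up only because both are aligned to "b", which is the sense in which the merge is transitive.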
If you have any trouble with this, please go to the PASTA tutorial, at https://github.com/smirarab/pasta/blob/master/pasta-doc/pasta-tutorial.md.
The Google Drive folder above includes the following files:
Contact: All questions and inquiries should be addressed to our user email group: pasta-users@googlegroups.com
Nguyen, Nam-phuong D., Siavash Mirarab, Keerthana Kumar, and Tandy Warnow. “Ultra-Large Alignments Using Phylogeny-Aware Profiles.” Genome Biology 16, no. 1 (December 16, 2015): 124. doi:10.1186/s13059-015-0688-z.
UPP (Ultra-large alignments using Phylogeny-aware Profiles) is a new method for aligning large and potentially fragmentary datasets. UPP takes as input a set of unaligned sequences and partitions them into a "backbone set" (up to 1,000 sequences) and a "query set". PASTA is used to produce an alignment and tree on the backbone set; these are called the "backbone alignment" and "backbone tree". The sequences in the query set are then added to the backbone alignment using the Ensemble of HMMs technique presented in the paper describing UPP.
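The partitioning step can be sketched in Python as follows. The text above gives only the size limit on the backbone set, so the selection rule used here is an assumption: the backbone is sampled at random from sequences whose length is within a tolerance of the median length, so that fragments fall into the query set. The function name `split_backbone_query` and its parameters are hypothetical and are not part of the UPP software.

```python
import random
from statistics import median


def split_backbone_query(sequences, backbone_size=1000, length_tolerance=0.25, seed=0):
    """Split unaligned sequences into backbone and query name lists.

    Assumption: backbone candidates are sequences whose length is within
    `length_tolerance` (as a fraction) of the median sequence length.
    """
    med = median(len(seq) for seq in sequences.values())
    candidates = [
        name for name, seq in sequences.items()
        if abs(len(seq) - med) <= length_tolerance * med
    ]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    chosen = set(rng.sample(candidates, min(backbone_size, len(candidates))))
    backbone = [name for name in sequences if name in chosen]
    query = [name for name in sequences if name not in chosen]
    return backbone, query


# Toy input: five full-length sequences and two short fragments.
toy = {f"full{i}": "ACGT" * 25 for i in range(5)}
toy.update({"frag1": "ACGT" * 3, "frag2": "ACGT" * 5})
backbone, query = split_backbone_query(toy, backbone_size=3)
print(len(backbone), "backbone sequences;", len(query), "query sequences")
```

The backbone names would then be aligned with PASTA, and every name in the query list would be added to that backbone alignment afterwards.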
The UPP paper uses all the PASTA datasets shown above. In addition, we provide the following:
All questions should be addressed to Nam-phuong Nguyen (namphuon@illinois.edu), Siavash Mirarab (smirarab@gmail.com), or Tandy Warnow (warnow@illinois.edu).