As a first step in this analysis, we will generate the whole-genome alignment (WGA) using the Progressive Cactus aligner. Cactus is a highly accurate, reference-free multiple genome alignment program [1]. Progressive Cactus can align tens to thousands of large genomes without designating any one of them as a reference, while maintaining high alignment quality and scalability. Cactus solves large alignment problems by breaking them down into smaller subproblems using an input guide tree, with one subproblem per internal node of the tree. Each subproblem involves comparing a set of ingroup genomes (the children of the internal node to be reconstructed) against each other, as well as against a sample of outgroup genomes (non-descendants of the internal node in question).
Dataset: For my study, I used a total of 63 genomes: 44 from Heliconius species, 14 from other Heliconiini species, and 5 used as outgroup references. Due to time constraints, I selected 7 species from the complete dataset: H. melpomene, H. doris, H. erato, H. demophoon, and H. charithonia as ingroup genomes; Eueides isabella, Dryadula phaetusa, and Speyeria mormonia as outgroup genomes. Furthermore, I narrowed down the dataset to a single chromosome: ancestral chromosome 21, which was renamed chromosome 2 in Heliconius species.
The documentation for Progressive Cactus can be found on the GitHub repository. In this workshop, we will align the subset of species described above. The first step is to prepare the configuration file, a text file containing the phylogenetic tree of the input genomes and the locations of their sequences. As we explained before, the tree is used to progressively decompose the alignment by iteratively aligning sibling genomes to estimate their parent genomes in a bottom-up fashion. Cactus also uses the branch lengths from the tree to choose appropriate pairwise alignment parameters, enabling quicker alignment for closely related species without sacrificing accuracy. The file should be formatted as follows:
NEWICK tree
name1 path1
name2 path2
...
nameN pathN
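As an illustration of this layout, here is a minimal sketch that writes a three-genome configuration file and runs a crude consistency check on it. The genome labels (Hmel, Hera, Eisa), the FASTA paths under genomes/, and the branch lengths are all hypothetical placeholders, not the values used in this workshop; only the file name Cactus.Chr2.config is taken from the command used in this section.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Config file name as used by the cactus command in this section.
config="Cactus.Chr2.config"

# Line 1: Newick tree (placeholder branch lengths).
# Remaining lines: one "name path" pair per leaf of the tree.
# All labels and paths below are hypothetical examples.
cat > "$config" <<'EOF'
((Hmel:0.1,Hera:0.1):0.1,Eisa:0.2);
Hmel genomes/Hmel.chr2.fa
Hera genomes/Hera.chr2.fa
Eisa genomes/Eisa.chr2.fa
EOF

# Crude check: each genome name, followed by ':', should occur in the tree line.
# This is a substring match, not a real Newick parser.
tree=$(head -n 1 "$config")
tail -n +2 "$config" | while read -r name path; do
    if [[ "$tree" == *"$name:"* ]]; then
        echo "ok: $name -> $path"
    else
        echo "not found in tree: $name" >&2
    fi
done
```

The names in the first column must match the leaf labels in the tree exactly, so a check like this can catch a mistyped label before a long alignment run is launched.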
Solution:
source $HOME/software/.source/cactus-bin-v2.5.1/cactus_env/bin/activate
cactus jobStore Cactus.Chr2.config Chr2.hal --workDir . --maxMemory 20G --binariesMode local --stats --logFile Cactus.log --maxCores 2 --defaultMemory 5G
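Once the run finishes, it can be useful to confirm that the output file was written and to look at what it contains. The sketch below assumes that the HAL command-line tools (in particular halStats) are available on the PATH after activating the Cactus environment; the helper function inspect_hal is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper: summarize a HAL alignment if the HAL tools are available.
inspect_hal() {
    local hal="$1"
    # Stop early if the alignment file is missing or empty.
    if [ ! -s "$hal" ]; then
        echo "HAL file missing or empty: $hal" >&2
        return 1
    fi
    # Assumption: halStats is on PATH once the Cactus environment is active.
    if ! command -v halStats >/dev/null 2>&1; then
        echo "halStats not found on PATH; activate the Cactus environment first" >&2
        return 2
    fi
    halStats "$hal"          # per-genome summary table
    halStats --tree "$hal"   # Newick tree stored in the alignment
}

inspect_hal Chr2.hal || echo "inspection skipped"
```

The tree reported for the alignment should include the reconstructed ancestral genomes as internal nodes, in addition to the input genomes listed in the configuration file.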