Cactus can take more than an hour to align this dataset on your instance, and it is strongly dependent on the allocated CPUs and memory for the job. I have provided the alignment in case you want to skip this step. [HelicChr2.hal]
The output of Cactus is a HAL (Hierarchical Alignment Format) file. As the name suggests, the file is hierarchical and designed to store genome-scale multiple sequence alignments. It can represent a complete tree of genomes, including their sequence data, annotations, and evolutionary relationships. The HAL file is binary and optimized for random access, as well as supporting incremental updates to the alignment. If needed, you can use Haltools to manipulate and convert the alignment into other formats.
In this section, you can utilize halSummarizeMutations to extract various useful information, such as mutations (insertions, deletions, inversions, duplications, transpositions, gap insertions, and gap deletions) in each branch of the alignment. The job is a single CPU process and may require ~25 minutes to complete. You can try redirecting the output to a log file and running it in the background using the '&' symbol.
> What is the predominant mutation occurring at the branch of Heliconius?
> And how many bases does it involve?
Command solution
halSummarizeMutations HelicChr2.hal [optional --maxNFraction]
Solution >
Insertions, with over 1.6 Mb.
There's a hidden file named .HelicChr2.SummarizeMutations.csv
You can also compute the alignment coverage using halAlignmentDepth, which is the number of genomes aligned to a reference species. It will output data to the Stdout in wiggle format. You can redirect the output of a file and convert it into a binary and compressed file (BigWig format) with wigToBigWig that can be easily visualized on a genome browser such as IGV. [ it takes ~ 1 minute per coverage].
Use the tool to:
> Compute the overall coverage using H. melpomene as a reference
Solution >
halAlignmentDepth --noAncestors HelicChr2.hal Hmel > Hmel.Cov.wig
--
wigToBigWig Hmel.Cov.wig Hmel.Chr2.fasta.fai Hmel.Cov.bw
--
> Compute the coverage of only Heliconius species
Solution >
halAlignmentDepth --rootGenome Heliconius --noAncestors HelicChr2.hal Hmel > Hmel.HelOnly.Cov.wig
--
wigToBigWig Hmel.HelOnly.Cov.wig Hmel.Chr2.fasta.fai Hmel.HelOnly.Cov.bw
--
> Compute the coverage of only non-Heliconius species
Solution >
halAlignmentDepth --targetGenomes Eisa,Dpha,Smor --noAncestors HelicChr2.hal Hmel > Hmel.NonHelOnly.Cov.wig
--
wigToBigWig Hmel.NonHelOnly.Cov.wig Hmel.Chr2.fasta.fai Hmel.NonHelOnly.Cov.bw
--
> Can you spot any region of the chromosome that is more specific to Heliconius species?