Post date: Mar 20, 2015 6:31:50 PM
Details on individual genome assemblies can be found here. This includes a brief description of the multiple genome alignment with Mugsy.
The multiple alignment from Mugsy is /labs/evolution/data/lycaeides/whole_genomes/mugsyGenomeAlgns/mugsy/mugsyLycaeides.maf. I wrote a script, computeConsensus.pl, to compute consensus sequences from this MAF file. It generates a FASTA file, mugsyLycaeidesConsensus.fasta, with the consensus sequence for each alignment. Cases where only one species had a sequence (uXXXX) are included.
Some of these consensus sequences are very short, so I wrote a second script, filterConsensus.pl, to filter them by length (not counting Ns). I ran this with minimum lengths of 1,000 and 10,000 bases, resulting in the files filtered_mugsyLycaeidesConsensus.fasta and filtered10k_mugsyLycaeidesConsensus.fasta, respectively. My current plan is to use the 5/10k set for the local ancestry analyses.
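For readers who want to reproduce the length filter without the Perl script, here is a minimal Python sketch of the same idea. It is not filterConsensus.pl itself: the function names are hypothetical, and it assumes that a sequence is kept when its count of non-N bases is greater than or equal to the minimum (the original script may use a strict inequality).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_n_length(seq):
    """Length of a sequence, not counting N or n."""
    return sum(1 for base in seq if base not in "Nn")


def filter_by_length(records, min_len):
    """Keep records with at least min_len non-N bases (assumed >=)."""
    for header, seq in records:
        if non_n_length(seq) >= min_len:
            yield header, seq


# Toy input standing in for a consensus FASTA file (made-up sequences).
demo = [">aln1", "ACGTNNNN", "ACGT", ">u0002", "NNNNNNAC", ">aln3", "acgtacgtnn"]
kept = list(filter_by_length(read_fasta(demo), min_len=8))
print([header for header, _ in kept])  # ['aln1', 'aln3']
```

On a real file, the list `demo` would be replaced by an open file handle, and `min_len` set to 1000 or 10000.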
Here is a summary of the number of bases and aligned regions in each file:
Note that this means most of the genome is in small aligned chunks.
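A summary like this could be computed along the following lines. This is only a sketch under stated assumptions: the function name and the returned fields are hypothetical, bases are counted without Ns (as in the length filter), and the toy sequences are made up rather than taken from the actual files.

```python
def summarize(seqs, min_len=1000):
    """Count regions and bases; also count bases in regions shorter than min_len."""
    # Assumption: lengths exclude N/n, matching the length filter.
    lengths = [sum(1 for base in s if base not in "Nn") for s in seqs]
    return {
        "regions": len(lengths),
        "bases": sum(lengths),
        "bases_in_short": sum(n for n in lengths if n < min_len),
    }


# Toy sequences, not real data.
toy = ["ACGT" * 3, "ACGTNN", "A" * 20]
stats = summarize(toy, min_len=10)
print(stats)  # {'regions': 3, 'bases': 36, 'bases_in_short': 4}
```

Comparing `bases_in_short` to `bases` for each file shows how much of the alignment sits in chunks below a given length threshold.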