Post date: Apr 15, 2019 2:58:12 PM
Timema poppensis is a redwood feeder and appears (from population genetic analyses) to harbor the putative LG 11 inversion at high frequency (or maybe it is fixed for it). As an initial pass at looking for evidence of large-scale structural variation between T. cristinae and T. poppensis, I aligned our T. cristinae genome (timema_06Jun2016_RvNkF702) to a highly fragmented draft T. poppensis genome from Tanja.
The relevant files are in:
/uufs/chpc.utah.edu/common/home/u6000989/projects/timema_confiers/popp_genome
1. My initial attempt at a comparative alignment failed (timed out) after about a month. This was likely because of the large number of small scaffolds in the T. poppensis assembly. Thus, I extracted the subset of scaffolds that were at least 100,000 bp long. These are in sub_100000_Tpopp_1_Tps_b3v08.fasta.
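The command used for the length filter is not recorded above. A minimal sketch of one way to do it, assuming a standard multi-line FASTA file, is the function below. The function name `filter_fasta_by_length` and the file names in the usage line are hypothetical, not taken from the original analysis.

```shell
# Keep only FASTA records whose sequence is at least MIN_LEN bp long.
# Usage: filter_fasta_by_length MIN_LEN < in.fasta > out.fasta
# (hypothetical helper; not the command used for the original subset)
filter_fasta_by_length() {
    awk -v min_len="$1" '
        # Print the buffered record if its sequence is long enough.
        function emit_record() {
            if (header != "" && seq_len >= min_len) {
                printf "%s\n%s", header, seq
            }
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { emit_record(); header = $0; seq = ""; seq_len = 0; next }
        # Sequence lines are buffered and their lengths summed.
        {
            seq = seq $0 "\n"
            seq_len += length($0)
        }
        END { emit_record() }
    '
}
```

With placeholder file names, this would be run as `filter_fasta_by_length 100000 < input.fasta > filtered.fasta`. Each scaffold is buffered in memory before printing, which should be fine for a draft assembly but is not tuned for speed.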
2. I used mugsy to align the T. cristinae and T. poppensis genomes.
I ran mugsyTcrVsTpopp.sh,
which does this:
bash
source /uufs/chpc.utah.edu/common/home/u6000989/source/mugsy_x86-64-v1r2.3/mugsyenv.sh
cd /scratch/general/lustre/gompMugsy/
/uufs/chpc.utah.edu/common/home/u6000989/source/mugsy_x86-64-v1r2.3/mugsy --directory /scratch/general/lustre/gompMugsy --prefix mualnTcrToppMar19 /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/timema_06Jun2016_RvNkF702.fasta /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_confiers/popp_genome/sub_100000_Tpopp_1_Tps_b3v08.fasta
mv ./* /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_confiers/popp_genome/
3. I spent some time going over the comparative alignment. I first focused on all T. poppensis scaffolds that aligned to T. cristinae LG 11, with the main idea of looking for any obvious inversion breakpoints. I then asked whether any T. poppensis scaffolds that aligned to LG 11 also aligned to another LG (which might suggest translocations, or just issues with our LGs or the T. poppensis assembly). I did this with the getLg11maf.pl and getPopLg11maf.pl scripts, respectively, and then viewed the alignments with,
java -Xmx14G -jar ~/source/gmaj.jar lg11_mualnTcrToppMar19.maf
and
java -Xmx14G -jar ~/source/gmaj.jar popp_lg11_mualnTcrToppMar19.maf
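The two Perl scripts are not reproduced here. As a rough illustration of the kind of filtering involved (not the actual getLg11maf.pl logic), the sketch below keeps only those MAF alignment blocks in which at least one sequence name matches a regular expression. It assumes the usual MAF layout: each block starts with an `a` line, the aligned sequences are on `s` lines with the source name in the second field, and blocks are separated by blank lines. The function name `maf_blocks_matching` and the pattern in the usage line are hypothetical; the real pattern depends on how LG 11 scaffolds are named in the T. cristinae FASTA headers.

```shell
# Print MAF blocks in which at least one "s" line has a source name
# matching REGEX. Header/comment lines (starting with "#") are passed through.
# Usage: maf_blocks_matching REGEX < in.maf > out.maf
# (hypothetical helper; an illustration only)
maf_blocks_matching() {
    awk -v pat="$1" '
        # Print the buffered block, followed by a blank line, if it matched.
        function emit_block() {
            if (block != "" && keep) printf "%s\n", block
            block = ""; keep = 0
        }
        /^#/      { print; next }
        NF == 0   { emit_block(); next }               # blank line ends a block
        $1 == "a" { emit_block(); block = $0 "\n"; next }
        block != "" {
            block = block $0 "\n"
            if ($1 == "s" && $2 ~ pat) keep = 1
        }
        END { emit_block() }
    '
}
```

For example, if LG 11 scaffold names contained the string `lg11_` (an assumption), the call would be `maf_blocks_matching 'lg11_' < mualnTcrToppMar19.maf > lg11_subset.maf`, where the output name is a placeholder.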
Here are the main things I found.
a. There is only one case with evidence of a non-trivial inversion between a T. cristinae LG 11 scaffold and a T. poppensis scaffold (T. cristinae 640 vs. T. poppensis 1357): the alignment direction switches, and it does so at a position where the T. poppensis genome has a ~75,000 bp gap relative to the T. cristinae genome. The whole alignment is about 200,000 bp. We can't say what happens with the orientation beyond this, and thus we don't know how big the inversion is. The T. cristinae scaffold is around the middle of LG 11, and thus within the Fst peak (but so is almost any other scaffold on LG 11).
b. Of the T. poppensis scaffolds that align to T. cristinae LG 11 scaffolds, most align only to LG 11 scaffolds. A few also align to scaffolds lacking an LG (which isn't surprising). The only other LG to which LG 11-aligning scaffolds align is LG 4 (alignments are to T. cristinae scaffolds 36 and 641, both on LG 4). We have some other evidence of LG 4 perhaps being associated with LG 11 (i.e., the Fst signal, I think, extended somewhat to LG 4, and specifically to scaffold 641, which is one of our scaffolds of interest). This could mean that a few of our LG 4 scaffolds belong on LG 11, or that some chunks of LG 4 in T. cristinae have actually moved to LG 11 in T. poppensis. This is interesting, but I think we really need the Dovetail genomes to nail it down. Also, the fact that so few T. poppensis scaffolds that align to LG 11 align to other LGs makes me feel pretty good about our LGs.
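The orientation switch described in (a) was found by eye in Gmaj. A complementary check, sketched below under the same MAF assumptions (`s` lines with fields: name, start, size, strand, source size, text), is to list every block shared by a given pair of scaffolds along with whether the two sequences are on the same or opposite strands. Minus-strand starts in MAF are given relative to the reverse-complemented sequence, so the sketch converts them to forward-strand coordinates before sorting. The function name `pair_orientation` and the sequence names in the usage line are hypothetical.

```shell
# For every MAF block containing both named sequences, print
#   ref_start ref_size qry_start qry_size same|opposite
# with starts in forward-strand coordinates, sorted by ref_start.
# Usage: pair_orientation REF_NAME QRY_NAME < in.maf
# (hypothetical helper; an illustration only)
pair_orientation() {
    awk -v ref="$1" -v qry="$2" '
        # Convert a MAF start to forward-strand coordinates.
        function fwd_start(start, size, strand, src_size) {
            return (strand == "-") ? src_size - start - size : start
        }
        # Report the current block if both sequences were seen in it.
        function emit_block() {
            if (have_ref && have_qry) {
                orient = (ref_strand == qry_strand) ? "same" : "opposite"
                print ref_start, ref_size, qry_start, qry_size, orient
            }
            have_ref = 0; have_qry = 0
        }
        $1 == "a" || NF == 0 { emit_block(); next }
        $1 == "s" && $2 == ref {
            have_ref = 1; ref_size = $4; ref_strand = $5
            ref_start = fwd_start($3, $4, $5, $6)
        }
        $1 == "s" && $2 == qry {
            have_qry = 1; qry_size = $4; qry_strand = $5
            qry_start = fwd_start($3, $4, $5, $6)
        }
        END { emit_block() }
    ' | sort -n
}
```

A run of `same` rows followed by a run of `opposite` rows (or the reverse) along the reference would be consistent with the kind of direction switch seen for T. cristinae 640 vs. T. poppensis 1357, e.g. `pair_orientation REF_SCAFFOLD_NAME QUERY_SCAFFOLD_NAME < lg11_mualnTcrToppMar19.maf`, with the two names replaced by the exact sequence names used in the MAF file.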