Post date: Dec 02, 2019 3:27:22 AM
As part of our revisions on the deletion manuscript, I wanted to verify the absence of the color locus (4.9 to 6.4 mbp on the melanic scaffold 128) in the green (chicago) and stripe (hi-c) genomes.
I was working in /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/MugsyComp/.
I basically wanted to extract everything aligning to 128, and see if any of it overlapped with the ~5-6 mbp region.
For melanic vs green I used:
mualnTcrBrwGrnDec16.maf
And for melanic vs striped I used (this is only has scaffolds 702.1 and 128, but that is fine):
mualnTcrBrwStrJun19lg8v2.maf
First, I extracted the coordinates,
perl findIndel.pl > indelOverlap.txt
grep "green" indelOverlap.txt > indelOverlapReal.txt
perl findIndelStr.pl > indelOverlapStr.txt
grep "timema" indelOverlapStr.txt > indelOverlapStrReal.txt
These are similar, the first is for green. The grep commands get rid of matches to unaligned stuff.
Here is what findIndel.pl looks like:
#!/usr/bin/perl
#
# script searches for scaffold 128, use 5 to 6 mbp, note really 4.9 to 6.4
#
#
open(IN, "mualnTcrBrwGrnDec16.maf") or die "failed to read\n";
$ilb = 5000000;
$iub = 6000000;
while(<IN>){
chomp;
if(m/^s\s+timema_06Jun2016_RvNkF702.ScRvNkF_128\s+(\d+)\s+(\d+)/){
$lb = $1;
$ub = $1+$2;
$green = <IN>;
$green =~ m/^s\s+(\S+)/;
$greenScaf = $1;
$overlap = 0;
if(($lb >= $ilb & $lb <= $iub) or ($ub >= $ilb & $ub <= $iub) or ($lb <= $ilb & $ub >= $iub)){
$overlap = 1;
}
print "$lb $ub $overlap $greenScaf\n";
}
}
Then, in R I plotted where genomes aligned. This is done in plotIndel.R (see below) and generates indel.pdf (referred to below). Note that I only consider alignments of 10kb or more.
dat<-read.table("indelOverlapReal.txt")
sz<-dat[,2]-dat[,1]
dat<-dat[sz>10000,]
L<-dim(dat)[1]
lb<-min(dat[,1])
ub<-max(dat[,2])
library(scales)
pdf("indel.pdf",width=6,height=5)
par(mfrow=c(2,1))
plot(c(lb,ub),c(0,1),type='n',xlab="Position scaf. 128 (bp)",ylab="Alignment",axes=FALSE)
plb<-4900000
pub<-6400000
polygon(c(plb,pub,pub,plb),c(0,0,1,1),col=alpha("blue",.4),border=NA)
axis(1)
box()
for(i in 1:L){
lines(c(dat[i,1],dat[i,2]),c(0.5,0.5),lwd=5,col=alpha("black",.5))
}
dat<-read.table("indelOverlapStrReal.txt")
sz<-dat[,2]-dat[,1]
dat<-dat[sz>10000,]
plot(c(lb,ub),c(0,1),type='n',xlab="Position scaf. 128 (bp)",ylab="Alignment",axes=FALSE)
plb<-4900000
pub<-6400000
polygon(c(plb,pub,pub,plb),c(0,0,1,1),col=alpha("blue",.4),border=NA)
axis(1)
box()
for(i in 1:L){
lines(c(dat[i,1],dat[i,2]),c(0.5,0.5),lwd=5,col=alpha("black",.5))
}
dev.off()
So, what does it mean.
First, for green vs. melanic what we thought is basically true. There is one alignment that creeps into the side of the indel, but not much really. See panel (A) in the indel.pdf. Here, the indel is shown in blue, and black lines show all of the places green aligned to scaffold 128 (these are short so look almost like overlapping dots). Nice and clean.
Ok, now for the more complicated result. I looked back at a figure I sent before describing the melanic vs. stripe alignment (brownStripeColorAln.png) and my related e-mail comments.
My earlier description:
"- Things mostly make sense with scaffolds in the predicted order, etc.
- As I mentioned yesterday, note the 702.1 aligns, then 128, then back to 702.1 then back to 128. This strongly supports an inversion, with one odd caveat that I can't quite make sense of. The two chunks of 128 and the two chunks of 702.1 are in the same orientation (its like the pieces swapped without flipping orientation... maybe this is because of the double inversion, but I want to see the 3-way alignment first.
- There is also evidence of a smaller (1 mb) inversion within 702.1 relative to the melanic.
- There are some chunks in the stripe but not in the melanic (or at least not part of this alignment)."
Ok, so the bit where 128 gets split is right at the right edge of the deletion (the far/right end of it). What I somehow had missed (at least recently) is that the deletion does not appear to be deleted. This shows up clearly in panel B of my new figure.
So, the color locus is deleted in green (not translocated), but it is present in stripe (and not translocated, the counts of where stuff aligned I sent earlier are correct). Really not sure how I missed this before now. Instead, damn near at the exact boundary of the deletion (in the green genome), the stripe genome flips chunks of 128 and 702.1 around (an inversion). This actually makes me think stripe evolved from melanic, not green and did so by an inversion (where the inversion perhaps still is the causal mutation, not just locking stuff up but somehow screwing up expression or something).
One more note, just from eyeballing stuff, the viewer associated with the comparative alignment spits out a % identity for each aligned region. This is actually higher (haven't formally quantified, but visually looks so) for the deletion region between melanic and stripe.
And this all makes me think that we just want to leave stripe out of the current story.