Post date: Nov 16, 2017 3:47:57 PM
Timema chumash does not appear to have the inversion(s) or indel in the color pattern region. And, consistent with this and unlike the other species, individual PIPs can be large. This led to a question, has selection increased LD among these top PIP SNPs relative to other SNPs at similar physical distances. If so, this would suggest the inversion would have been favored had it arose.
Romain has results from mapping only in the indel region (which is actually no an indel in chumash). I copied these files to king:/uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/mapping/Tchumash_GR8.06_gemma_scaf128_5-6Mbp_v1.3b
I have been working with dorsal color in terms of GvB and RvG.
Here is the R code (from chumashLd.R, see below). Note that I defined top PIPs as those greater than {0.2, 0.3, 0.4, 0.5}. This corresponds to 55, 12, 6, and 6 of the 126 SNPs in the region. I used the squared Pearson correlation as the measure of LD (as in Vos et al., 2017) and in addition to just summarizing patterns, fit non-linear regressions for E[r2] = 1/(1+ b x), where x is the physical distance (really should be genetic distance, but...) and b is the regression coefficient (see Gaut and Long, 2003). I did this for all SNPs and top SNPs. You actually get a higher rate of decay for the top 0.2 PIP SNPs (0.496 vs. 0.4179 for all), but lower for top 0.3 PIP SNPs (0.3084). The rate is even lower for the top 0.4 or 0.5 SNPs (0.001391) but I don't think this is meaningful as none of these have small distances (so you are really fitting the function on a very different part of the x-axis). And just from the plots of LD vs distance, it is quite clear that the top 0.4 and 0.5 PIP SNPs are just not interesting. The LD plots are here, with orange dots denoting comparisons between pairs of top PIP SNPs.
gen<-read.table("dorGB/Tchumash_GR8.06.dorGB.bial.noindel.qs20.cov50.mdp265Mdp5310.maf0.01.scaf128_5-6Mbp.geno")
mapGB<-read.table("/uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/mapping/Tchumash_GR8.06_gemma_scaf128_5-6Mbp_v1.3b2/dorGB/output/Tchumash_GR8.06.dorGB.bial.noindel.qs20.cov50.mdp265Mdp5310.maf0.01.scaf128_5-6Mbp_sparse_effects_all_SNPs_except_MAFcutoff_formatted.txt",header=TRUE)
mapRG<-read.table("/uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/mapping/Tchumash_GR8.06_gemma_scaf128_5-6Mbp_v1.3b2/dorRG/output/Tchumash_GR8.06.dorRG.bial.noindel.qs20.cov50.mdp265Mdp5310.maf0.01.scaf128_5-6Mbp_sparse_effects_all_SNPs_except_MAFcutoff_formatted.txt",header=TRUE)
G<-as.matrix(gen[,-c(1:3)])
r2Gb<-matrix(NA,nrow=126,ncol=126)
for(i in 1:126){for(j in 1:126){
r2Gb[i,j]<-cor(G[i,],G[j,])^2
}}
distGb<-matrix(NA,nrow=126,ncol=126)
for(i in 1:126){for(j in 1:126){
distGb[i,j]<-abs(mapGB$PS[i]-mapGB$PS[j])
}}
par(mfrow=c(2,2))
for(i in seq(0.2,0.5,0.1)){
plot(distGb[lower.tri(distGb)],r2Gb[lower.tri(r2Gb)],col="darkgray",pch=20)
hi<-which(mapGB$gamma > i | mapRG$gamma > i)
x<-hi_distGb[lower.tri(hi_distGb)]
y<-hi_r2Gb[lower.tri(hi_r2Gb)]
points(hi_distGb[lower.tri(hi_distGb)],hi_r2Gb[lower.tri(hi_r2Gb)],col="orange",pch=19)
}
## nl regression
y<-r2Gb[lower.tri(r2Gb)]
x<-distGb[lower.tri(distGb)]
para0.st<-c(beta=0.0001)
dat<-data.frame(x,y)
fit0<-nls(y ~ 1/(1+beta * x),dat,start=para0.st,trace=TRUE)
fit0
#Nonlinear regression model
# model: y ~ 1/(1 + beta * x)
# data: dat
# beta
#0.4179
# residual sum-of-squares: 61.6
## Very rapid decay of LD, basically gone by 200 bp
nls.fit<-vector("list",4)
j<-1
for(i in seq(0.2,0.5,0.1)){
hi<-which(mapGB$gamma > i | mapRG$gamma > i)
hi_distGb<-distGb[hi,hi]
hi_r2Gb<-r2Gb[hi,hi]
x<-hi_distGb[lower.tri(hi_distGb)]
y<-hi_r2Gb[lower.tri(hi_r2Gb)]
para0.st<-c(beta=0.0001)
dat<-data.frame(x,y)
nls.fit[[j]]<-nls(y ~ 1/(1+beta * x),dat,start=para0.st,trace=TRUE)
j<-j+1
}