Post date: Nov 21, 2015 5:53:24 AM
I have started (re)running analyses related to the inversion. First, I asked whether differences in recombination between the male and female for the big family were greater for pairs of scaffolds crossing vs. not-crossing the inversion. I defined the inversion based on the SV data (bounds = 58/59 to 66/67 where numbers denote scaffold order on LG8; this is the same as before), the region of truly elevated Fst in FHA (bounds = 19/22 to 64/65; entire region bounded by Fst > 0.2, which is arbitrary but captures the obvious graphical signal) and the elevated Fst region in N1 (bounds 19/22 to 64/65; identical to FHA so really this is just one test). Note that for Fst I used R.GIS (we could repeat this with GxS but that is not the contrast for the mapping family, so this would be hard to interpret/justify). Here are the results (the statistic of interest for the permutation test is the mean difference in male vs. female recombination rates for scaffolds spanning vs. within the inversion):
SV, Obs. = 0.0396, Null (mean) = -9.7e-5, Null (95th) = 0.0135, p < 0.001
Fst FHA or N1, Obs. = 0.00988, Null (mean) = 0.000142, Null (95th) = 0.00715, p = 0.017
I then asked whether Fst was elevated for LG8 within the inversion defined by SV. I did this for R.GIS for FHA and N1. Here are the results from the permutation tests (the statistic here is mean Fst for the inversion based on SV):
FHA, Obs. = 0.273, Null (mean) = 0.0579, Null (95th) = 0.148, p < 0.001
N1, Obs. = 0.303, Null (mean) = 0.0825, Null (95th) = 0.175, p < 0.001
Clearly, these are strong and consistent results.
Doro has already made some comparisons of Fst and LD, but I don't think these involve the SV-defined inversion (I can do this next if we want it). Other than that, is there another contrast we wanted that I am forgetting (I feel like there is, but this already seems quite solid).
Here is the R code (from calclgrecomb.R in /labs/evolution/projects/timema_speciation/timemaRecomb; some commenting/uncomment is required for different bounds and Fst values, and the code below excludes stuff I used for a few earlier plots):
## read data = recombinationr rates and quality
nl0<-3462475
recombP0FA<-matrix(scan("lgs_recombOutP0FA.txt",n=8*nl0,sep=" "),nrow=nl0,ncol=8,byrow=T)
nl1<-4482398
recombP1FA<-matrix(scan("lgs_recombOutP1FA.txt",n=8*nl1,sep=" "),nrow=nl1,ncol=8,byrow=T)
#load("lg.rwsp")
## identify and select the high quality subset of the data, Mendealian two-tailed prob. of 0.05, and 50 indviduals
hqr<-which(recombP0FA[,7] > 0.05 & recombP0FA[,6] >= 50)
hqdata0<-recombP0FA[hqr,]
hqr<-which(recombP1FA[,7] > 0.05 & recombP1FA[,6] >= 50)
hqdata1<-recombP1FA[hqr,]
## orientate recombination fractions by folding
hqdata0[hqdata0[,8] > 0.5,8]<-1-hqdata0[hqdata0[,8] > 0.5,8]
hqdata1[hqdata1[,8] > 0.5,8]<-1-hqdata1[hqdata1[,8] > 0.5,8]
## lg
lg0<-(which(hqdata0[,1]==8))
lg1<-(which(hqdata1[,1]==8))
## mean recomb. for scaffolds
slist<-unique(rbind(hqdata0[lg0,c(2,4)],hqdata1[lg1,c(2,4)]))
l<-dim(slist)[1]
dmat0<-rep(0,l)
dmatmin0<-rep(0.5,l)
nmat0<-rep(0,l)
dmat1<-rep(0,l)
dmatmin1<-rep(0.5,l)
nmat1<-rep(0,l)
N<-dim(hqdata0[lg0,])[1]
for(n in 1:N){
cat(n,"\n")
a<-which(slist[,1]==hqdata0[lg0[n],2] & slist[,2]==hqdata0[lg0[n],4])
nmat0[a]<-nmat0[a] + 1
dmat0[a]<-dmat0[a] + hqdata0[lg0[n],8]
if (dmatmin0[a] > hqdata0[lg0[n],8]){
dmatmin0[a] <- hqdata0[lg0[n],8]
}
}
dmatf0<-dmat0/nmat0
N<-dim(hqdata1[lg1,])[1]
for(n in 1:N){
cat(n,"\n")
a<-which(slist[,1]==hqdata1[lg1[n],2] & slist[,2]==hqdata1[lg1[n],4])
nmat1[a]<-nmat1[a] + 1
dmat1[a]<-dmat1[a] + hqdata1[lg1[n],8]
if (dmatmin1[a] > hqdata1[lg1[n],8]){
dmatmin1[a] <- hqdata1[lg1[n],8]
}
}
dmatf1<-dmat1/nmat1
## increased difference across inversion
bnd1<-c(58,59) # sv
#bnd1<-c(19,22) # fha or n1 fst
bnd2<-c(66,67) # sv
#bnd2<-c(64,65) # fha or n1 fst
fst<-read.table("HudsonFst_s090p001_binnedSNPs_FHA_N1_morphshosts.txt",header=TRUE)
flg<-which(fst$lg==8)
fscaf<-tapply(X=fst$N1.R.GIS[flg],INDEX=fst$ord[flg],mean,na.rm=TRUE)
#fscaf<-tapply(X=fst$FHA.R.GIS[flg],INDEX=fst$ord[flg],mean,na.rm=TRUE)
scafs<-tapply(X=fst$scaf[flg],INDEX=fst$ord[flg],mean,na.rm=TRUE)
ords<-as.numeric(names(fscaf))
#fst<-read.table("HudsonFst_allSNPsLG8scaf_greens.txt",header=TRUE)
invcr<-matrix(NA,nrow=l,ncol=2)
for(n in 1:dim(slist)[1]){
a<-which(scafs==slist[n,1])
if(length(a)==1){invcr[n,1]<-ords[a]}
a<-which(scafs==slist[n,2])
if(length(a)==1){invcr[n,2]<-ords[a]}
}
invcrbool<-rep(NA,l)
cr<-c(which(invcr[,1] < bnd1[1] & invcr[,2] > bnd1[2] & invcr[,2] < bnd2[1]),which(invcr[,1] > bnd1[2] & invcr[,1] < bnd2[1] & invcr[,2] > bnd2[2]))
wi<-which(invcr[,2] < bnd1[1] | invcr[,1] > bnd2[2] | (invcr[,1] > bnd1[2] & invcr[,2] < bnd2[1]))
dif<-abs(dmatf0-dmatf1)
ts<-mean(dif[cr],na.rm=T)-mean(dif[wi],na.rm=T)
null<-rep(NA,1000)
for(i in 1:1000){
L<-c(cr,wi)
x<-sample(1:length(L),length(cr),replace=F)
null[i]<-mean(dif[L[x]],na.rm=T)-mean(dif[L[-x]],na.rm=T)
}
pdf("recombInvSv.pdf",width=5,height=5)
hist(null,xlim=c(-0.03,0.035),xlab="mean dif. in male vs. female R spanning vs. within inv.")
abline(v=ts)
dev.off()
### fst by sv inversion
inv<-which(ords >= bnd1[2] & ords <= bnd2[1])
ts<-mean(fscaf[inv])
null<-rep(NA,1000)
for(i in 1:1000){
inv<-sample(1:length(ords),length(inv),replace=FALSE)
null[i]<-mean(fscaf[inv])
}