Post date: Nov 24, 2015 6:30:43 PM
I looked at the correlations for scaffold order between the two parents of the big family along LG8. My first thought was to just ask whether scaffold order was negatively correlated within the 'region of interest'. But, this isn't the case. The positive correlation is less than that for other parts of LG8, but its not negative. The problem is that the signal isn't strong or consistent enough for enough of the region. I then decided to calculate correlations in smaller sliding windows of scaffolds. Here are the results for correlations over sets of 15 scaffolds, beginning with the first one (as ordered on the joint map from both parents). Notice that these are mostly positive, but do become negative for a few windows in the region of interest (and at the ends of the LG, but I suspect reflects poor ordering of scaffolds at the edges). However, these are not significant. Thus, there is a definite signal here consistent with what we are looking for, but its a bit weak. Attached is the table of correlations and p-values for 15 scaffold windows and a plot of these values along the LG (here the x-axis is the beginning of the 15 scaffold window). Note also, that the results are sensitive to window size, more than 15 and you start to lose it and much fewer than 15 and you get more noise.
The relevant files are in /pscratch/A01963476/archprojects/timema_lgs/v2/. The correlations are in orderCorrelations.csv. R code from makelgMf.R is below:
## read data = recombinationr rates and quality
nl0<-3462475
recombP0FA<-matrix(scan("lgs_recombOutP0FA.txt",n=8*nl0,sep=" "),nrow=nl0,ncol=8,byrow=T)
nl1<-4482398
recombP1FA<-matrix(scan("lgs_recombOutP1FA.txt",n=8*nl1,sep=" "),nrow=nl1,ncol=8,byrow=T)
#load("lg.rwsp")
## identify and select the high quality subset of the data, Mendealian two-tailed prob. of 0.05, and 50 indviduals
hqr<-which(recombP0FA[,7] > 0.005 & recombP0FA[,6] >= 25)
hqdata0<-recombP0FA[hqr,]
hqr<-which(recombP1FA[,7] > 0.005 & recombP1FA[,6] >= 25)
hqdata1<-recombP1FA[hqr,]
## orientate recombination fractions by folding
hqdata0[hqdata0[,8] > 0.5,8]<-1-hqdata0[hqdata0[,8] > 0.5,8]
hqdata1[hqdata1[,8] > 0.5,8]<-1-hqdata1[hqdata1[,8] > 0.5,8]
## lg
lg0<-(which(hqdata0[,1]==8))
lg1<-(which(hqdata1[,1]==8))
scafs0<-unique(c(hqdata0[lg0,2],hqdata0[lg0,4]))
scafs1<-unique(c(hqdata1[lg1,2],hqdata1[lg1,4]))
hqdata<-hqdata1
lg<-lg1
## mean recomb. for scaffolds, create pairwise matrix
slist<-unique(c(hqdata[lg,2],hqdata[lg,4]))
l<-length(slist)
dmat<-matrix(0,nrow=l,ncol=l)
dmatmin<-matrix(0.5,nrow=l,ncol=l)
nmat<-matrix(0,nrow=l,ncol=l)
N<-dim(hqdata[lg,])[1]
for(n in 1:N){
cat(n,"\n")
a<-which(slist==hqdata[lg[n],2])
b<-which(slist==hqdata[lg[n],4])
nmat[a,b]<-nmat[a,b] + 1
dmat[a,b]<-dmat[a,b] + hqdata[lg[n],8]
if (dmatmin[a,b] > hqdata[lg[n],8]){
dmatmin[a,b] <- hqdata[lg[n],8]
}
}
dmatf<-dmat/nmat
for(i in 1:l){
for(j in i:l){
dmatf[j,i]<-dmatf[i,j]
dmatmin[j,i]<-dmatmin[i,j]
}
}
dmatf2<-dmatf ## replace nan with 0.5, this is conservative
dmatf2[is.nan(dmatf)]<-0.5
dmatmin[is.na(dmatmin)]<-0.5
source("ug.R")
out1<-UGlinkage(dmatmin)
lgs<-lgs<-read.table("lg12.txt")## 12 really lg 8
mfr<-cbind(lgs,lgs)
for(i in 1:dim(lgs)[1]){
ord<-which(scafs0[out0$order]==lgs[i,1])
if(length(ord)==0){ord<-NA}
mfr[i,3]<-ord
ord<-which(scafs1[out1$order]==lgs[i,1])
if(length(ord)==0){ord<-NA}
mfr[i,4]<-ord
}
nl<-128
cors<-matrix(NA,nrow=nl,ncol=4)
N<-15
for(i in 1:(nl-N)){
x<-i:(i+N)
out<-cor.test(mfr[x,3],mfr[x,4],rm.na=TRUE)
cors[i,]<-c(x[1],x[N],out$estimate,out$p.value)
}
colnames(cors)<-c("scaf_lb","scaf_ub","r","p")
write.table(cors[1:113,],"orderCorrelations.csv",row.names=FALSE,col.names=FALSE,quote=FALSE,sep=",")