Permutation tests related to the inversion

Post date: Nov 21, 2015 5:53:24 AM

I have started (re)running analyses related to the inversion. First, I asked whether differences in recombination between the male and female for the big family were greater for pairs of scaffolds crossing vs. not-crossing the inversion. I defined the inversion based on the SV data (bounds = 58/59 to 66/67 where numbers denote scaffold order on LG8; this is the same as before), the region of truly elevated Fst in FHA (bounds = 19/22 to 64/65; entire region bounded by Fst > 0.2, which is arbitrary but captures the obvious graphical signal) and the elevated Fst region in N1 (bounds 19/22 to 64/65; identical to FHA so really this is just one test). Note that for Fst I used R.GIS (we could repeat this with GxS but that is not the contrast for the mapping family, so this would be hard to interpret/justify). Here are the results (the statistic of interest for the permutation test is the mean difference in male vs. female recombination rates for scaffolds spanning vs. within the inversion):

SV, Obs. = 0.0396, Null (mean) = -9.7e-5, Null (95th) = 0.0135, p < 0.001

Fst FHA or N1, Obs. = 0.00988, Null (mean) = 0.000142, Null (95th) = 0.00715, p = 0.017

I then asked whether Fst was elevated for LG8 within the inversion defined by SV. I did this for R.GIS for FHA and N1. Here are the results from the permutation tests (the statistic here is mean Fst for the inversion based on SV):

FHA, Obs. = 0.273, Null (mean) = 0.0579, Null (95th) = 0.148, p < 0.001

N1, Obs. = 0.303, Null (mean) = 0.0825, Null (95th) = 0.175, p < 0.001

Clearly, these are strong and consistent results.

Doro has already made some comparisons of Fst and LD, but I don't think these involve the SV-defined inversion (I can do this next if we want it). Other than that, is there another contrast we wanted that I am forgetting (I feel like there is, but this already seems quite solid).

Here is the R code (from calclgrecomb.R in /labs/evolution/projects/timema_speciation/timemaRecomb; some commenting/uncomment is required for different bounds and Fst values, and the code below excludes stuff I used for a few earlier plots):

## read data = recombinationr rates and quality

nl0<-3462475

recombP0FA<-matrix(scan("lgs_recombOutP0FA.txt",n=8*nl0,sep=" "),nrow=nl0,ncol=8,byrow=T)

nl1<-4482398

recombP1FA<-matrix(scan("lgs_recombOutP1FA.txt",n=8*nl1,sep=" "),nrow=nl1,ncol=8,byrow=T)

#load("lg.rwsp")

## identify and select the high quality subset of the data, Mendealian two-tailed prob. of 0.05, and 50 indviduals

hqr<-which(recombP0FA[,7] > 0.05 & recombP0FA[,6] >= 50)

hqdata0<-recombP0FA[hqr,]

hqr<-which(recombP1FA[,7] > 0.05 & recombP1FA[,6] >= 50)

hqdata1<-recombP1FA[hqr,]

## orientate recombination fractions by folding

hqdata0[hqdata0[,8] > 0.5,8]<-1-hqdata0[hqdata0[,8] > 0.5,8]

hqdata1[hqdata1[,8] > 0.5,8]<-1-hqdata1[hqdata1[,8] > 0.5,8]

## lg

lg0<-(which(hqdata0[,1]==8))

lg1<-(which(hqdata1[,1]==8))

## mean recomb. for scaffolds

slist<-unique(rbind(hqdata0[lg0,c(2,4)],hqdata1[lg1,c(2,4)]))

l<-dim(slist)[1]

dmat0<-rep(0,l)

dmatmin0<-rep(0.5,l)

nmat0<-rep(0,l)

dmat1<-rep(0,l)

dmatmin1<-rep(0.5,l)

nmat1<-rep(0,l)

N<-dim(hqdata0[lg0,])[1]

for(n in 1:N){

cat(n,"\n")

a<-which(slist[,1]==hqdata0[lg0[n],2] & slist[,2]==hqdata0[lg0[n],4])

nmat0[a]<-nmat0[a] + 1

dmat0[a]<-dmat0[a] + hqdata0[lg0[n],8]

if (dmatmin0[a] > hqdata0[lg0[n],8]){

dmatmin0[a] <- hqdata0[lg0[n],8]

}

dmatf0<-dmat0/nmat0

N<-dim(hqdata1[lg1,])[1]

for(n in 1:N){

cat(n,"\n")

a<-which(slist[,1]==hqdata1[lg1[n],2] & slist[,2]==hqdata1[lg1[n],4])

nmat1[a]<-nmat1[a] + 1

dmat1[a]<-dmat1[a] + hqdata1[lg1[n],8]

if (dmatmin1[a] > hqdata1[lg1[n],8]){

dmatmin1[a] <- hqdata1[lg1[n],8]

}

dmatf1<-dmat1/nmat1

## increased difference across inversion

bnd1<-c(58,59) # sv

#bnd1<-c(19,22) # fha or n1 fst

bnd2<-c(66,67) # sv

#bnd2<-c(64,65) # fha or n1 fst

fst<-read.table("HudsonFst_s090p001_binnedSNPs_FHA_N1_morphshosts.txt",header=TRUE)

flg<-which(fst$lg==8)

fscaf<-tapply(X=fst$N1.R.GIS[flg],INDEX=fst$ord[flg],mean,na.rm=TRUE)

#fscaf<-tapply(X=fst$FHA.R.GIS[flg],INDEX=fst$ord[flg],mean,na.rm=TRUE)

scafs<-tapply(X=fst$scaf[flg],INDEX=fst$ord[flg],mean,na.rm=TRUE)

ords<-as.numeric(names(fscaf))

#fst<-read.table("HudsonFst_allSNPsLG8scaf_greens.txt",header=TRUE)

invcr<-matrix(NA,nrow=l,ncol=2)

for(n in 1:dim(slist)[1]){

a<-which(scafs==slist[n,1])

if(length(a)==1){invcr[n,1]<-ords[a]}

a<-which(scafs==slist[n,2])

if(length(a)==1){invcr[n,2]<-ords[a]}

}

invcrbool<-rep(NA,l)

cr<-c(which(invcr[,1] < bnd1[1] & invcr[,2] > bnd1[2] & invcr[,2] < bnd2[1]),which(invcr[,1] > bnd1[2] & invcr[,1] < bnd2[1] & invcr[,2] > bnd2[2]))

wi<-which(invcr[,2] < bnd1[1] | invcr[,1] > bnd2[2] | (invcr[,1] > bnd1[2] & invcr[,2] < bnd2[1]))

dif<-abs(dmatf0-dmatf1)

ts<-mean(dif[cr],na.rm=T)-mean(dif[wi],na.rm=T)

null<-rep(NA,1000)

for(i in 1:1000){

L<-c(cr,wi)

x<-sample(1:length(L),length(cr),replace=F)

null[i]<-mean(dif[L[x]],na.rm=T)-mean(dif[L[-x]],na.rm=T)

}

pdf("recombInvSv.pdf",width=5,height=5)

hist(null,xlim=c(-0.03,0.035),xlab="mean dif. in male vs. female R spanning vs. within inv.")

abline(v=ts)

dev.off()

### fst by sv inversion

inv<-which(ords >= bnd1[2] & ords <= bnd2[1])

ts<-mean(fscaf[inv])

null<-rep(NA,1000)

for(i in 1:1000){

inv<-sample(1:length(ords),length(inv),replace=FALSE)

null[i]<-mean(fscaf[inv])

}

Page updated

Google Sites

Report abuse