No. of QTN in the indel and mel-stripe, 8 of 10 species

Post date: Nov 19, 2017 3:3:2 AM

I inferred the number of QTN/QTL in the indel and mel-stripe regions (I think the latter includes the former) for 8 of the 10 species (need to download the other two). The files are in /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/mapping/ and the key R code (below) is indelNqtl.R.

Here are my notes (send to Patrik):

1.

XXX NEED TO ADD THIS XXXX

2.

Here are the results using mel-stripe as defined in the current paper. I am pretty sure I have the orientation figured out (specifically, that the mapping was done before flipping the orientation, this is based on the GWA signal from gemma, which isn't as clean as genabel), but we should of course verify this. Here is what we have:

mean median 2.5th_q 97.5th_q

Tcalifornicum 4.1361 4 1 9

Tchumash 5.6374 6 3 9

Tcristinae 2.0594 2 0 5

Tcuri 1.6234 1 0 4

Tknulli 1.4004 1 0 4

Tlandelsensis 1.9225 2 0 5

Tpetita 1.0031 1 0 3

Tpoppensis 2.0862 2 0 5

This looks pretty good, though perhaps with a trend towards too many QTL/QTN (but still pretty good). With that said, I think for the species with the indel/inversion this is an overestimate. The reason for this is that I sample QTL/QTN based on PIPs for each trait independently, and then ask how many I have. Well, you basically get <= 1 per trait, and there are two traits so this turns into <= 2 (as the chance of sampling the same SNP for the two traits is low if you assume they are independent and given that the PIPs are spread over so many SNPs). To illustrate this, here is the same table for the two traits independently:

mean median 2.5th_q 97.5th_q

Tcalifornicum 1.7860 2 0 5

Tchumash 2.4090 2 1 4

Tcristinae 0.8509 1 0 2

Tcuri 1.0535 1 0 3

Tknulli 0.7613 1 0 3

Tlandelsensis 1.1851 1 0 4

Tpetita 0.4778 0 0 2

Tpoppensis 0.9728 1 0 3

mean median 2.5th_q 97.5th_q

Tcalifornicum 2.3710 2 0 6

Tchumash 3.2603 3 1 6

Tcristinae 1.2224 1 0 4

Tcuri 0.5847 0 0 2

Tknulli 0.6302 0 0 2

Tlandelsensis 0.7463 1 0 3

Tpetita 0.5498 0 0 2

Tpoppensis 1.1259 1 0 4

My next step is to run this with mel-stripe and the indel combined (mel-stripe is bounded by but does not include the indel). This will not solve the issue of how to treat the two traits, but is perhaps the best general approach for defining 'the region'.

3.

Actually, that last bit doesn't seem right. Assuming I am right about the orientation, the indel is the included in mel-stripe (it does bound it, but is also part of it). So, I think we have what we have. I will think more about any nice ways to account for the likely non-independence of the two color phenotypes (they are neither the same thing nor are they likely fully independent, and certainly their signals are not independent once locked in the inversion), but there isn't any obvious way to deal with this that doesn't come across as contrived.

###### R code ######

## read in list of files for RG and GB

files_GB<-read.table("dorGB_files.txt",header=FALSE)

files_RG<-read.table("dorRG_files.txt",header=FALSE)

### based on indel only

mat_ng<-matrix(NA,nrow=8,ncol=4)

mat_ngx<-matrix(NA,nrow=8,ncol=4)

mat_ngy<-matrix(NA,nrow=8,ncol=4)

## loop over 8 species

for(i in 1:8){

mapGB_all<-read.table(as.character(files_GB[i,]),header=TRUE)

mapRG_all<-read.table(as.character(files_RG[i,]),header=TRUE)

## subset indel SNPs

indel<-which(mapGB_all$CHR==8 & mapGB_all$PS >= 5000000 & mapGB_all$PS <= 6000000 & mapGB_all$scaffold == 128)

mapRG<-mapRG_all[indel,]

mapGB<-mapGB_all[indel,]

L<-length(indel)

## summarize posterior

N<-10000

ng<-rep(NA,N)

ngx<-rep(NA,N)

ngy<-rep(NA,N)

for(j in 1:N){

x<-rbinom(n=L,size=1,prob=mapGB$gamma)

y<-rbinom(n=L,size=1,prob=mapRG$gamma)

ng[j]<-sum(apply(cbind(x,y),1,max))

ngx[j]<-sum(x)

ngy[j]<-sum(y)

}

mat_ng[i,]<-c(mean(ng),quantile(ng,probs=c(0.5,0.025,0.975)))

mat_ngx[i,]<-c(mean(ngx),quantile(ngx,probs=c(0.5,0.025,0.975)))

mat_ngy[i,]<-c(mean(ngy),quantile(ngy,probs=c(0.5,0.025,0.975)))

}

ids<-read.table("ids.txt",header=FALSE)

rownames(mat_ng)<-ids[,1]

colnames(mat_ng)<-c("mean","median","2.5th_q","97.5th_q")

### based on mel--stripy

mat_ng_ms<-matrix(NA,nrow=8,ncol=4)

mat_ngx_ms<-matrix(NA,nrow=8,ncol=4)

mat_ngy_ms<-matrix(NA,nrow=8,ncol=4)

## loop over 8 species

for(i in 1:8){

mapGB_all<-read.table(as.character(files_GB[i,]),header=TRUE)

mapRG_all<-read.table(as.character(files_RG[i,]),header=TRUE)

## subset indel SNPs

mlb<-4139489

mub<-6414835

mel<-c(which(mapGB_all$scaffold==702.1 & mapGB_all$PS < mlb),

which(mapGB_all$scaffold==128 & mapGB_all$PS < mub))

mapRG<-mapRG_all[mel,]

mapGB<-mapGB_all[mel,]

L<-length(mel)

## summarize posterior

N<-10000

ng<-rep(NA,N)

ngx<-rep(NA,N)

ngy<-rep(NA,N)

for(j in 1:N){

x<-rbinom(n=L,size=1,prob=mapGB$gamma)

y<-rbinom(n=L,size=1,prob=mapRG$gamma)

ng[j]<-sum(apply(cbind(x,y),1,max))

ngx[j]<-sum(x)

ngy[j]<-sum(y)

}

mat_ng_ms[i,]<-c(mean(ng),quantile(ng,probs=c(0.5,0.025,0.975)))

mat_ngx_ms[i,]<-c(mean(ngx),quantile(ngx,probs=c(0.5,0.025,0.975)))

mat_ngy_ms[i,]<-c(mean(ngy),quantile(ngy,probs=c(0.5,0.025,0.975)))

}

rownames(mat_ng_ms)<-ids[,1]

colnames(mat_ng_ms)<-c("mean","median","2.5th_q","97.5th_q")

rownames(mat_ngx_ms)<-ids[,1]

colnames(mat_ngx_ms)<-c("mean","median","2.5th_q","97.5th_q")

rownames(mat_ngy_ms)<-ids[,1]

colnames(mat_ngy_ms)<-c("mean","median","2.5th_q","97.5th_q")