Post date: Dec 05, 2019 8:28:14 PM
I have results for scaffold 128 for coverage for the ecology letters whole genomes.
I summarized things by calculating, for each individual, mean coverage in 1 mbp windows (15 windows) along scaffold 128. I then compared these across windows. Look at depthZoom.pdf first. On the x-axis is the mean coverage for each individual in the 1-2 mbp window and on the y is the same thing for 5-6 mbp (the color locus/deletion). Points are colored based on genotype (S/G includes a few G/G). The solid line is the 1:1 expectation, and the dashed lines denote 1:2 or 2:1. This mostly looks nice, about twice the coverage in M/M as S/S or S/G for 1-2 mbp compared to 5-6 mbp, with hets. in between. This is consistent with an extra copy in M compared to S or G. It is not really consistent with deletion of a single copy region as coverage is to high in S/S and S/G genetoypes. On the other hand, note that M/M is almost on the 1:1 line, thus, we don't have twice the coverage for 5-6 mbp in M/M as we do for 1-2 mbp. This suggests that if 5-6 is duplicated so is 1-2.
Now, turn to the mess that is depthPlots.pdf. This is the same style plot for all pairs of windows. The numbers on top give the windows (they refer to the upper bound in mbp, so '6' is the deletion). The first one is on the x and the second on the y.
A few things stand out.
-you only see coverage differences among clusters for window 6 (the deletion window).
-There is some variation beyond that in coverage among windows. Most is minor.
- But there is some evidence that some of the second half of the scaffold (starting after the deletion) has higher coverage on average than the windows before the deletion (1-5). This is most pronounced for window 15. Still not quite the 2:1 or 1:2 expectation, but definitely an uptick. This is sort of consistent with a large duplication starting at the deletion (and including it for melanic but not for the others) but with some noise complexity. Going to keep digging.
Here is what I did:
snps<-as.matrix(read.table("snpids.txt"))
lg8<-snps[which(snps[,1]==8),]
scaf128<-which(lg8[,2]==128)
indel<-which(lg8[,2]==128 & lg8[,3] > 5000000 & lg8[,3] < 6000000)
mlb<-4139489
mub<-6414835
mel<-c(which(lg8[,2]==702.1 & lg8[,3] < mlb),which(lg8[,2]==128 & lg8[,3] < mub))
L<-540910
N<-491
G<-matrix(scan("pntest_ecol_all.txt",n=N*L,sep=" "),nrow=L,ncol=N,byrow=TRUE)
pcmel<-prcomp(t(G[mel,]),center=TRUE,scale=FALSE)
#####
Ggbs<-matrix(scan("../sel_fha/pntest_fha2011.txt",n=500*175918,sep=" "),nrow=175918,ncol=500,byrow=TRUE)
snpsGbs<-read.table("../popgen_fha/snpids.txt",header=F)
melGbs<-c(which(snpsGbs[,2]==702.1 & snpsGbs[,3] < mlb),which(snpsGbs[,2]==128 & snpsGbs[,3] < mub))
pcGbs<-prcomp(t(Ggbs[melGbs,]),center=TRUE,scale=FALSE)
ids<-scan("../variants/ecolids.txt")
idsGbs<-scan("../variants/fha2011ids.txt")
bamids<-scan("/uufs/chpc.utah.edu/common/home/u6000989/data/timema/combind_wgs_dovetailV3/alignments_ecolexp/bamids.txt")
keep<-which(idsGbs %in% ids)
plot(pcmel$x[,1], pcGbs$x[keep,1])
cor.test(pcmel$x[,1], pcGbs$x[keep,1])
# Pearson's product-moment correlation
#data: pcmel$x[, 1] and pcGbs$x[keep, 1]
#t = 51.462, df = 489, p-value < 2.2e-16
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# 0.9037538 0.9315230
#sample estimates:
# cor
#0.9187673
cor.test(pcmel$x[,2], -1*pcGbs$x[keep,2])
# Pearson's product-moment correlation
#data: pcmel$x[, 2] and -1 * pcGbs$x[keep, 2]
#t = 4.2612, df = 489, p-value = 2.441e-05
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# 0.1024391 0.2731332
#sample estimates:
# cor
#0.1892152
## read in coverage
covL<-15157215
N<-491
library(data.table)
covmat<-fread("/uufs/chpc.utah.edu/common/home/u6000989/data/timema/combind_wgs_dovetailV3/alignments_ecolexp/cov128_mat.txt",nrows=15157215,header=FALSE,showProgress=TRUE)
save(list=ls(),file="covDat.rdat")
pos<-scan("/uufs/chpc.utah.edu/common/home/u6000989/data/timema/combind_wgs_dovetailV3/alignments_ecolexp/cov128_positions.txt")
## collapse by 5-6.4 vs. rest
cloc<-which(pos > 5000000 & pos < 6400000)
covdel<-apply(covmat[cloc,],2,mean)
covwin<-matrix(NA,nrow=15,ncol=2028)
for(i in 1:15){
cloc<-which(pos > ((i-1) * 1000000) & pos < i*1000000)
covwin[i,]<-apply(covmat[cloc,],2,sum)/1000000
}
## collapse by cluster
csl<-rep(NA,length(bamids))
for(i in 1:2028){
csl[i]<-o$cluster[which(idsGbs==bamids[i])]
}
pdf("depthZoom.pdf",width=5,height=5)
par(mar=c(5,5,.5,.5))
cs<-alpha(c("antiquewhite4","chocolate","black","darkgreen","cornflowerblue","antiquewhite4"),.5)
plot(covwin[2,],covwin[6,],col=cs[csl],pch=20,xlab="Depth (1 to 2 mbp)",ylab="Depth (5 to 6 mbp)",cex.lab=1.5,cex.axis=1.1)
legend(0.07,0.62,c("M/M","M/S","M/G","S/S","S/G"),pch=20,col=cs[c(3,2,5,1,4)],bty='n')
abline(a=0,b=1)
abline(a=0,b=.5,lty=2)
abline(a=0,b=2,lty=2)
dev.off()
pdf("depthPlots.pdf",width=12,height=12)
par(mfrow=c(5,5))
par(mar=c(4,5,2,1))
for(i in 1:14){for(j in (i+1):15){
plot(covwin[i,],covwin[j,],col=cs[csl],pch=20,xlab="Depth",ylab="Depth",cex.lab=1.5,cex.axis=1.1)
title(main=paste(i,"vs",j))
abline(a=0,b=1)
abline(a=0,b=.5,lty=2)
abline(a=0,b=2,lty=2)
}}
dev.off()