Post date: Oct 29, 2018 9:26:33 PM
The odd excess of intermediate allele frequencies for the SVs prompted me to look at patterns of observed heterozygosity at these loci. I found that many SVs showed near maximal heterozygosity (no minor allele homozygotes). While this could mean something interesting, the signal is sufficiently strong that I suspect it is some sort of analytical artifact (but I can't imagine what).
1. FIrst, I used two approaches to estimates genotypes/heterozygosity from the population allele frequencies and genotype likelihoods (this was all done for the 20 individuals together, not actually at the population level, but this induces a bias towards low not high heterozygosity).
From king:~/projects/timema_SV/SVs I ran,
perl gl2h.pl timema_SVs_deletions.gl p_timema_SVs_deletions.txt
perl gl2h.pl timema_SVs_dups.gl p_timema_SVs_dups.txt
perl gl2h.pl timema_SVs_inversions.gl p_timema_SVs_inversions.txt
perl gl2genest.pl timema_SVs_deletions.gl p_timema_SVs_deletions.txt
perl gl2genest.pl timema_SVs_dups.gl p_timema_SVs_dups.txt
perl gl2genest.pl timema_SVs_inversions.gl p_timema_SVs_inversions.txt
Note that the former just writes the posterior probability of heterozygosity, which is closer to what we want here.
2. In R (hetANalyses.R) I than calculated the observed het, mostly from the posterior probability of heterozygosity, but using empirical Bayes genotype estimates or the raw genotype estimates from the vcf files gave similar results.
## measuring SV heterozygosity
## files with threshold genotypes, only include if called
g_del<-as.matrix(read.table("gen_sv_deletions.txt",header=FALSE))
g_dup<-as.matrix(read.table("gen_sv_dups.txt",header=FALSE))
g_inv<-as.matrix(read.table("gen_sv_inversions.txt",header=FALSE))
## empirical Bayes point estimates from grand MAFs
b_del<-as.matrix(read.table("pntest_timema_SVs_deletions.txt",header=FALSE))
b_dup<-as.matrix(read.table("pntest_timema_SVs_dups.txt",header=FALSE))
b_inv<-as.matrix(read.table("pntest_timema_SVs_inversions.txt",header=FALSE))
## het from gl2h.pl
h_del<-as.matrix(read.table("hest_timema_SVs_deletions.txt",header=FALSE))
h_dup<-as.matrix(read.table("hest_timema_SVs_dups.txt",header=FALSE))
h_inv<-as.matrix(read.table("hest_timema_SVs_inversions.txt",header=FALSE))
## allele freqs.
p_del<-as.matrix(read.table("p_timema_SVs_deletions.txt",header=FALSE))
p_del<-as.numeric(p_del[,3])
p_del[p_del > 0.5]<-1-p_del[p_del > 0.5]
p_dup<-as.matrix(read.table("p_timema_SVs_dups.txt",header=FALSE))
p_dup<-as.numeric(p_dup[,3])
p_dup[p_dup > 0.5]<-1-p_dup[p_dup > 0.5]
p_inv<-as.matrix(read.table("p_timema_SVs_inversions.txt",header=FALSE))
p_inv<-as.numeric(p_inv[,3])
p_inv[p_inv > 0.5]<-1-p_inv[p_inv > 0.5]
## hwe
p<-seq(0,.5,0.005)
he<-p*(1-p)*2
pdf("svhet.pdf",width=8,height=8)
par(mfrow=c(2,2))
par(mar=c(4.5,5,2.5,1))
cl<-1.5
cm<-1.25
## inv, dup, del
plot(p_inv,apply(h_inv,1,mean),xlab="Allele frequency",ylab="Expected heterozygosity",cex.lab=cl,pch=19,col="darkgray")
lines(p,he,lwd=1.5)
abline(a=0,b=2,lty=2,lwd=1.5,col="red")
title(main="(a) Inversions, expected het.",cex.main=cm)
legend(0,0.98,c("maximum","HWE"),col=c("red","black"),lty=c(2,1),lwd=1.5,bty='n')
plot(p_dup,apply(h_dup,1,mean),xlab="Allele frequency",ylab="Expected heterozygosity",cex.lab=cl,pch=19,col="darkgray")
lines(p,he,lwd=1.5)
abline(a=0,b=2,lty=2,lwd=1.5,col="red")
title(main="(b) Duplications, expected het.",cex.main=cm)
plot(p_del,apply(h_del,1,mean),xlab="Allele frequency",ylab="Expected heterozygosity",cex.lab=cl,pch=19,col="darkgray")
lines(p,he,lwd=1.5)
abline(a=0,b=2,lty=2,lwd=1.5,col="red")
title(main="(c) Deletions, expected het.",cex.main=cm)
dev.off()