Post date: Mar 12, 2017 4:40:21 AM
I used PCA with mel and stripey combined to assign genotypes. This was done with k-means clustering based on PC1 and PC2. pca6.pdf (in /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_fluct/genomic_change_dark_morph/sel_fha) shows the PCA wich numbers indicating clusters (1-6) and colors denoting phontype: green = green, black = striped, brown = dark/melanistic, and gray = unknown (i.e., bugs from 2011 Ecology Letters experiment). K-means clustering essentially found the clusters the human eye/brain finds and the phenotypic data are mostly concordant with these. Based on this I defined the following genotypes (using the cluster numbers): 1 = m/g, 2 = s/s, 3 = g/g, 4 = m/m, 5 = m/s, 6 = g/s. Here the letters denote the haplotypes (m = melanic, g = green unstriped, s = green striped).
From this I calculated haplotype and genotype frequencies.
Haplotype frequencies
2011: m = 0.317, s = 0.594, g = 0.089
2013: m = 0.359, s = 0.567, g = 0.074
Gentoype frequencies and expections
2011:
obs hw obs-hw
m/g 0.050 0.056426 -0.006426
s/s 0.286 0.352836 -0.066836
g/g 0.006 0.007921 -0.001921
m/m 0.042 0.100489 -0.058489
m/s 0.500 0.376596 0.123404
g/s 0.116 0.105732 0.010268
2013:
obs hw obs-hw
m/g 0.044850498 0.053045772 -0.008195274
s/s 0.272425249 0.321801774 -0.049376525
g/g 0.004983389 0.005464206 -0.000480817
m/m 0.088039867 0.128740301 -0.040700434
m/s 0.496677741 0.407081600 0.089596141
g/s 0.093023256 0.083866348 0.009156908
So, m goes up between 2011 and 2013 (we knew this) and both g and s go down (s more in an absolute sense, but it was way more common). In terms of genotype frequencies, we see a clear excess of m/s in both years (a bit more in 2011) and a small excess of g/s (interesting, but probably not big enough to spend much time thinking about). We see a clear deficit of the homozygous genotypes, particulary m/m and s/s (the two common ones... m/g and g/g are sufficiently rare that I wouldn't think about them too much).
Now, for the ABC model for selection. The only difference from the previous model is that we have six genotypes and different fitness schemes. Here are the relative fitness values:
m/s = 1 (this is the reference for everything else)
m/m = 1 + s1
s/s = 1 + s2
s/g = 1 + s2 + s3 * s4
g/g = 1 + s2 + s3
m/g = 1 + s1 + s3 * s4
Thus, s1 and s2 define the fitness value of the m/m and s/s homozygote in a way that allows for any form of dominance (now positive means increased fitness, negative decreased). s3 defines the fitness of g/g relative to s/s (i.e., after adding s2). The s/g het is 1 + s2 + s3 * s4, thus s4 is the heterozygous effect. Similarly, m/g is 1 + s1 + s3 * s4.
Here is a summary of the posterior for the indiviual selection coefficients:
s1 s2 s3 s4
Weighted 2.5 % Perc.: -0.4822 -0.4653 -0.4173 0.0184
Weighted Median: -0.2829 -0.2753 0.0411 0.3935
Weighted Mean: -0.2558 -0.2625 0.0383 0.4296
Weighted Mode: -0.3182 -0.3129 0.0494 0.1460
Weighted 97.5 % Perc.: 0.1271 0.0190 0.4694 0.9661
Note that we have pretty good evidence of s2 negative (in simplest terms, that s/s does worse than m/s, though this also affects other genotypes).
Perhaps a clearer way to look at this is with posteriors for the fitnesses of each genotype (note that m/s is 1 by definition as these are all relative). Here they are (quantiles, thus 50% is median, next two define 90% CI, then 95% CI; the last one is the pp that the genotype has a lower fitness than m/s):
m/g s/s g/g m/m m/s g/s
50% 0.7505340 0.7245956 0.7783205 0.7182501 1 0.7467703
5% 0.5245322 0.5580484 0.3832951 0.5311901 1 0.5522698
95% 1.0841770 0.9606847 1.1671173 1.0602988 1 1.0036136
2.5% 0.4884055 0.5366053 0.3159947 0.5168979 1 0.5120716
97.5% 1.1557701 1.0175914 1.2379116 1.1281739 1 1.0679673
pp<1 0.9038 0.9686 0.8114 0.9224 NA 0.9474
So, the general signal is that m/s is probably the best, and almost certainly beats s/s and g/s, and probably m/m. We are a bit less certain about m/g and m/m, but the latter in particular is very rare. If you focus on the point estimates, our results suggest that all other genotypes have roughly 3/4 the fitness of m/s. Thus, our data suggest selection is strong. fha_fitness.pdf shows the posteriors for the fitnesses.
Here is the R code (fhaAbc6g.R in /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_fluct/genomic_change_dark_morph/sel_fha)
N0<-500
N1<-602
L<-175918
g0<-matrix(scan("pntest_fha2011.txt",n=N0*L,sep=" "),nrow=L,ncol=N0,byrow=TRUE)
g1<-matrix(scan("pntest_fha2013.txt",n=N1*L,sep=" "),nrow=L,ncol=N1,byrow=TRUE)
ids<-read.table("../popgen_fha/snpids.txt",header=F)
cdat<-read.table("fha2013pheno.txt",header=TRUE)
ph<-cdat$color2
ph[cdat$color2==0]<-2+cdat$stripe2[cdat$color2==0]
## define mel and stripy
mlb<-4139489
mub<-6414835
slb<-6414836
mel<-c(which(ids[,2]==702.1 & ids[,3] < mlb),
which(ids[,2]==128 & ids[,3] < mub))
stripy<-which(ids[,2]==128 & ids[,3] >= slb)
##
g0x<-g0[c(mel,stripy),]
g1x<-g1[c(mel,stripy),]
pcall<-prcomp(rbind(t(g0x),t(g1x)),center=TRUE,scale=FALSE)
oall<-kmeans(pcall$x[,1:2],centers=6,nstart=100,iter.max=200)
pdf("pca6.pdf",width=5,height=5)
plot(pcall$x[,1],pcall$x[,2],type="n")
text(pcall$x[,1],pcall$x[,2],oall$cluster,col=c(rep("gray",500),c("brown","green","black")[ph]))
dev.off()
## groups based on pca
## 1 = m/g, 2 = s/s, 3 = g/g, 4 = m/m, 5 = m/s, 6 = g/s
gen0<-oall$cluster[1:500]
gen1<-oall$cluster[-c(1:500)]
gfr0<-table(gen0)/500
gfr1<-table(gen1)/602
m0<-(sum(gen0==4) + 0.5 * sum(gen0==1 | gen0==5))/(length(gen0))
m1<-(sum(gen1==4) + 0.5 * sum(gen1==1 | gen1==5))/(length(gen1))
s0<-(sum(gen0==2) + 0.5 * sum(gen0==5 | gen0==6))/(length(gen0))
s1<-(sum(gen1==2) + 0.5 * sum(gen1==5 | gen1==6))/(length(gen1))
r0<-(sum(gen0==3) + 0.5 * sum(gen0==1 | gen0==6))/(length(gen0))
r1<-(sum(gen1==3) + 0.5 * sum(gen1==1 | gen1==6))/(length(gen1))
p0<-c(m0,s0,r0)
p1<-c(m1,s1,r1)
hw0<-c(2*m0*r0,s0^2,r0^2,m0^2,2*m0*s0,2*r0*s0)
hw1<-c(2*m1*r1,s1^2,r1^2,m1^2,2*m1*s1,2*r1*s1)
sum0<-cbind(gfr0,hw0,gfr0-hw0)
rownames(sum0)<-c("m/g","s/s","g/g","m/m","m/s","g/s")
colnames(sum0)<-c("obs","hw","obs-hw")
sum1<-cbind(gfr1,hw1,gfr1-hw1)
rownames(sum1)<-c("m/g","s/s","g/g","m/m","m/s","g/s")
colnames(sum1)<-c("obs","hw","obs-hw")
## abc analysis
onesim<-function(W,Gfreq,Ne){
## m at start of next gen
m<-Gfreq[4] + 0.5 * (Gfreq[1] + Gfreq[5])
s<-Gfreq[2] + 0.5 * (Gfreq[5] + Gfreq[6])
g<-1-(m+s)
## starting genotype frequencies
Gfreq<-c(2*m*g,s^2,g^2,m^2,2*m*s,2*g*s)
## post selection expected gen freqs
Gfreq <-Gfreq * W/wbar
## actual
Gcnts<-rmultinom(n=1,size=Ne,prob=Gfreq)
Gfreq<-Gcnts/Ne
## m at start of final gen
m<-Gfreq[4] + 0.5 * (Gfreq[1] + Gfreq[5])
s<-Gfreq[2] + 0.5 * (Gfreq[5] + Gfreq[6])
g<-1-(m+s)
## starting genotype frequencies
Gfreq<-c(2*m*g,s^2,g^2,m^2,2*m*s,2*g*s)
## post selection expected gen freqs
Gfreq <-Gfreq * W/wbar
## actual
Gcnts<-rmultinom(n=1,size=Ne,prob=Gfreq)
Gfreq<-Gcnts/Ne
return(t(Gfreq))
}
## groups based on pca
## 1 = m/g, 2 = s/s, 3 = g/g, 4 = m/m, 5 = m/s, 6 = g/s
abcloop<-function(N=1000,Ne=200,Gfreq=Gfr0,lb=-0.5,ub=0.5){
simfr<-matrix(NA,nrow=N,ncol=6)
s<-matrix(NA,nrow=N,ncol=4)
for(x in 1:N){
## defines m/m and s/s and offset of g/g relative to s/s
s[x,1:3]<-runif(n=3,min=lb,max=ub)
s[x,4]<-runif(n=1,0,1) ## defines s/g relative to g/g and s/s
W<-c(1+s[x,1]+s[x,3]*s[x,4],1+s[x,2],1+s[x,2]+s[x,3],1+s[x,1],1,1+s[x,2]+s[x,3]*s[x,4])
simfr[x,]<-onesim(W=W,Gfreq=Gfreq,Ne=Ne)
}
out<-list(s,simfr)
return(out)
}
oo<-abcloop(N=1000000,Ne=110.3,Gfreq=gfr0,lb=-0.5,ub=0.5)
aout<-abc(target=as.vector(gfr1),param=as.matrix(oo[[1]]),sumstat=as.matrix(oo[[2]]),tol=0.005,method="ridge")
summary(aout)
wout<-matrix(NA,nrow=5000,ncol=6)
wout[,1]<-1+aout$adj.values[,1] + aout$adj.values[,3] * aout$adj.values[,4]
wout[,2]<-1+aout$adj.values[,2]
wout[,3]<-1+aout$adj.values[,2] + aout$adj.values[,3]
wout[,4]<-1+aout$adj.values[,1]
wout[,5]<-1
wout[,6]<-1+aout$adj.values[,2] + aout$adj.values[,3] * aout$adj.values[,4]
## density plots
cs<-c("green","darkgray","green","brown","black","darkgray")
ll<-c(2,1,1,1,2,2)
pdf("fha_fitness.pdf",width=6,height=6)
par(mar=c(5,5,0.5,0.5))
plot(density(wout[,2]),xlim=c(0,1.8),ylim=c(0,4),col=cs[2],lty=ll[2],xlab="relative fitness",cex.lab=1.4,cex.axis=1.1,main="")
for(i in c(1,3,4,6)){
lines(density(wout[,i]),col=cs[i],lty=ll[i])
}
abline(v=1,col=cs[5],lty=ll[5],lwd=2)
legend(0,4,c("m/g","s/s","g/g","m/m","m/s","g/s"),col=cs,lty=ll)
dev.off()