Selection in FHA based on six genotype model

Post date: Mar 12, 2017 4:40:21 AM

I used PCA with mel and stripey combined to assign genotypes. This was done with k-means clustering based on PC1 and PC2. pca6.pdf (in /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_fluct/genomic_change_dark_morph/sel_fha) shows the PCA wich numbers indicating clusters (1-6) and colors denoting phontype: green = green, black = striped, brown = dark/melanistic, and gray = unknown (i.e., bugs from 2011 Ecology Letters experiment). K-means clustering essentially found the clusters the human eye/brain finds and the phenotypic data are mostly concordant with these. Based on this I defined the following genotypes (using the cluster numbers): 1 = m/g, 2 = s/s, 3 = g/g, 4 = m/m, 5 = m/s, 6 = g/s. Here the letters denote the haplotypes (m = melanic, g = green unstriped, s = green striped).

From this I calculated haplotype and genotype frequencies.

Haplotype frequencies

2011: m = 0.317, s = 0.594, g = 0.089

2013: m = 0.359, s = 0.567, g = 0.074

Gentoype frequencies and expections

2011:

obs hw obs-hw

m/g 0.050 0.056426 -0.006426

s/s 0.286 0.352836 -0.066836

g/g 0.006 0.007921 -0.001921

m/m 0.042 0.100489 -0.058489

m/s 0.500 0.376596 0.123404

g/s 0.116 0.105732 0.010268

2013:

obs hw obs-hw

m/g 0.044850498 0.053045772 -0.008195274

s/s 0.272425249 0.321801774 -0.049376525

g/g 0.004983389 0.005464206 -0.000480817

m/m 0.088039867 0.128740301 -0.040700434

m/s 0.496677741 0.407081600 0.089596141

g/s 0.093023256 0.083866348 0.009156908

So, m goes up between 2011 and 2013 (we knew this) and both g and s go down (s more in an absolute sense, but it was way more common). In terms of genotype frequencies, we see a clear excess of m/s in both years (a bit more in 2011) and a small excess of g/s (interesting, but probably not big enough to spend much time thinking about). We see a clear deficit of the homozygous genotypes, particulary m/m and s/s (the two common ones... m/g and g/g are sufficiently rare that I wouldn't think about them too much).

Now, for the ABC model for selection. The only difference from the previous model is that we have six genotypes and different fitness schemes. Here are the relative fitness values:

m/s = 1 (this is the reference for everything else)

m/m = 1 + s1

s/s = 1 + s2

s/g = 1 + s2 + s3 * s4

g/g = 1 + s2 + s3

m/g = 1 + s1 + s3 * s4

Thus, s1 and s2 define the fitness value of the m/m and s/s homozygote in a way that allows for any form of dominance (now positive means increased fitness, negative decreased). s3 defines the fitness of g/g relative to s/s (i.e., after adding s2). The s/g het is 1 + s2 + s3 * s4, thus s4 is the heterozygous effect. Similarly, m/g is 1 + s1 + s3 * s4.

Here is a summary of the posterior for the indiviual selection coefficients:

s1 s2 s3 s4

Weighted 2.5 % Perc.: -0.4822 -0.4653 -0.4173 0.0184

Weighted Median: -0.2829 -0.2753 0.0411 0.3935

Weighted Mean: -0.2558 -0.2625 0.0383 0.4296

Weighted Mode: -0.3182 -0.3129 0.0494 0.1460

Weighted 97.5 % Perc.: 0.1271 0.0190 0.4694 0.9661

Note that we have pretty good evidence of s2 negative (in simplest terms, that s/s does worse than m/s, though this also affects other genotypes).

Perhaps a clearer way to look at this is with posteriors for the fitnesses of each genotype (note that m/s is 1 by definition as these are all relative). Here they are (quantiles, thus 50% is median, next two define 90% CI, then 95% CI; the last one is the pp that the genotype has a lower fitness than m/s):

m/g s/s g/g m/m m/s g/s

50% 0.7505340 0.7245956 0.7783205 0.7182501 1 0.7467703

5% 0.5245322 0.5580484 0.3832951 0.5311901 1 0.5522698

95% 1.0841770 0.9606847 1.1671173 1.0602988 1 1.0036136

2.5% 0.4884055 0.5366053 0.3159947 0.5168979 1 0.5120716

97.5% 1.1557701 1.0175914 1.2379116 1.1281739 1 1.0679673

pp<1 0.9038 0.9686 0.8114 0.9224 NA 0.9474

So, the general signal is that m/s is probably the best, and almost certainly beats s/s and g/s, and probably m/m. We are a bit less certain about m/g and m/m, but the latter in particular is very rare. If you focus on the point estimates, our results suggest that all other genotypes have roughly 3/4 the fitness of m/s. Thus, our data suggest selection is strong. fha_fitness.pdf shows the posteriors for the fitnesses.

Here is the R code (fhaAbc6g.R in /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_fluct/genomic_change_dark_morph/sel_fha)

N0<-500

N1<-602

L<-175918

g0<-matrix(scan("pntest_fha2011.txt",n=N0*L,sep=" "),nrow=L,ncol=N0,byrow=TRUE)

g1<-matrix(scan("pntest_fha2013.txt",n=N1*L,sep=" "),nrow=L,ncol=N1,byrow=TRUE)

ids<-read.table("../popgen_fha/snpids.txt",header=F)

cdat<-read.table("fha2013pheno.txt",header=TRUE)

ph<-cdat$color2

ph[cdat$color2==0]<-2+cdat$stripe2[cdat$color2==0]

## define mel and stripy

mlb<-4139489

mub<-6414835

slb<-6414836

mel<-c(which(ids[,2]==702.1 & ids[,3] < mlb),

which(ids[,2]==128 & ids[,3] < mub))

stripy<-which(ids[,2]==128 & ids[,3] >= slb)

g0x<-g0[c(mel,stripy),]

g1x<-g1[c(mel,stripy),]

pcall<-prcomp(rbind(t(g0x),t(g1x)),center=TRUE,scale=FALSE)

oall<-kmeans(pcall$x[,1:2],centers=6,nstart=100,iter.max=200)

pdf("pca6.pdf",width=5,height=5)

plot(pcall$x[,1],pcall$x[,2],type="n")

text(pcall$x[,1],pcall$x[,2],oall$cluster,col=c(rep("gray",500),c("brown","green","black")[ph]))

dev.off()

## groups based on pca

## 1 = m/g, 2 = s/s, 3 = g/g, 4 = m/m, 5 = m/s, 6 = g/s

gen0<-oall$cluster[1:500]

gen1<-oall$cluster[-c(1:500)]

gfr0<-table(gen0)/500

gfr1<-table(gen1)/602

m0<-(sum(gen0==4) + 0.5 * sum(gen0==1 | gen0==5))/(length(gen0))

m1<-(sum(gen1==4) + 0.5 * sum(gen1==1 | gen1==5))/(length(gen1))

s0<-(sum(gen0==2) + 0.5 * sum(gen0==5 | gen0==6))/(length(gen0))

s1<-(sum(gen1==2) + 0.5 * sum(gen1==5 | gen1==6))/(length(gen1))

r0<-(sum(gen0==3) + 0.5 * sum(gen0==1 | gen0==6))/(length(gen0))

r1<-(sum(gen1==3) + 0.5 * sum(gen1==1 | gen1==6))/(length(gen1))

p0<-c(m0,s0,r0)

p1<-c(m1,s1,r1)

hw0<-c(2*m0*r0,s0^2,r0^2,m0^2,2*m0*s0,2*r0*s0)

hw1<-c(2*m1*r1,s1^2,r1^2,m1^2,2*m1*s1,2*r1*s1)

sum0<-cbind(gfr0,hw0,gfr0-hw0)

rownames(sum0)<-c("m/g","s/s","g/g","m/m","m/s","g/s")

colnames(sum0)<-c("obs","hw","obs-hw")

sum1<-cbind(gfr1,hw1,gfr1-hw1)

rownames(sum1)<-c("m/g","s/s","g/g","m/m","m/s","g/s")

colnames(sum1)<-c("obs","hw","obs-hw")

## abc analysis

onesim<-function(W,Gfreq,Ne){

## m at start of next gen

m<-Gfreq[4] + 0.5 * (Gfreq[1] + Gfreq[5])

s<-Gfreq[2] + 0.5 * (Gfreq[5] + Gfreq[6])

g<-1-(m+s)

## starting genotype frequencies

Gfreq<-c(2*m*g,s^2,g^2,m^2,2*m*s,2*g*s)

## post selection expected gen freqs

Gfreq <-Gfreq * W/wbar

## actual

Gcnts<-rmultinom(n=1,size=Ne,prob=Gfreq)

Gfreq<-Gcnts/Ne

## m at start of final gen

m<-Gfreq[4] + 0.5 * (Gfreq[1] + Gfreq[5])

s<-Gfreq[2] + 0.5 * (Gfreq[5] + Gfreq[6])

g<-1-(m+s)

## starting genotype frequencies

Gfreq<-c(2*m*g,s^2,g^2,m^2,2*m*s,2*g*s)

## post selection expected gen freqs

Gfreq <-Gfreq * W/wbar

## actual

Gcnts<-rmultinom(n=1,size=Ne,prob=Gfreq)

Gfreq<-Gcnts/Ne

return(t(Gfreq))

}

## groups based on pca

## 1 = m/g, 2 = s/s, 3 = g/g, 4 = m/m, 5 = m/s, 6 = g/s

abcloop<-function(N=1000,Ne=200,Gfreq=Gfr0,lb=-0.5,ub=0.5){

simfr<-matrix(NA,nrow=N,ncol=6)

s<-matrix(NA,nrow=N,ncol=4)

for(x in 1:N){

## defines m/m and s/s and offset of g/g relative to s/s

s[x,1:3]<-runif(n=3,min=lb,max=ub)

s[x,4]<-runif(n=1,0,1) ## defines s/g relative to g/g and s/s

W<-c(1+s[x,1]+s[x,3]*s[x,4],1+s[x,2],1+s[x,2]+s[x,3],1+s[x,1],1,1+s[x,2]+s[x,3]*s[x,4])

simfr[x,]<-onesim(W=W,Gfreq=Gfreq,Ne=Ne)

}

out<-list(s,simfr)

return(out)

}

oo<-abcloop(N=1000000,Ne=110.3,Gfreq=gfr0,lb=-0.5,ub=0.5)

aout<-abc(target=as.vector(gfr1),param=as.matrix(oo[[1]]),sumstat=as.matrix(oo[[2]]),tol=0.005,method="ridge")

summary(aout)

wout<-matrix(NA,nrow=5000,ncol=6)

wout[,1]<-1+aout$adj.values[,1] + aout$adj.values[,3] * aout$adj.values[,4]

wout[,2]<-1+aout$adj.values[,2]

wout[,3]<-1+aout$adj.values[,2] + aout$adj.values[,3]

wout[,4]<-1+aout$adj.values[,1]

wout[,5]<-1

wout[,6]<-1+aout$adj.values[,2] + aout$adj.values[,3] * aout$adj.values[,4]

## density plots

cs<-c("green","darkgray","green","brown","black","darkgray")

ll<-c(2,1,1,1,2,2)

pdf("fha_fitness.pdf",width=6,height=6)

par(mar=c(5,5,0.5,0.5))

plot(density(wout[,2]),xlim=c(0,1.8),ylim=c(0,4),col=cs[2],lty=ll[2],xlab="relative fitness",cex.lab=1.4,cex.axis=1.1,main="")

for(i in c(1,3,4,6)){

lines(density(wout[,i]),col=cs[i],lty=ll[i])

}

abline(v=1,col=cs[5],lty=ll[5],lwd=2)

legend(0,4,c("m/g","s/s","g/g","m/m","m/s","g/s"),col=cs,lty=ll)

dev.off()

Page updated

Google Sites

Report abuse