Post date: Dec 14, 2017 8:02:42 PM
The goal here was to re-examine het-fitness correlations in the Ecology Letters data based on the better Dovetail v3 genome. The main stuff is on kingspeak here:
/uufs/chpc.utah.edu/common/home/u6000989/projects/timema_hetfit
Here is my summary of results thus far:
1. With the whole genomes we had a very weak but significant excess of het. for survivors on A (h = 0.103 vs. 0.104, so very minor effect). We didn't have an effect on C.
2. When using all SNPs, with some pretty serious quality control on the genotypes, I get an almost-significant signal on both A and C, in the same direction (lower het is better). It stays almost significant whether or not you first regress het. on (what is basically) coverage (coverage and het are correlated). The bottom line, though, is that this not-quite-significant effect is ridiculously small.
Raw het: survivors = 0.0711, dead = 0.0726; residual het: surv = -0.00025; dead = 0.0000876.
3. Nothing new comes from considering just mel vs. not mel. Dropping mel gives basically the same thing as the full data set, and the signal is basically the same for mel too; the absolute values of het just go up and we are not even close to significance.
4. If I play with this enough I could perhaps pull out some signal somewhere, but it all looks weak and is only flirting with significance. My hope had been that with the better genome we would get a stronger signal, but if anything the signal seems weaker/barely real. Not sure what we want to do about it. If we hadn't already committed to writing something, I would advocate for just dropping this. But now I am not sure.
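The coverage correction in point 2 (regress het. on coverage, keep the residuals) can be sketched on simulated data; everything here (sample size, effect sizes, the random survival labels) is made up purely for illustration:

```r
## residual-het sketch: het is correlated with coverage, so compare
## coverage-corrected (residual) het between groups, not raw het
set.seed(1)
n <- 200
cvg <- runif(n, 5, 40)                          # stand-in for mean coverage
het <- 0.07 + 5e-4 * cvg + rnorm(n, 0, 0.005)   # het rises with coverage
o <- lm(het ~ cvg)
rhet <- o$residuals                             # coverage-corrected het
surv <- rbinom(n, 1, 0.3)                       # random survival labels
tapply(rhet, surv, mean)                        # mean residual het by group
```

With an intercept in the model the residuals average to zero, so any survivor/dead difference in `rhet` is net of the coverage artifact.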
Here is what we did:
## collect ids of inds. not from the selection experiment
grep "^#C" *vcf | perl -p -e 's/\s+/\n/g' | grep 2013FHA > dropIds
## filtering with vcftools 0.1.15
vcftools --vcf tcrFhaVars.vcf --out filtered --remove-indels --remove-filtered-all --min-alleles 2 --max-alleles 2 --min-meanDP 6 --max-meanDP 50 --max-missing 0.8 --minQ 30 --remove dropIds --recode --recode-INFO-all
Excluding individuals in 'exclude' list
After filtering, kept 500 out of 1102 Individuals
Outputting VCF file...
After filtering, kept 108182 out of a possible 2062121 Sites
Run Time = 463.00 seconds
## remove SNPs with no linkage group assignment (LG=0)
grep -v LG=0 filtered.recode.vcf > filtered.recode_noNA.vcf
## retained 97,695 SNPs
## get depth information
vcftools --vcf filtered.recode_noNA.vcf --out depth --depth
Outputting Mean Depth by Individual
## convert to genotype likelihood (gl) format, no MAF filter
perl scripts/vcf2gl.pl 0.0 filtered.recode_noNA.vcf
## infer genotype post. probs with popmod; note: recompiled so the infile no longer needs the row of 1s
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_1.hdf5 -a 0.5 &
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_2.hdf5 -a 0.5 &
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_3.hdf5 -a 0.5 &
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_4.hdf5 -a 0.5 &
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_5.hdf5 -a 0.5 &
## h5dump to extract genotype post. probs.
## per chain
h5dump --dataset=g ecol_gbs_6x_1.hdf5 > gprob_1.txt
## summarize genotype posteriors across chains
perl scripts/summGpost.pl gprob_*
finished processing data for 500 samples, 97695 loci and 5 chains
## genotype file is ecol_Gest.txt
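summGpost.pl's internals aren't shown here, but the presumed gist (average the per-genotype posterior probabilities over chains, then take the posterior-mean genotype) can be sketched with toy numbers; the matrix layout and number of loci below are assumptions, not the script's actual format:

```r
## hypothetical chain-combining sketch (summGpost.pl's rules may differ)
set.seed(3)
nchain <- 5; nloci <- 4
chains <- replicate(nchain, {
  p <- matrix(runif(3 * nloci), nrow = 3)   # rows = P(g=0), P(g=1), P(g=2)
  sweep(p, 2, colSums(p), "/")              # normalize each locus column
}, simplify = FALSE)
pbar <- Reduce(`+`, chains) / nchain        # chain-averaged posteriors
gest <- colSums(pbar * c(0, 1, 2))          # posterior-mean genotype per locus
```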
######################################################
######################################################
Moved to my computer for analyses, see files in /home/zgompert/Local/timema_consGen_hetsurv/.
Model is:
model{
  for(i in 1:N){
    ## binomial/bernoulli sampling distribution
    y[i] ~ dbinom(p[i],1)
    ## logit link function
    logit(p[i]) <- alpha[grp[i]] + beta * x1[i]
  }
  ## hierarchical prior on group intercepts
  for(k in 1:G){
    alpha[k] ~ dnorm(mu,tau)
  }
  ## priors
  mu ~ dnorm(0,1e-5)
  tau ~ dgamma(0.01,0.001)
  beta ~ dnorm(0,1e-5)
}
From commands.R:
library(rjags)
## read in genotype data
N<-500
L<-97695
G<-matrix(scan("ecol_Gest.txt",n=N*L,sep=" "),nrow=L,ncol=N,byrow=TRUE)
## calculate het. and minor allele homozygosity per individual
het<-apply(G==1,2,mean,na.rm=TRUE)
mhom<-apply(G==2,2,mean,na.rm=TRUE)
## proportion of non-missing genotypes per individual (coverage proxy)
nonNa<-apply(is.na(G)==FALSE,2,mean)
## regress het. on nonNa and keep the residuals
o<-lm(het~nonNa)
rhet<-o$residuals
## read in experimental data
dat<-read.table("experiment_data.txt",header=FALSE)
colnames(dat)<-c("id","blck","host","surv")
grp<-paste(dat[,2],dat[,3],sep="")
## make list
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(het-mean(het))/sd(het),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.35391 -0.22128 -0.15447 -0.08316 0.05158
mean(unlist(out) < 0)
#[1] 0.9295556
### A only
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(het[Ad]-mean(het[Ad]))/sd(het[Ad]),N=length(Ad),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.5113 -0.3150 -0.2137 -0.1106 0.1082
mean(unlist(out) < 0)
#[1] 0.9112222
### C only
Ce<-which(dat$host=='C')
D<-list(y=dat$surv[Ce],grp=as.numeric(as.factor(grp[Ce])),x1=(het[Ce]-mean(het[Ce]))/sd(het[Ce]),N=length(Ce),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.51052 -0.31277 -0.21092 -0.10688 0.08666
mean(unlist(out) < 0)
#[1] 0.9191111
################ minor allele homo. ################
## make list
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(mhom-mean(mhom))/sd(mhom),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.40678 -0.25136 -0.17709 -0.10302 0.04552
mean(unlist(out) < 0)
#[1] 0.9422222
### A only
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(mhom[Ad]-mean(mhom[Ad]))/sd(mhom[Ad]),N=length(Ad),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.40433 -0.25506 -0.17831 -0.10155 0.04166
mean(unlist(out) < 0)
#[1] 0.9408889
### C only
Ce<-which(dat$host=='C')
D<-list(y=dat$surv[Ce],grp=as.numeric(as.factor(grp[Ce])),x1=(mhom[Ce]-mean(mhom[Ce]))/sd(mhom[Ce]),N=length(Ce),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.40703 -0.25247 -0.17309 -0.09876 0.05216
mean(unlist(out) < 0)
#[1] 0.9332222
###### residual het ##############
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(rhet-mean(rhet))/sd(rhet),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.32390 -0.18844 -0.11662 -0.05138 0.07742
mean(unlist(out) < 0)
#[1] 0.8821111
### A only
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(rhet[Ad]-mean(rhet[Ad]))/sd(rhet[Ad]),N=length(Ad),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.61642 -0.36000 -0.24170 -0.13299 0.05715
mean(unlist(out) < 0)
#[1] 0.9406667
### C only
Ce<-which(dat$host=='C')
D<-list(y=dat$surv[Ce],grp=as.numeric(as.factor(grp[Ce])),x1=(rhet[Ce]-mean(rhet[Ce]))/sd(rhet[Ce]),N=length(Ce),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.62660 -0.36176 -0.24314 -0.13448 0.06666
mean(unlist(out) < 0)
#[1] 0.9378889
##################################################################################
################### subset by mel-stripe vs. not #################################
##################################################################################
snps<-read.table("snpData.txt",header=FALSE)
## mel-stripe region bounds (bp), applied to scaffolds 702.1 and 128 below
mlb<-4139489
mub<-6414835
mel<-c(which(snps[,2]==702.1 & snps[,3] < mlb),
which(snps[,2]==128 & snps[,3] < mub))
## calculate het. and minor allele homo
het_mel<-apply(G[mel,]==1,2,mean,na.rm=TRUE)
het_nomel<-apply(G[-mel,]==1,2,mean,na.rm=TRUE)
nonNa_mel<-apply(is.na(G[mel,])==FALSE,2,mean)
o<-lm(het_mel~nonNa_mel)
rhet_mel<-o$residuals
nonNa_nomel<-apply(is.na(G[-mel,])==FALSE,2,mean)
o<-lm(het_nomel~nonNa_nomel)
rhet_nomel<-o$residuals
################################# mel
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(rhet_mel-mean(rhet_mel))/sd(rhet_mel),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.18567 -0.05395 0.01390 0.08366 0.22065
mean(unlist(out) < 0)
#[1] 0.446
### A only
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(rhet_mel[Ad]-mean(rhet_mel[Ad]))/sd(rhet_mel[Ad]),N=length(Ad),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.282498 -0.102699 -0.001385 0.100268 0.299425
mean(unlist(out) < 0)
#[1] 0.5037778
### C only
Ce<-which(dat$host=='C')
D<-list(y=dat$surv[Ce],grp=as.numeric(as.factor(grp[Ce])),x1=(rhet_mel[Ce]-mean(rhet_mel[Ce]))/sd(rhet_mel[Ce]),N=length(Ce),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.22091 -0.04729 0.04883 0.14713 0.33042
mean(unlist(out) < 0)
#[1] 0.3625556
############################### no mel
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(rhet_nomel-mean(rhet_nomel))/sd(rhet_nomel),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.32807 -0.18868 -0.11975 -0.05187 0.08282
mean(unlist(out) < 0)
#[1] 0.8833333
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(rhet_nomel[Ad]-mean(rhet_nomel[Ad]))/sd(rhet_nomel[Ad]),N=length(Ad),G=5)