Post date: Dec 14, 2017 8:02:42 PM
The goal here was to re-examine het-fitness correlations in the Ecology Letters data based on the better Dovetail v3 genome. The main stuff is on kingspeak here:
/uufs/chpc.utah.edu/common/home/u6000989/projects/timema_hetfit
Here is my summary of results thus far:
1. With the whole genomes we had a very weak but significant excess of het. for survivors on A (h = 0.103 vs. 0.104, so very minor effect). We didn't have an effect on C.
2. When using all SNPs, with some pretty serious quality control on the genotypes, I get an almost-significant signal on both A and C, in the same direction (lower het is better). It stays almost significant whether or not you first regress het. on (what is basically) coverage (coverage and het are correlated). The bottom line, though, is that this not-quite-significant effect is ridiculously small.
Raw het: survivors = 0.0711, dead = 0.0726; residual het: surv = -0.00025; dead = 0.0000876.
3. Nothing new comes from considering just mel vs. not mel. Dropping mel gives basically the same thing as the full data set, and the signal is basically the same for mel too; the absolute values of het just go up and we are not even close to significance.
4. If I play with this enough I could perhaps pull out some signal somewhere, but it all looks weak and is only flirting with significance. My hope had been that with the better genome we would get a stronger signal, but if anything the signal seems weaker/barely real. Not sure what we want to do about it. If we hadn't already committed to writing something, I would advocate for just dropping this. But now I am not sure.
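The coverage correction in point 2 (regress het. on coverage, keep the residuals) can be sketched on simulated data; everything here (sample size, effect sizes, the random survival labels) is made up purely for illustration:

```r
## residual-het sketch: het is correlated with coverage, so compare
## coverage-corrected (residual) het between groups, not raw het
set.seed(1)
n <- 200
cvg <- runif(n, 5, 40)                          # stand-in for mean coverage
het <- 0.07 + 5e-4 * cvg + rnorm(n, 0, 0.005)   # het rises with coverage
o <- lm(het ~ cvg)
rhet <- o$residuals                             # coverage-corrected het
surv <- rbinom(n, 1, 0.3)                       # random survival labels
tapply(rhet, surv, mean)                        # mean residual het by group
```

With an intercept in the model the residuals average to zero, so any survivor/dead difference in `rhet` is net of the coverage artifact.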
Here is what we did:
## collect ids of inds. not from the selection experiment
grep "^#C" *vcf | perl -p -e 's/\s+/\n/g' | grep 2013FHA > dropIds
## filtering with vcftools 0.1.15
vcftools --vcf tcrFhaVars.vcf --out filtered --remove-indels --remove-filtered-all --min-alleles 2 --max-alleles 2 --min-meanDP 6 --max-meanDP 50 --max-missing 0.8 --minQ 30 --remove dropIds --recode --recode-INFO-all
Excluding individuals in 'exclude' list
After filtering, kept 500 out of 1102 Individuals
Outputting VCF file...
After filtering, kept 108182 out of a possible 2062121 Sites
Run Time = 463.00 seconds
## remove SNPs with no linkage group assignment (LG=0)
grep -v LG=0 filtered.recode.vcf > filtered.recode_noNA.vcf
## retained 97,695 SNPs
## get depth information
vcftools --vcf filtered.recode_noNA.vcf --out depth --depth
Outputting Mean Depth by Individual
## convert to genotype likelihood (gl) format, no MAF filter
perl scripts/vcf2gl.pl 0.0 filtered.recode_noNA.vcf
## infer genotype post. probs with popmod; note: recompiled so the infile no longer needs the row of 1s
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_1.hdf5 -a 0.5 &
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_2.hdf5 -a 0.5 &
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_3.hdf5 -a 0.5 &
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_4.hdf5 -a 0.5 &
popmod -i ecol_gbs_6x.gl -n 1000 -b 500 -t 5 -o ecol_gbs_6x_5.hdf5 -a 0.5 &
## h5dump to extract genotype post. probs.
## per chain
h5dump --dataset=g ecol_gbs_6x_1.hdf5 > gprob_1.txt
## summarize genotype posteriors across chains
perl scripts/summGpost.pl gprob_*
finished processing data for 500 samples, 97695 loci and 5 chains
## genotype file is ecol_Gest.txt
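summGpost.pl's internals aren't shown here, but the presumed gist (average the per-genotype posterior probabilities over chains, then take the posterior-mean genotype) can be sketched with toy numbers; the matrix layout and number of loci below are assumptions, not the script's actual format:

```r
## hypothetical chain-combining sketch (summGpost.pl's rules may differ)
set.seed(3)
nchain <- 5; nloci <- 4
chains <- replicate(nchain, {
  p <- matrix(runif(3 * nloci), nrow = 3)   # rows = P(g=0), P(g=1), P(g=2)
  sweep(p, 2, colSums(p), "/")              # normalize each locus column
}, simplify = FALSE)
pbar <- Reduce(`+`, chains) / nchain        # chain-averaged posteriors
gest <- colSums(pbar * c(0, 1, 2))          # posterior-mean genotype per locus
```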
######################################################
######################################################
Moved to my computer for analyses, see files in /home/zgompert/Local/timema_consGen_hetsurv/.
Model is:
model{
  for(i in 1:N){
    ## binomial/bernoulli sampling distribution
    y[i] ~ dbinom(p[i],1)
    ## logit link function
    logit(p[i]) <- alpha[grp[i]] + beta * x1[i]
  }
  ## hierarchical prior on group intercepts
  for(k in 1:G){
    alpha[k] ~ dnorm(mu,tau)
  }
  ## priors
  mu ~ dnorm(0,1e-5)
  tau ~ dgamma(0.01,0.001)
  beta ~ dnorm(0,1e-5)
}
From commands.R:
library(rjags)
## read in genotype data
N<-500
L<-97695
G<-matrix(scan("ecol_Gest.txt",n=N*L,sep=" "),nrow=L,ncol=N,byrow=TRUE)
## calculate het. and minor allele homozygosity per individual
het<-apply(G==1,2,mean,na.rm=TRUE)
mhom<-apply(G==2,2,mean,na.rm=TRUE)
## proportion of non-missing genotypes per individual (coverage proxy)
nonNa<-apply(is.na(G)==FALSE,2,mean)
## regress het. on nonNa and keep the residuals
o<-lm(het~nonNa)
rhet<-o$residuals
## read in experimental data
dat<-read.table("experiment_data.txt",header=FALSE)
colnames(dat)<-c("id","blck","host","surv")
grp<-paste(dat[,2],dat[,3],sep="")
## make list
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(het-mean(het))/sd(het),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.35391 -0.22128 -0.15447 -0.08316 0.05158
mean(unlist(out) < 0)
#[1] 0.9295556
### A only
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(het[Ad]-mean(het[Ad]))/sd(het[Ad]),N=length(Ad),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.5113 -0.3150 -0.2137 -0.1106 0.1082
mean(unlist(out) < 0)
#[1] 0.9112222
### C only
Ce<-which(dat$host=='C')
D<-list(y=dat$surv[Ce],grp=as.numeric(as.factor(grp[Ce])),x1=(het[Ce]-mean(het[Ce]))/sd(het[Ce]),N=length(Ce),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.51052 -0.31277 -0.21092 -0.10688 0.08666
mean(unlist(out) < 0)
#[1] 0.9191111
################ minor allele homo. ################
## make list
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(mhom-mean(mhom))/sd(mhom),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.40678 -0.25136 -0.17709 -0.10302 0.04552
mean(unlist(out) < 0)
#[1] 0.9422222
### A only
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(mhom[Ad]-mean(mhom[Ad]))/sd(mhom[Ad]),N=length(Ad),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.40433 -0.25506 -0.17831 -0.10155 0.04166
mean(unlist(out) < 0)
#[1] 0.9408889
### C only
Ce<-which(dat$host=='C')
D<-list(y=dat$surv[Ce],grp=as.numeric(as.factor(grp[Ce])),x1=(mhom[Ce]-mean(mhom[Ce]))/sd(mhom[Ce]),N=length(Ce),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.40703 -0.25247 -0.17309 -0.09876 0.05216
mean(unlist(out) < 0)
#[1] 0.9332222
###### residual het ##############
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(rhet-mean(rhet))/sd(rhet),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.32390 -0.18844 -0.11662 -0.05138 0.07742
mean(unlist(out) < 0)
#[1] 0.8821111
### A only
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(rhet[Ad]-mean(rhet[Ad]))/sd(rhet[Ad]),N=length(Ad),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.61642 -0.36000 -0.24170 -0.13299 0.05715
mean(unlist(out) < 0)
#[1] 0.9406667
### C only
Ce<-which(dat$host=='C')
D<-list(y=dat$surv[Ce],grp=as.numeric(as.factor(grp[Ce])),x1=(rhet[Ce]-mean(rhet[Ce]))/sd(rhet[Ce]),N=length(Ce),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.62660 -0.36176 -0.24314 -0.13448 0.06666
mean(unlist(out) < 0)
#[1] 0.9378889
##################################################################################
################### subset by mel-stripe vs. not #################################
##################################################################################
snps<-read.table("snpData.txt",header=FALSE)
## mel-stripe region bounds (bp), applied to scaffolds 702.1 and 128 below
mlb<-4139489
mub<-6414835
mel<-c(which(snps[,2]==702.1 & snps[,3] < mlb),
which(snps[,2]==128 & snps[,3] < mub))
## calculate het. and minor allele homo
het_mel<-apply(G[mel,]==1,2,mean,na.rm=TRUE)
het_nomel<-apply(G[-mel,]==1,2,mean,na.rm=TRUE)
nonNa_mel<-apply(is.na(G[mel,])==FALSE,2,mean)
o<-lm(het_mel~nonNa_mel)
rhet_mel<-o$residuals
nonNa_nomel<-apply(is.na(G[-mel,])==FALSE,2,mean)
o<-lm(het_nomel~nonNa_nomel)
rhet_nomel<-o$residuals
################################# mel
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(rhet_mel-mean(rhet_mel))/sd(rhet_mel),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.18567 -0.05395 0.01390 0.08366 0.22065
mean(unlist(out) < 0)
#[1] 0.446
### A only
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(rhet_mel[Ad]-mean(rhet_mel[Ad]))/sd(rhet_mel[Ad]),N=length(Ad),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.282498 -0.102699 -0.001385 0.100268 0.299425
mean(unlist(out) < 0)
#[1] 0.5037778
### C only
Ce<-which(dat$host=='C')
D<-list(y=dat$surv[Ce],grp=as.numeric(as.factor(grp[Ce])),x1=(rhet_mel[Ce]-mean(rhet_mel[Ce]))/sd(rhet_mel[Ce]),N=length(Ce),G=5)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.22091 -0.04729 0.04883 0.14713 0.33042
mean(unlist(out) < 0)
#[1] 0.3625556
############################### no mel
D<-list(y=dat$surv,grp=as.numeric(as.factor(grp)),x1=(rhet_nomel-mean(rhet_nomel))/sd(rhet_nomel),N=N,G=10)
model<-jags.model("model",data=D,n.chains=3)
update(model,n.iter=5000)
out<-coda.samples(model=model,variable.names=c("beta"),n.iter=9000,thin=3)
# 2.5% 25% 50% 75% 97.5%
#-0.32807 -0.18868 -0.11975 -0.05187 0.08282
mean(unlist(out) < 0)
#[1] 0.8833333
Ad<-which(dat$host=='A')
D<-list(y=dat$surv[Ad],grp=as.numeric(as.factor(grp[Ad])),x1=(rhet_nomel[Ad]-mean(rhet_nomel[Ad]))/sd(rhet_nomel[Ad]),N=length(Ad),G=5)