Post date: Dec 30, 2015 6:28:53 PM
imulations – simulate phenotypes based on specific genetic architectures and evaluate our ability to estimate hyperparameters (PVE and number of SNPs), individual SNP effects, and perhaps to predict fitness (i.e., genomic prediction, with or without cross validation, including cross validation could make this big).
We will base initial simulations on Timema GBS data (just the genomic data, to preserve real patterns of LD and SNPs, but not trait values) so that we will have real patterns of allele frequency variation, LD etc., and consider a quantitative measure of fitness. This data set includes 592 individuals with 246,258 SNPs. Initial contrasts will be:
1. low (5-10%) vs modest (30%) heritability of fitness
2. two different effect size distributions
3. functional variants included vs. excluded from the SNP datasets (the latter case results are assumed to be driven by un-sequenced variants in LD with sequenced ones).
This will then be expanded to consider a binary measure of fitness under a sub-set of these conditions, and simulate fitness using Rhagoletis GBS data for an alternative (much greater) pattern of LD (drawn from Egan et al. Ecology Letters). We will simulate 50 replicate data sets (25 for binary) under each combination of conditions.
Scripts and infiles are in /labs/evolution/projects/selection_sims/. Output will be in /pscratch/A01963476/sims/gemmasel/
The first four sets of simulations (numbered 1-4) are running and are:
1 = Timema, all SNPs, h2 = 0.3, num. functional SNPs = 100, quantitative trait
2 = Timema, all SNPs, h2 = 0.3, num. functional SNPs = 10, quantitative trait
3 = Timema, all SNPs, h2 = 0.05, num. functional SNPs = 100, quantitative trait
4 = Timema, all SNPs, h2 = 0.05,. num. functional SNPs = 10, quantitative trait
Here are example commands for each executed via the wrap_qsub scrip:
cd /pscratch/A01963476/sims/gemmasel
sleep 5
gemma -g /labs/evolution/projects/selection_sims/geno_sims.txt -p /labs/evolution/projects/selection_sims/pheno_sims1.txt -bslmm 1 -n 50 -o out_sim1phbin50rep1 -rpace 100 -maf 0.0 -w 1000000 -s 2000000
cd /pscratch/A01963476/sims/gemmasel
sleep 5
gemma -g /labs/evolution/projects/selection_sims/geno_sims.txt -p /labs/evolution/projects/selection_sims/pheno_sims2.txt -bslmm 1 -n 50 -o out_sim2phbin50rep1 -rpace 100 -maf 0.0 -w 1000000 -s 2000000
cd /pscratch/A01963476/sims/gemmasel
sleep 5
gemma -g /labs/evolution/projects/selection_sims/geno_sims.txt -p /labs/evolution/projects/selection_sims/pheno_sims3.txt -bslmm 1 -n 50 -o out_sim3phbin50rep1 -rpace 100 -maf 0.0 -w 1000000 -s 2000000
cd /pscratch/A01963476/sims/gemmasel
sleep 5
gemma -g /labs/evolution/projects/selection_sims/geno_sims.txt -p /labs/evolution/projects/selection_sims/pheno_sims4.txt -bslmm 1 -n 50 -o out_sim4phbin50rep1 -rpace 100 -maf 0.0 -w 1000000 -s 2000000
Next, I repeated these analyses without including the functional variants. This can be done by supplying a list of SNPs to analyze for each simulation that does not include the functional variants. I generated these lists with the script mkSnpFiles.pl, and the lists are in the snps subdirectory. I than ran everything again with the wrap_qsub script (see one example command below):
cd /pscratch/A01963476/sims/gemmasel
sleep 5
gemma -g /labs/evolution/projects/selection_sims/geno_sims.txt -p /labs/evolution/projects/selection_sims/pheno_sims1.txt -bslmm 1 -n 50 -o out_simnofunc1phbin50rep1 -snps /labs/evolution/projects/selection_sims/snps/snplist_file1_phenotype50.txt -rpace 100 -maf 0.0 -w 1000000 -s 2000000
Here is the original R code for the simulations (sims.R):
nl<-246258
n<-592
G<-matrix(scan("pntestTimemaGbs.txt",n=n*nl,sep=" "),nrow=nl,ncol=n,byrow=TRUE)
Ghat<-round(G,0)
nr<-50
## data set 1, h2 = 0.3, nsnps = 100, normal distribution
nfunc<-100
her<-0.3
funct1m<-matrix(NA,nrow=nfunc,ncol=nr)
effect1m<-matrix(NA,nrow=nfunc,ncol=nr)
bv1m<-matrix(NA,nrow=n,ncol=nr)
ph1m<-matrix(NA,nrow=n,ncol=nr)
her1<-rep(NA,nr)
for(x in 1:nr){
funct1<-sample(1:nl,nfunc,replace=FALSE)
effect1<-rnorm(nfunc,0,1)
bv1<-rep(NA,n)
for(i in 1:n){
bv1[i]<-sum(Ghat[funct1,i] * effect1)
}
v1<--1 * ((her - 1)*var(bv1))/her
ph1<-bv1 + rnorm(n,0,sqrt(v1))
her1[x]<-var(bv1)/var(ph1)
funct1m[,x]<-funct1
effect1m[,x]<-effect1
bv1m[,x]<-bv1
ph1m[,x]<-ph1
}
## data set 2, h2 = 0.3, nsnps = 10, normal distribution
nfunc<-10
her<-0.3
funct2m<-matrix(NA,nrow=nfunc,ncol=nr)
effect2m<-matrix(NA,nrow=nfunc,ncol=nr)
bv2m<-matrix(NA,nrow=n,ncol=nr)
ph2m<-matrix(NA,nrow=n,ncol=nr)
her2<-rep(NA,nr)
for(x in 1:nr){
funct2<-sample(1:nl,nfunc,replace=FALSE)
effect2<-rnorm(nfunc,0,1)
bv2<-rep(NA,n)
for(i in 1:n){
bv2[i]<-sum(Ghat[funct2,i] * effect2)
}
v2<--1 * ((her - 1)*var(bv2))/her
ph2<-bv2 + rnorm(n,0,sqrt(v2))
her2[x]<-var(bv2)/var(ph2)
funct2m[,x]<-funct2
effect2m[,x]<-effect2
bv2m[,x]<-bv2
ph2m[,x]<-ph2
}
## data set 3, h2 = 0.05, nsnps = 100, normal distribution
nfunc<-100
her<-0.05
funct3m<-matrix(NA,nrow=nfunc,ncol=nr)
effect3m<-matrix(NA,nrow=nfunc,ncol=nr)
bv3m<-matrix(NA,nrow=n,ncol=nr)
ph3m<-matrix(NA,nrow=n,ncol=nr)
her3<-rep(NA,nr)
for(x in 1:nr){
funct3<-sample(1:nl,nfunc,replace=FALSE)
effect3<-rnorm(nfunc,0,1)
bv3<-rep(NA,n)
for(i in 1:n){
bv3[i]<-sum(Ghat[funct3,i] * effect3)
}
v3<--1 * ((her - 1)*var(bv3))/her
ph3<-bv3 + rnorm(n,0,sqrt(v3))
her3[x]<-var(bv3)/var(ph3)
funct3m[,x]<-funct3
effect3m[,x]<-effect3
bv3m[,x]<-bv3
ph3m[,x]<-ph3
}
## data set 4, h2 = 0.05, nsnps = 10, normal distribution
nfunc<-10
her<-0.05
funct4m<-matrix(NA,nrow=nfunc,ncol=nr)
effect4m<-matrix(NA,nrow=nfunc,ncol=nr)
bv4m<-matrix(NA,nrow=n,ncol=nr)
ph4m<-matrix(NA,nrow=n,ncol=nr)
her4<-rep(NA,nr)
for(x in 1:nr){
funct4<-sample(1:nl,nfunc,replace=FALSE)
effect4<-rnorm(nfunc,0,1)
bv4<-rep(NA,n)
for(i in 1:n){
bv4[i]<-sum(Ghat[funct4,i] * effect4)
}
v4<--1 * ((her - 1)*var(bv4))/her
ph4<-bv4 + rnorm(n,0,sqrt(v4))
her4[x]<-var(bv4)/var(ph4)
funct4m[,x]<-funct4
effect4m[,x]<-effect4
bv4m[,x]<-bv4
ph4m[,x]<-ph4
}
Gout<-cbind(1:nl,rep("A",nl),rep("T",nl),Ghat)
write.table(Gout,"geno_sims.txt",row.names=F,col.names=F,quote=F)
write.table(ph1m,"pheno_sims1.txt",row.names=F,col.names=F,quote=F)
write.table(ph2m,"pheno_sims2.txt",row.names=F,col.names=F,quote=F)
write.table(ph3m,"pheno_sims3.txt",row.names=F,col.names=F,quote=F)
write.table(ph4m,"pheno_sims4.txt",row.names=F,col.names=F,quote=F)
write.table(bv1m,"bv_sims1.txt",row.names=F,col.names=F,quote=F)
write.table(bv2m,"bv_sims2.txt",row.names=F,col.names=F,quote=F)
write.table(bv3m,"bv_sims3.txt",row.names=F,col.names=F,quote=F)
write.table(bv4m,"bv_sims4.txt",row.names=F,col.names=F,quote=F)
write.table(effect1m,"effect_sims1.txt",row.names=F,col.names=F,quote=F)
write.table(effect2m,"effect_sims2.txt",row.names=F,col.names=F,quote=F)
write.table(effect3m,"effect_sims3.txt",row.names=F,col.names=F,quote=F)
write.table(effect4m,"effect_sims4.txt",row.names=F,col.names=F,quote=F)
write.table(funct1m,"functvar_sims1.txt",row.names=F,col.names=F,quote=F)
write.table(funct2m,"functvar_sims2.txt",row.names=F,col.names=F,quote=F)
write.table(funct3m,"functvar_sims3.txt",row.names=F,col.names=F,quote=F)
write.table(funct4m,"functvar_sims4.txt",row.names=F,col.names=F,quote=F)
save(list=ls(),file="sims.Rdata")