Simulated wing pattern data

Post date: Sep 17, 2017 3:29:14 AM

I simulated wing pattern (trait) data to see how the BSLMM in gemma performs with different degrees of structure.

1. Simulations

I am using the actual Lycaeides genotype data as the basis for phenotype simulations. 10 or 100 causal SNPs are chosen from the actual set of SNPs, and I am working with four genetic data sets (all bugs, anna, melissa, and SIN). Heritabilities are all 0.5 and their are 50 simulations per set of conditions. The relevant in/outfiles are in /uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_wings/Gemma/infiles/.

Here is the simulation code from simPheno.R:

## all simulations have a heritability of 0.5

## with 10 or 100 causal variants (effect draft from standard normal)

## phenotypes than normalized to have mean 0 and sd 1, before adding second random deviate from N(0,1) (this yields heritability of 0.5)

## causal variants are in the data set

## use: all data, anna, melissa and SIN

## 50 reps each = 50 * 2 * 4 = 400 data sets

simp<-function(Gmat=NA,h2=0.5,nsnps=NA,nreps=50){

N<-dim(Gmat)[2];L<-dim(Gmat)[1]

phm<-matrix(NA,nrow=N,ncol=nreps)

snps<-matrix(NA,nrow=nsnps,ncol=nreps) ## numeric id of causal snps

effects<-snps ## effects of snps

for(i in 1:nreps){

snps[,i]<-sample(1:L,nsnps,replace=FALSE)

effects[,i]<-rnorm(nsnps,mean=0,sd=1)

for(j in 1:N){

phm[j,i]<-sum(Gmat[snps[,i],j] * effects[,i])

}

phm[,i]<-(phm[,i]-mean(phm[,i]))/sd(phm[,i]) + rnorm(N,mean=0,sd=1)

}

simo<-list(phm,snps,effects)

return(simo)

}

writeRes<-function(o,prefix=NA){

write.table(o[[1]],paste("ph_",prefix,".txt",sep=""),row.names=FALSE,col.names=FALSE,sep=" ",quote=FALSE)

write.table(o[[2]],paste("snp_",prefix,".txt",sep=""),row.names=FALSE,col.names=FALSE,sep=" ",quote=FALSE)

write.table(o[[3]],paste("effect_",prefix,".txt",sep=""),row.names=FALSE,col.names=FALSE,sep=" ",quote=FALSE)

}

## all, 10 and 100 snps

G<-read.table("mod_geno_size.txt",sep=", ",header=FALSE)

o_all_10<-simp(Gmat=as.matrix(G[,-c(1:3)]),nsnps=10,nreps=50)

o_all_100<-simp(Gmat=as.matrix(G[,-c(1:3)]),nsnps=100,nreps=50)

writeRes(o_all_10,"sim_all_10")

writeRes(o_all_100,"sim_all_100")

## Anna, 10 and 100 snps

G<-read.table("AN_mod_geno_size.txt",sep=",",header=FALSE)

o_AN_10<-simp(Gmat=as.matrix(G[,-c(1:3)]),nsnps=10,nreps=50)

o_AN_100<-simp(Gmat=as.matrix(G[,-c(1:3)]),nsnps=100,nreps=50)

writeRes(o_all_10,"sim_AN_10")

writeRes(o_all_100,"sim_AN_100")

## Melissa, 10 and 100 snps

G<-read.table("ME_mod_geno_size.txt",sep=",",header=FALSE)

o_ME_10<-simp(Gmat=as.matrix(G[,-c(1:3)]),nsnps=10,nreps=50)

o_ME_100<-simp(Gmat=as.matrix(G[,-c(1:3)]),nsnps=100,nreps=50)

writeRes(o_all_10,"sim_ME_10")

writeRes(o_all_100,"sim_ME_100")

## SIN, 10 and 100 snps

G<-read.table("SIN_mod_geno_size.txt",sep=",",header=FALSE)

o_SIN_10<-simp(Gmat=as.matrix(G[,-c(1:3)]),nsnps=10,nreps=50)

o_SIN_100<-simp(Gmat=as.matrix(G[,-c(1:3)]),nsnps=100,nreps=50)

writeRes(o_all_10,"sim_SIN_10")

writeRes(o_all_100,"sim_SIN_100")

2. Analysis

I ran gemma with perl ../scripts/wrap_qsub_slurm_gemma_sims.pl

This actually gets the list of files to analyze from sim_files.txt and then runs a fork wrapper over the 50 traits for each set of simulation conditions (different chains, of which there are 5, are different jobs), e.g.,