Post date: Jun 26, 2018 3:34:30 PM
The measurement data came from Megan with the original files on box. These only include the newly sequenced samples, not the originals. The measurements (in order) are: 1 = thorax length, 2 = abdomen length, 3 = thorax area, and 4 = abdomen area. The data are in /uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Traits/BodySize/.
I am going to map with gemma. Here are the steps.
1. Process the data by combining the text/xls files from each individual.
perl ExtratMes.pl *xls
This generates pheno_body_size.txt (which I further modified to pull out year and populations as distinct columns).
2. Remove the effect of sex on each measurement with linear regression in R. This is done with commands.R and makes resid_pheno_body_size.txt, which should be ready for gemma.
## extract residuals after accounting for effect of sex
dat<-read.table("pheno_body_size.txt",header=FALSE)
rdat<-dat
for(i in 5:8){
x<-is.na(dat[,i])==FALSE
o<-lm(dat[,i] ~ dat[,4])
rdat[x,i]<-o$residuals
}
write.table(rdat,"resid_pheno_body_size.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
3. Generate the geno formatted genotype infile.
First, extract the genotype estimates from entropy.
estpost_entropy -o g_k2-3.txt -p gprob -s 0 -w 0 out_k2_ch1.hdf5 out_k2_ch2.hdf5 out_k3_ch1.hdf5 out_k3_ch2.hdf5
file = out_k2_ch1.hdf5
file = out_k2_ch2.hdf5
file = out_k3_ch1.hdf5
file = out_k3_ch2.hdf5
parameter dimensions for gprob: loci = 30513, ind = 1536, genotypes = 3, chains = 4
Then sort the genotypes and subset to those that we have phenotypes for.
perl mkGenoFile.pl
WARNING no genetic data for BCR-17-14
WARNING no genetic data for BNP-17-48
WARNING no genetic data for SKI-14-52
Manually removed these from resid_pheno_body_size.txt creating the new version sub_resid_pheno_body_size.txt
We now have 763 individuals, 30,513 SNPs and four traits in sub_g_k2-3.txt and sub_resid_pheno_body_size.txt.
#### in R #########
L<-30513
N<-763
g<-matrix(scan("sub_g_k2-3.txt",n=N*L,sep=","),nrow=N,ncol=L,byrow=TRUE)
dim(G)
write.table(G,"t_sub_g_k2-3.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
###################
Next I generated the propoerly formatted geno file geno_body_size.txt with
perl mkGenoFile2.pl
4. Ran the BSLMM in gemma (version 0.95alpha), 10 chains and four traits.
sbatch RunGemma.sh
#!/bin/sh
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=gemma
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
module load gsl
module load gemma
cd /uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/
perl RunGemmaFork.pl
early()
{
echo ' '
echo ' ############ WARNING: EARLY TERMINATION ############# '
echo ' '
}
exit
Which runs
#!/usr/bin/perl
#
# run gemma jobs
#
use Parallel::ForkManager;
my $max = 20;
my $pm = Parallel::ForkManager->new($max);
$gfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/geno_body_size.txt";
$pfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/sub_resid_pheno_body_size.txt";
foreach $ch (0..9){
system "sleep 2\n";
foreach $trait (5..8){
$pm->start and next; ## fork
$out = "out_Trait$trait"."_Ch$ch";
system "gemma -g $gfile -p $pfile -bslmm 1 -n $trait -maf 0.001 -w 200000 -s 1000000 -o $out\n";
$pm->finish;
}
}
$pm->wait_all_children;