Post date: Dec 03, 2017 4:14:59 PM
I am estimating genetic covariances (correlations) between GB and RG color traits for the Timema species with green and brown morphs. I am doing this with gemma and in a way that incorporates the polygenic term and SNP effects. The gemma analyses are being run in the latRG and latGB sub-directories within /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/mapping/.
Within each folder I ran sbatch gemma_pred.sh.
This has:
#!/bin/sh -f
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=gemma
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $SLURM_JOB_NODELIST
echo ------------------------------------------------------
echo SLURM: job identifier is $SLURM_JOBID
echo SLURM: job name is $SLURM_JOB_NAME
echo ------------------------------------------------------
module load gcc
module load gsl
module load hdf5
cd /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/mapping/latRG
perl ../scripts/forkRunGemmaPred.pl files_geno files_pheno
files_geno and files_pheno give the appropriate files for genotype and phenotype data for the color trait RG or GB. For RG these are:
files_geno
Tbartmani_JL_gemma_lgNA_excluded.geno
Tcalifornicum_LPLickSM.latRG.bial.noindel.qs20.cov50.mdp41Mdp830.maf0.01.lgNA.excluded.geno
Tchumash_GR8.06.latRG.bial.noindel.qs20.cov50.mdp265Mdp5310.maf0.01.lgNA.excluded.geno
Tcristinae_FHA.latRG.bial.noindel.qs20.cov50.mdp295Mdp5900.maf0.01.lgNA.excluded.geno
Tlandelsensis_BCBOG-BCHC-BCHR-BCOG-BCSUM.latRG.bial.noindel.qs20.cov50.mdp47Mdp940.maf0.01.lgNA.excluded.geno
Tpodura_BSC.latRG.bial.noindel.qs20.cov50.mdp25Mdp420.maf0.01.lgNA.excluded.geno
files_pheno
Tbartmani_JL_gemma_lgNA_excluded_RG.pheno
Tcalifornicum_LPLickSM.latRG.pheno
Tchumash_GR8.06.latRG.pheno
Tcristinae_FHA.latRG.pheno
Tlandelsensis_BCBOG-BCHC-BCHR-BCOG-BCSUM.latRG.pheno
Tpodura_BSC.latRG.pheno
The perl scrip calls gemma with the following options (and does this all with fork), this is an example and I ran 5 chains for each species and trait:
~/bin/gemma-0.94.1 -g Tbartmani_JL_gemma_lgNA_excluded.geno -p Tbartmani_JL_gemma_lgNA_excluded_RG.pheno -bslmm 1 -o o_Tbartmani_0 -maf 0.01 -w 1000000 -s 3000000 -rpace 100
~/bin/gemma-0.94.1 gemma -g Tbartmani_JL_gemma_lgNA_excluded.geno -p NA_Tbartmani_JL_gemma_lgNA_excluded_RG.pheno -epm output/o_Tbartmani_0.param.txt -emu output/o_Tbartmani_0.log.txt -n 1 -o op_Tbartmani_0 -predict 1
Note that the model is fit with all individuals, but that the first individual is not predicted, as one phenotype must be non-NA. This could effect the mean of the GEBVs but shouldn't affect their relative values, which is the only thing relevant for the genetic correlations.
Genetic correlations were then caluculated from the predicted phenotypes. Predictions were averaged over the five chains. This was done with a perl script, executed from the mapping sub-directory
perl scripts/calcGeneticCorrelations.pl lat*/output/op*0.prdt.txt
The script is:
#!/usr/bin/perl
open(R, "> source.R") or die "failed to write the R file\n";
$nsp = 6;
$nch = 5;
print R "bvRG<-vector(\"list\",$nsp)\n";
print R "bvGB<-vector(\"list\",$nsp)\n";
$j=1;
$first=1;
foreach $file (@ARGV){ ## give the op_*0.prdt.txt files, include latRG/GB subdirecotries
$file =~ m/op_(T[a-z]+)/;
$sp = $1;
$file =~ m/lat([A-Z]+)/;
$tr = $1;
if($tr eq 'RG' and $first==1){
$j=1;
$first=0;
}
elsif ($tr eq 'GB'){
push(@sp,$sp);
}
print R "x<-scan(\"$file\")\n";
foreach $i (1..4){
$file =~ s/\d/$i/ or die "failed sub\n";
print R "y<-scan(\"$file\")\nx<-x+y\n";
}
print R "x<-x/$nch\n";
if($tr eq 'GB'){
print R "bvGB[[$j]]<-x\n";
}
elsif($tr eq 'RG'){
print R "bvRG[[$j]]<-x\n";
}
else {
print "trait is $tr, this is an error\n";
}
$j=$j+1;
}
print R "cors<-matrix(NA,nrow=$nsp,ncol=3)\n
for(i in 1:$nsp){\n
o<-cor.test(bvGB[[i]],bvRG[[i]],na.rm=TRUE)\n
cors[i,]<-c(o\$estimate,o\$conf.int)\n
}\n
colnames(cors)<-c(\"r\",\"lb95\",\"ub95\")\n
rownames(cors)<-c(\"Tbartmani\",\"Tcalifornicum\",\"Tchumash\",\"Tcristinae\",\"Tlandelsensis\",\"Tpodura\")\n
write.table(round(cors,4),\"geneticCors.txt\",row.names=TRUE,col.names=TRUE,quote=FALSE)\n";
close(R);
system "R CMD BATCH source.R\n";
$sp = join(" ",@sp);
print "finished for $sp\n";
Here are the correlations:
r lb95 ub95
Tchumash -0.3416 -0.4147 -0.2642
Tbartmani -0.5401 -0.6513 -0.4062
Tpodura -0.6327 -0.7872 -0.4037
Tcristinae -0.5307 -0.5863 -0.4700
Tlandelsensis -0.4493 -0.5983 -0.2704
Tcalifornicum -0.6475 -0.7580 -0.5009
A summary plot was made with plotGenCor.R, which is also in the mapping sub-directory.