GEBVs and G-matrix for Timema color traits

Post date: Dec 03, 2017 4:14:59 PM

I am estimating genetic covariances (correlations) between GB and RG color traits for the Timema species with green and brown morphs. I am doing this with gemma and in a way that incorporates the polygenic term and SNP effects. The gemma analyses are being run in the latRG and latGB sub-directories within /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/mapping/.

Within each folder I ran sbatch gemma_pred.sh.

This has:

#!/bin/sh -f

#SBATCH --time=72:00:00

#SBATCH --nodes=1

#SBATCH --ntasks=10

#SBATCH --account=gompert-kp

#SBATCH --partition=gompert-kp

#SBATCH --job-name=gemma

#SBATCH --mail-type=FAIL

#SBATCH --mail-user=zach.gompert@usu.edu

echo ------------------------------------------------------

echo -n 'Job is running on node '; cat $SLURM_JOB_NODELIST

echo ------------------------------------------------------

echo SLURM: job identifier is $SLURM_JOBID

echo SLURM: job name is $SLURM_JOB_NAME

echo ------------------------------------------------------

module load gcc

module load gsl

module load hdf5

cd /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/mapping/latRG

perl ../scripts/forkRunGemmaPred.pl files_geno files_pheno

files_geno and files_pheno give the appropriate files for genotype and phenotype data for the color trait RG or GB. For RG these are:

files_geno

Tbartmani_JL_gemma_lgNA_excluded.geno

Tcalifornicum_LPLickSM.latRG.bial.noindel.qs20.cov50.mdp41Mdp830.maf0.01.lgNA.excluded.geno

Tchumash_GR8.06.latRG.bial.noindel.qs20.cov50.mdp265Mdp5310.maf0.01.lgNA.excluded.geno

Tcristinae_FHA.latRG.bial.noindel.qs20.cov50.mdp295Mdp5900.maf0.01.lgNA.excluded.geno

Tlandelsensis_BCBOG-BCHC-BCHR-BCOG-BCSUM.latRG.bial.noindel.qs20.cov50.mdp47Mdp940.maf0.01.lgNA.excluded.geno

Tpodura_BSC.latRG.bial.noindel.qs20.cov50.mdp25Mdp420.maf0.01.lgNA.excluded.geno

files_pheno

Tbartmani_JL_gemma_lgNA_excluded_RG.pheno

Tcalifornicum_LPLickSM.latRG.pheno

Tchumash_GR8.06.latRG.pheno

Tcristinae_FHA.latRG.pheno

Tlandelsensis_BCBOG-BCHC-BCHR-BCOG-BCSUM.latRG.pheno

Tpodura_BSC.latRG.pheno

The perl scrip calls gemma with the following options (and does this all with fork), this is an example and I ran 5 chains for each species and trait:

~/bin/gemma-0.94.1 -g Tbartmani_JL_gemma_lgNA_excluded.geno -p Tbartmani_JL_gemma_lgNA_excluded_RG.pheno -bslmm 1 -o o_Tbartmani_0 -maf 0.01 -w 1000000 -s 3000000 -rpace 100

~/bin/gemma-0.94.1 gemma -g Tbartmani_JL_gemma_lgNA_excluded.geno -p NA_Tbartmani_JL_gemma_lgNA_excluded_RG.pheno -epm output/o_Tbartmani_0.param.txt -emu output/o_Tbartmani_0.log.txt -n 1 -o op_Tbartmani_0 -predict 1

Note that the model is fit with all individuals, but that the first individual is not predicted, as one phenotype must be non-NA. This could effect the mean of the GEBVs but shouldn't affect their relative values, which is the only thing relevant for the genetic correlations.

Genetic correlations were then caluculated from the predicted phenotypes. Predictions were averaged over the five chains. This was done with a perl script, executed from the mapping sub-directory

perl scripts/calcGeneticCorrelations.pl lat*/output/op*0.prdt.txt

The script is:

#!/usr/bin/perl

open(R, "> source.R") or die "failed to write the R file\n";

$nsp = 6;

$nch = 5;

print R "bvRG<-vector(\"list\",$nsp)\n";

print R "bvGB<-vector(\"list\",$nsp)\n";

$j=1;

$first=1;

foreach $file (@ARGV){ ## give the op_*0.prdt.txt files, include latRG/GB subdirecotries

$file =~ m/op_(T[a-z]+)/;

$sp = $1;

$file =~ m/lat([A-Z]+)/;

$tr = $1;

if($tr eq 'RG' and $first==1){

$j=1;

$first=0;

}

elsif ($tr eq 'GB'){

push(@sp,$sp);

}

print R "x<-scan(\"$file\")\n";

foreach $i (1..4){

$file =~ s/\d/$i/ or die "failed sub\n";

print R "y<-scan(\"$file\")\nx<-x+y\n";

}

print R "x<-x/$nch\n";

if($tr eq 'GB'){

print R "bvGB[[$j]]<-x\n";

}

elsif($tr eq 'RG'){

print R "bvRG[[$j]]<-x\n";

}

else {

print "trait is $tr, this is an error\n";

}

$j=$j+1;

}

print R "cors<-matrix(NA,nrow=$nsp,ncol=3)\n

for(i in 1:$nsp){\n

o<-cor.test(bvGB[[i]],bvRG[[i]],na.rm=TRUE)\n

cors[i,]<-c(o\$estimate,o\$conf.int)\n

}\n

colnames(cors)<-c(\"r\",\"lb95\",\"ub95\")\n

rownames(cors)<-c(\"Tbartmani\",\"Tcalifornicum\",\"Tchumash\",\"Tcristinae\",\"Tlandelsensis\",\"Tpodura\")\n

write.table(round(cors,4),\"geneticCors.txt\",row.names=TRUE,col.names=TRUE,quote=FALSE)\n";

close(R);

system "R CMD BATCH source.R\n";

$sp = join(" ",@sp);

print "finished for $sp\n";

Here are the correlations:

r lb95 ub95

Tchumash -0.3416 -0.4147 -0.2642

Tbartmani -0.5401 -0.6513 -0.4062

Tpodura -0.6327 -0.7872 -0.4037

Tcristinae -0.5307 -0.5863 -0.4700

Tlandelsensis -0.4493 -0.5983 -0.2704

Tcalifornicum -0.6475 -0.7580 -0.5009

A summary plot was made with plotGenCor.R, which is also in the mapping sub-directory.

Page updated

Google Sites

Report abuse