Post date: Jul 01, 2018 9:9:18 PM
I took a first pass at mapping CHC variation in the experimental caterpillars we reared in spring 2018. All of the commands below were executed from /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_chcs/Gemma/.
1. Format the genotype file, which uses genotype estimates from Entropy with k=2,3.
perl mkGenoFile.pl
2. Create the phenotype file. My initial work on the CHC data, including commands to generate the specific data used for mapping is in ../ChcData/commans2018.R. Note that I collapsed CHCs down to main chain length and amount of branching. Also, I have data as relative abundances and residual absolute abundance controlling for weight (little caterpillars appeared to have much more, per weight, total CHCs). Here is commans2018.R.
chc17<-read.csv("CHC-2017.csv")
chc18<-read.csv("CHC-2018.csv")
dat<-rbind(chc17[,-c(1:2)],chc18[,-c(1:2)])
## replace single NA with 0?
dat$X44.95[41]<-0
## get compund names
chcs<-read.csv("CompoundData.csv",header=FALSE)
## get rid of unknown compounds
dat<-dat[,-c(1:6)]
chcs<-chcs[-c(1:6),]
colnames(chcs)<-c("compound","C","branch","totalC")
X<-dim(dat)[1]
## colapse by main chain carbon
L<-length(unique(chcs$C))
uchcs<-unique(chcs$C)
datC<-matrix(NA,nrow=X,ncol=L)
for(i in 1:L){
a<-which(chcs$C==uchcs[i])
if(length(a) > 1){
datC[,i]<-apply(dat[,a],1,sum)
}
else{
datC[,i]<-dat[,a]
}
}
## colapse by no. branches
L<-length(unique(chcs$branch))
uchcs<-unique(chcs$branch)
datB<-matrix(NA,nrow=X,ncol=L)
for(i in 1:L){
a<-which(chcs$branch==uchcs[i])
if(length(a) > 1){
datB[,i]<-apply(dat[,a],1,sum)
}
else{
datB[,i]<-dat[,a]
}
}
## read weight data
wgt<-read.table("formattedWeights2018.txt",header=F)
## relative
rdatB<-datB
rdatC<-datC
for(i in 1:X){
rdatB[i,]<-rdatB[i,]/sum(rdatB[i,])
rdatC[i,]<-rdatC[i,]/sum(rdatC[i,])
}
## remove effect of weight
wdatB<-datB[-c(1:92),]
L<-dim(wdatB)[2]
for(i in 1:L){
o<-lm(datB[-c(1:92),i] ~ wgt[,2])
wdatB[,i]<-o$residuals
}
wdatC<-datC[-c(1:92),]
L<-dim(wdatC)[2]
for(i in 1:L){
o<-lm(datC[-c(1:92),i] ~ wgt[,2])
wdatC[,i]<-o$residuals
}
## write outfiles for mapping
outAll<-cbind(rdatC,rdatB)
outWgt<-cbind(wdatC,wdatB)
write.table(outAll,"pheno_chc.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(outWgt,"pheno_chc_wgt.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
## covariate
yr<-c(rep(17,92),rep(18,192))
pop<-c(as.character(chc17[,1]),as.character(chc18[,1]))
pop[187]<-'HNV'
## fitted dat
fdat<-dat
N<-dim(dat)[2]
for(i in 1:N){
o<-lm(dat[,i] ~ yr + pop)
fdat[,i]<-o$fitted.values
}
Next I got the list of individuals with (good) genetic data.
head -n 2 Entropy/lyc_chc_1.gl | tail -n 1 | perl -p -i -e 's/\s+/\n/g' > GenoIds.txt
This list was then used to grab phenotype data for the subset of individuals with genetic data and to sort the phenotypic data.
perl SubSetPheno.pl
3. I then ran gemma on 11 traits (relative CHCs by chain length and branchedness) with five chains each.
sbatch RunGemma.sh
#!/bin/sh
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=gemma
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
module load gsl
module load gemma
cd /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_chcs/Gemma/
perl RunGemmaFork.pl
early()
{
echo ' '
echo ' ############ WARNING: EARLY TERMINATION ############# '
echo ' '
}
exit
Which runs,
#!/usr/bin/perl
#
# run gemma jobs
#
use Parallel::ForkManager;
my $max = 20;
my $pm = Parallel::ForkManager->new($max);
$gfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_chcs/Gemma/geno_chcs_2018.txt";
$pfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_chcs/Gemma/sub_pheno_chc.txt";
#JOB:
foreach $ch (0..4){
system "sleep 2\n";
foreach $trait (1..11){
$pm->start and next; ## fork
#$pm->start and next JOB; ## fork
$out = "out_Trait$trait"."_Ch$ch";
system "gemma -g $gfile -p $pfile -bslmm 1 -n $trait -maf 0.001 -w 200000 -s 1000000 -o $out\n";
$pm->finish;
}
}
$pm->wait_all_children;