Multilocus mapping of CHC in Lycaeides, 2018 data

Post date: Jul 01, 2018 9:9:18 PM

I took a first pass at mapping CHC variation in the experimental caterpillars we reared in spring 2018. All of the commands below were executed from /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_chcs/Gemma/.

1. Format the genotype file, which uses genotype estimates from Entropy with k=2,3.

perl mkGenoFile.pl

2. Create the phenotype file. My initial work on the CHC data, including commands to generate the specific data used for mapping is in ../ChcData/commans2018.R. Note that I collapsed CHCs down to main chain length and amount of branching. Also, I have data as relative abundances and residual absolute abundance controlling for weight (little caterpillars appeared to have much more, per weight, total CHCs). Here is commans2018.R.

chc17<-read.csv("CHC-2017.csv")

chc18<-read.csv("CHC-2018.csv")

dat<-rbind(chc17[,-c(1:2)],chc18[,-c(1:2)])

## replace single NA with 0?

dat$X44.95[41]<-0

## get compund names

chcs<-read.csv("CompoundData.csv",header=FALSE)

## get rid of unknown compounds

dat<-dat[,-c(1:6)]

chcs<-chcs[-c(1:6),]

colnames(chcs)<-c("compound","C","branch","totalC")

X<-dim(dat)[1]

## colapse by main chain carbon

L<-length(unique(chcs$C))

uchcs<-unique(chcs$C)

datC<-matrix(NA,nrow=X,ncol=L)

for(i in 1:L){

a<-which(chcs$C==uchcs[i])

if(length(a) > 1){

datC[,i]<-apply(dat[,a],1,sum)

}

else{

datC[,i]<-dat[,a]

}

## colapse by no. branches

L<-length(unique(chcs$branch))

uchcs<-unique(chcs$branch)

datB<-matrix(NA,nrow=X,ncol=L)

for(i in 1:L){

a<-which(chcs$branch==uchcs[i])

if(length(a) > 1){

datB[,i]<-apply(dat[,a],1,sum)

}

else{

datB[,i]<-dat[,a]

}

## read weight data

wgt<-read.table("formattedWeights2018.txt",header=F)

## relative

rdatB<-datB

rdatC<-datC

for(i in 1:X){

rdatB[i,]<-rdatB[i,]/sum(rdatB[i,])

rdatC[i,]<-rdatC[i,]/sum(rdatC[i,])

}

## remove effect of weight

wdatB<-datB[-c(1:92),]

L<-dim(wdatB)[2]

for(i in 1:L){

o<-lm(datB[-c(1:92),i] ~ wgt[,2])

wdatB[,i]<-o$residuals

}

wdatC<-datC[-c(1:92),]

L<-dim(wdatC)[2]

for(i in 1:L){

o<-lm(datC[-c(1:92),i] ~ wgt[,2])

wdatC[,i]<-o$residuals

}

## write outfiles for mapping

outAll<-cbind(rdatC,rdatB)

outWgt<-cbind(wdatC,wdatB)

write.table(outAll,"pheno_chc.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)

write.table(outWgt,"pheno_chc_wgt.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)

## covariate

yr<-c(rep(17,92),rep(18,192))

pop<-c(as.character(chc17[,1]),as.character(chc18[,1]))

pop[187]<-'HNV'

## fitted dat

fdat<-dat

N<-dim(dat)[2]

for(i in 1:N){

o<-lm(dat[,i] ~ yr + pop)

fdat[,i]<-o$fitted.values

}

Next I got the list of individuals with (good) genetic data.

head -n 2 Entropy/lyc_chc_1.gl | tail -n 1 | perl -p -i -e 's/\s+/\n/g' > GenoIds.txt

This list was then used to grab phenotype data for the subset of individuals with genetic data and to sort the phenotypic data.

perl SubSetPheno.pl

3. I then ran gemma on 11 traits (relative CHCs by chain length and branchedness) with five chains each.

sbatch RunGemma.sh

#!/bin/sh

#SBATCH --time=72:00:00

#SBATCH --nodes=1

#SBATCH --ntasks=8

#SBATCH --account=gompert-kp

#SBATCH --partition=gompert-kp

#SBATCH --job-name=gemma

#SBATCH --mail-type=FAIL

#SBATCH --mail-user=zach.gompert@usu.edu

module load gsl

module load gemma

cd /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_chcs/Gemma/

perl RunGemmaFork.pl

early()

{

echo ' '

echo ' ############ WARNING: EARLY TERMINATION ############# '

echo ' '

}

exit

Which runs,

#!/usr/bin/perl

# run gemma jobs

use Parallel::ForkManager;

my $max = 20;

my $pm = Parallel::ForkManager->new($max);

$gfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_chcs/Gemma/geno_chcs_2018.txt";

$pfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_chcs/Gemma/sub_pheno_chc.txt";

#JOB:

foreach $ch (0..4){

system "sleep 2\n";

foreach $trait (1..11){

$pm->start and next; ## fork

#$pm->start and next JOB; ## fork

$out = "out_Trait$trait"."_Ch$ch";

system "gemma -g $gfile -p $pfile -bslmm 1 -n $trait -maf 0.001 -w 200000 -s 1000000 -o $out\n";

$pm->finish;

}

$pm->wait_all_children;

Page updated

Google Sites

Report abuse