Post date: May 20, 2019 4:16:4 PM
Here my goal is to map body size to test for selection on the SNPs associated with this trait. I am using the body size measurements from Megan. These only include the newly sequenced samples, not the originals. The measurements (in order) are: 1 = thorax length, 2 = abdomen length, 3 = thorax area, and 4 = abdomen area. The data are in /uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Traits/BodySize/.
1. I am playing with a few approaches here. Most of the bugs are from 2017, but some are from other years:
year = 13 14 15 17
numb = 98 152 49 467
Thus, I am trying mapping with all bugs and just 2017. The latter is perhaps the safer bet as it should minimize the effects of plasticity and still has a reasonable sample size. Thus, I made two phenotype files. Both remove the effect of sex, one is for all individuals and one for the 467 2017 samples. For the latter, I included results just removing sex and those that remove the effects of sex and population (to account for plastic differences among sites). The last is probably the safest bet, but I want to see how much this matters. Note that I will need to generate different input files for the two files (with different numbers of individuals). Here is the code, from commands.R:
## extract residuals after accounting for effect of sex
dat<-read.table("pheno_body_size.txt",header=FALSE)
rdat<-dat
for(i in 5:8){
x<-is.na(dat[,i])==FALSE
o<-lm(dat[,i] ~ dat[,4])
rdat[x,i]<-o$residuals
}
## now just 2017
dat17<-dat[dat$V2==17,]
rdat17<-dat17## control for sex
rrdat17<-dat17## sex and location
for(i in 5:8){
x<-is.na(dat17[,i])==FALSE
o<-lm(dat17[,i] ~ dat17[,4])
rdat17[x,i]<-o$residuals
o<-lm(dat17[,i] ~ dat17[,4]+dat17[,1])
rrdat17[x,i]<-o$residuals
}
out17<-cbind(rdat17,rrdat17[5:8])
write.table(out17,"resid_pheno_body_size17.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(rdat,"resid_pheno_body_size.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
2. Next, I formated the data for input into gemma. From within the Gemma subdirectory:
perl mkGenoFile.pl
WARNING no genetic data for BCR-17-14
WARNING no genetic data for BNP-17-48
WARNING no genetic data for SKI-14-52
WARNING no genetic data for BCR-17-14
WARNING no genetic data for BNP-17-48
I manually removed these individuals with no genotype data from the phenotype files.
At this point, I had 12886 SNPs and 763 or 465 individuals (the latter is for 2017 alone).
Next, in R, I transposed the genotype matrixes
######################
L<-12886
N<-c(763,465)
g<-matrix(scan("sub_G_SAM_sub.txt",n=N[1]*L,sep=","),nrow=N[1],ncol=L,byrow=TRUE)
#Read 9832018 items
G<-t(g)
dim(G)
#[1] 12886 763
write.table(G,"t_sub_G_SAM_sub.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
g<-matrix(scan("sub_G_SAM_sub17.txt",n=N[2]*L,sep=","),nrow=N[2],ncol=L,byrow=TRUE)
#Read 5991990 items
G<-t(g)
dim(G)
#[1] 12886 465
write.table(G,"t_sub_G_SAM_sub17.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
#########################
I then made the final gemma input files.
perl mkGenoFile2.pl
3. Finally, I ran gemma (version 0.95alpha).
I ran, sbatch RunGemma.sh
#!/bin/sh
#SBATCH --time=110:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=gemma
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
module load gsl
module load gemma
cd /uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma
perl RunGemmaFork.pl
which runs (this covers both sets of input files and all of the variables),
#!/usr/bin/perl
#
# run gemma jobs
#
use Parallel::ForkManager;
my $max = 20;
my $pm = Parallel::ForkManager->new($max);
$gfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/geno_body_size.txt";
$pfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/resid_pheno_body_size.txt";
## first fo through full data set
foreach $ch (0..9){
system "sleep 2\n";
foreach $trait (5..8){
$pm->start and next; ## fork
$out = "out_fullset_Trait$trait"."_Ch$ch";
system "gemma -g $gfile -p $pfile -bslmm 1 -n $trait -maf 0.001 -w 200000 -s 1000000 -o $out\n";
$pm->finish;
}
}
$pm->wait_all_children;
### now 2017 only
$gfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/geno_body_size17.txt";
$pfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/resid_pheno_body_size17.txt";
## first fo through full data set
foreach $ch (0..9){
system "sleep 2\n";
foreach $trait (5..12){
$pm->start and next; ## fork
$out = "out_2017_Trait$trait"."_Ch$ch";
system "gemma -g $gfile -p $pfile -bslmm 1 -n $trait -maf 0.001 -w 200000 -s 1000000 -o $out\n";
$pm->finish;
}
}
$pm->wait_all_children;