Prepping and mapping with body size data

Post date: Jun 26, 2018 3:34:30 PM

The measurement data came from Megan with the original files on box. These only include the newly sequenced samples, not the originals. The measurements (in order) are: 1 = thorax length, 2 = abdomen length, 3 = thorax area, and 4 = abdomen area. The data are in /uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Traits/BodySize/.

I am going to map with gemma. Here are the steps.

1. Process the data by combining the text/xls files from each individual.

perl ExtratMes.pl *xls

This generates pheno_body_size.txt (which I further modified to pull out year and populations as distinct columns).

2. Remove the effect of sex on each measurement with linear regression in R. This is done with commands.R and makes resid_pheno_body_size.txt, which should be ready for gemma.

## extract residuals after accounting for effect of sex

dat<-read.table("pheno_body_size.txt",header=FALSE)

rdat<-dat

for(i in 5:8){

x<-is.na(dat[,i])==FALSE

o<-lm(dat[,i] ~ dat[,4])

rdat[x,i]<-o$residuals

}

write.table(rdat,"resid_pheno_body_size.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)

3. Generate the geno formatted genotype infile.

First, extract the genotype estimates from entropy.

estpost_entropy -o g_k2-3.txt -p gprob -s 0 -w 0 out_k2_ch1.hdf5 out_k2_ch2.hdf5 out_k3_ch1.hdf5 out_k3_ch2.hdf5

file = out_k2_ch1.hdf5

file = out_k2_ch2.hdf5

file = out_k3_ch1.hdf5

file = out_k3_ch2.hdf5

parameter dimensions for gprob: loci = 30513, ind = 1536, genotypes = 3, chains = 4

Then sort the genotypes and subset to those that we have phenotypes for.

perl mkGenoFile.pl

WARNING no genetic data for BCR-17-14

WARNING no genetic data for BNP-17-48

WARNING no genetic data for SKI-14-52

Manually removed these from resid_pheno_body_size.txt creating the new version sub_resid_pheno_body_size.txt

We now have 763 individuals, 30,513 SNPs and four traits in sub_g_k2-3.txt and sub_resid_pheno_body_size.txt.

#### in R #########

L<-30513

N<-763

g<-matrix(scan("sub_g_k2-3.txt",n=N*L,sep=","),nrow=N,ncol=L,byrow=TRUE)

dim(G)

write.table(G,"t_sub_g_k2-3.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)

###################

Next I generated the propoerly formatted geno file geno_body_size.txt with

perl mkGenoFile2.pl

4. Ran the BSLMM in gemma (version 0.95alpha), 10 chains and four traits.

sbatch RunGemma.sh

#!/bin/sh

#SBATCH --time=72:00:00

#SBATCH --nodes=1

#SBATCH --ntasks=8

#SBATCH --account=gompert-kp

#SBATCH --partition=gompert-kp

#SBATCH --job-name=gemma

#SBATCH --mail-type=FAIL

#SBATCH --mail-user=zach.gompert@usu.edu

module load gsl

module load gemma

cd /uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/

perl RunGemmaFork.pl

early()

{

echo ' '

echo ' ############ WARNING: EARLY TERMINATION ############# '

echo ' '

}

exit

Which runs

#!/usr/bin/perl

# run gemma jobs

use Parallel::ForkManager;

my $max = 20;

my $pm = Parallel::ForkManager->new($max);

$gfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/geno_body_size.txt";

$pfile = "/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Gemma/sub_resid_pheno_body_size.txt";

foreach $ch (0..9){

system "sleep 2\n";

foreach $trait (5..8){

$pm->start and next; ## fork

$out = "out_Trait$trait"."_Ch$ch";

system "gemma -g $gfile -p $pfile -bslmm 1 -n $trait -maf 0.001 -w 200000 -s 1000000 -o $out\n";

$pm->finish;

}

$pm->wait_all_children;

Page updated

Google Sites

Report abuse