Post date: Sep 23, 2019 8:32:34 PM
I am re-mapping color (RG and GB) across all of the individuals and survival within treatment. I also threw in sex within treatment for fun/as a control. Note that some individuals had screwy IDs, so I will want to update/rerun this later (or just focus on the alignment to the melanic genome, which I will start next).
1. Drop low-coverage individuals and match phenotype/genotype data and write subsets. All done in R from the Variants_mel subdirectory.Code is in formatPhenoGeno.R:
## format genotype data and phenotype data for gemma
## read in genotype point estimates
L<-11990
N<-454
g<-matrix(scan("pntest_filtered2x_tchum_mel_gbs_2019.txt",n=N*L,sep=" "),nrow=L,ncol=N,byrow=TRUE)
keep<-scan("../Variants/hiCov2x.txt") ## single column, denotes whether 1 or not 0 mean cov for an individual > 2
## true for 437, going to drop the others
gids<-read.table("../Variants/IdsGen.txt",header=FALSE)
## read phenotype data
phtab<-read.table("../Variants/2019_Tchumash_transplant_table.csv",sep=",",header=TRUE)
ph<-matrix(NA,nrow=N,ncol=8)
ph<-as.data.frame(ph)
for(i in 1:N){
a<-which(as.character(phtab[,1])==as.character(gids[i,1]))
if(length(a)==1){
ph[i,]<-phtab[a,c(1,3:6,11,15:16)]
}
}
colnames(ph)<-colnames(phtab)[c(1,3:6,11,15:16)]
#number codes
# Geneder: M = 2, F = 1
# Color: G = 1, GB = 2, I = 3, M = 4
# Treatment: AC = 1, MM = 2
## subset those that have good genetic data
g_hc<-g[,keep==1]
ph_hc<-ph[keep==1,]
dim(g_hc)
#[1] 11990 437
dim(ph_hc)
#[1] 437 8
## write output for color mapping
## no need to control for gender, explains 1.8% of RG and <1% for GB
write.table(file="pheno_color.txt",ph_hc[,7:8],row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(file="g_tchum.txt",g_hc,row.names=FALSE,col.names=FALSE,quote=FALSE)
## now by treatment, gender and survival (former is kind of for fun)
AC<-which(ph_hc$Treatment==1)
MM<-which(ph_hc$Treatment==2)
write.table(file="pheno_AC.txt",ph_hc[AC,c(3,6)],row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(file="g_tchum_AC.txt",g_hc[,AC],row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(file="pheno_MM.txt",ph_hc[MM,c(3,6)],row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(file="g_tchum_MM.txt",g_hc[,MM],row.names=FALSE,col.names=FALSE,quote=FALSE)
## note, survival is reasonably balanced
apply(ph_hc[MM,c(3,6)],2,mean)
# Gender Survival
#1.4037559 0.1971831
apply(ph_hc[AC,c(3,6)],2,mean)
# Gender Survival
#1.4398148 0.2453704
2. Gemma BSLMM analyses (Version 0.95alpha, 03/04/2016). From here I am in the Gemma_mel subdirectory.
First, I generated the proper genotype input files with mkGenoFile.pl, which makes the mod* versions (not allele IDs are arbitrary). Then I ran the analysis with RunGemma.sh:
#!/bin/sh
#SBATCH --time=96:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=gemma
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
module load gsl
module load gemma
cd /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/color_exp_2019_gbs/Gemma_mel
perl RunGemmaFork.pl mod_g*
This runs RunGemmaFork.pl (10 chains, color, sex and survival),
#!/usr/bin/perl
#
# run gemma jobs
#
use Parallel::ForkManager;
my $max = 20;
my $pm = Parallel::ForkManager->new($max);
my $ph;
foreach $g (@ARGV){
if($g =~ m/AC/){
$ph = "pheno_AC.txt";
}
elsif($g =~ m/MM/){
$ph = "pheno_MM.txt";
}
else{
$ph = "pheno_color.txt";
}
$base = $g;
$base =~ s/mod_geno_//;
$base =~ s/\.txt//;
foreach $ch (0..9){
system "sleep 2\n";
foreach $trait (1..2){
$pm->start and next; ## fork
$out = "o_$base"."_ph$trait"."_ch$ch";
system "gemma -g $g -p $ph -bslmm 1 -n $trait -maf 0.0 -w 200000 -s 1000000 -o $out\n";
$pm->finish;
}
}
}
$pm->wait_all_children;