Post date: Sep 18, 2019 6:23:30 PM
Note: this is based on the alignment to the stripe genome. This is kind of problematic as this has the deletion. But it should still be somewhat informative because of LD with the Mel-Stripe region.
I am re-mapping color (RG and GB) across all of the individuals and survival within treatment. I also threw in sex within treatment for fun/as a control. Note that some individuals had screwy IDs, so I will want to update/rerun this later (or just focus on the alignment to the melanic genome, which I will start next).
1. Drop low-coverage individuals and match phenotype/genotype data and write subsets. All done in R from the Variants subdirectory.Code is in formatPhenoGeno.R:
## format genotype data and phenotype data for gemma
## read in genotype point estimates
L<-11730
N<-454
g<-matrix(scan("pntest_filtered2x_tchum_gbs_2019.txt",n=N*L,sep=" "),nrow=L,ncol=N,byrow=TRUE)
keep<-scan("hiCov2x.txt") ## single column, denotes whether 1 or not 0 mean cov for an individual > 2
## true for 437, going to drop the others
gids<-read.table("IdsGen.txt",header=FALSE)
## read phenotype data
phtab<-read.table("2019_Tchumash_transplant_table.csv",sep=",",header=TRUE)
ph<-matrix(NA,nrow=N,ncol=8)
ph<-as.data.frame(ph)
for(i in 1:N){
a<-which(as.character(phtab[,1])==as.character(gids[i,1]))
if(length(a)==1){
ph[i,]<-phtab[a,c(1,3:6,11,15:16)]
}
}
colnames(ph)<-colnames(phtab)[c(1,3:6,11,15:16)]
#number codes
# Geneder: M = 2, F = 1
# Color: G = 1, GB = 2, I = 3, M = 4
# Treatment: AC = 1, MM = 2
## subset those that have good genetic data
g_hc<-g[,keep==1]
ph_hc<-ph[keep==1,]
dim(g_hc)
#[1] 11730 437
dim(ph_hc)
#[1] 437 8
## write output for color mapping
## no need to control for gender, explains 1.8% of RG and <1% for GB
write.table(file="pheno_color.txt",ph_hc[,7:8],row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(file="g_tchum.txt",g_hc,row.names=FALSE,col.names=FALSE,quote=FALSE)
## now by treatment, gender and survival (former is kind of for fun)
AC<-which(ph_hc$Treatment==1)
MM<-which(ph_hc$Treatment==2)
write.table(file="pheno_AC.txt",ph_hc[AC,c(3,6)],row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(file="g_tchum_AC.txt",g_hc[,AC],row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(file="pheno_MM.txt",ph_hc[MM,c(3,6)],row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(file="g_tchum_MM.txt",g_hc[,MM],row.names=FALSE,col.names=FALSE,quote=FALSE)
## note, survival is reasonably balanced
apply(ph_hc[MM,c(3,6)],2,mean)
# Gender Survival
#1.4037559 0.1971831
apply(ph_hc[AC,c(3,6)],2,mean)
# Gender Survival
#1.4398148 0.2453704
2. Gemma BSLMM analyses (Version 0.95alpha, 03/04/2016). From here I am in the Gemma subdirectory.
First, I generated the proper genotype input files with mkGenoFile.pl, which makes the mod* versions (not allele IDs are arbitrary). Then I ran the analysis with RunGemma.sh:
#!/bin/sh
#SBATCH --time=96:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=gemma
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
module load gsl
module load gemma
cd /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_genarch/color_exp_2019_gbs/Gemma
perl RunGemmaFork.pl mod_g*
This runs RunGemmaFork.pl (10 chains, color, sex and survival),
#!/usr/bin/perl
#
# run gemma jobs
#
use Parallel::ForkManager;
my $max = 20;
my $pm = Parallel::ForkManager->new($max);
my $ph;
foreach $g (@ARGV){
if($g =~ m/AC/){
$ph = "pheno_AC.txt";
}
elsif($g =~ m/MM/){
$ph = "pheno_MM.txt";
}
else{
$ph = "pheno_color.txt";
}
$base = $g;
$base =~ s/mod_geno_//;
$base =~ s/\.txt//;
foreach $ch (0..9){
system "sleep 2\n";
foreach $trait (1..2){
$pm->start and next; ## fork
$out = "o_$base"."_ph$trait"."_ch$ch";
system "gemma -g $g -p $ph -bslmm 1 -n $trait -maf 0.0 -w 200000 -s 1000000 -o $out\n";
$pm->finish;
}
}
}
$pm->wait_all_children;