Post date: Aug 21, 2019 9:42:31 PM
Some summaries from my Aug. 21st analyses.
1. MCMC output from gemma is in /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_confiers/redwood_gwa/output_bslmm/.
I haven't yet dealt with hyperparameters. I used grabMavEffects.pl and grabPips.pl to extract PIPs and model-averaged effect estimates. These were then summarized/visualized with plots.R. Here is plots.R:
files<-list.files(pattern="mav")
L<-12
dat<-vector("list",L)
for(i in 1:L){
dat[[i]]<-scan(files[i])
}
snps<-read.table("snpList.txt",header=F)
csn<-as.numeric(snps[,1]==12033)
tits<-c("Weight 1, C","Weight 2, C","Survival, C",
"Weight 1, C, no BCTURN","Weight 2, C, no BCTURN","Survival, C, no BCTURN",
"Weight 1, RW","Weight 2, RW","Survival, RW",
"Weight 1, RW, no BCTURN","Weight 2, RW, no BCTURN","Survival, RW, no BCTURN")
pdf("TknulBSLMM_mav.pdf",width=8,height=8)
par(mfrow=c(2,1))
par(mar=c(4,5,2,1))
for(i in 1:L){
plot(abs(dat[[i]]),pch=19,col=c("darkgray","cadetblue")[csn+1],xlab="SNP",ylab="Mod. average eff.",cex.lab=1.4)
title(main=tits[i])
}
dev.off()
##########
files<-list.files(pattern="pip_")
L<-12
dat<-vector("list",L)
for(i in 1:L){
dat[[i]]<-scan(files[i])
}
snps<-read.table("snpList.txt",header=F)
csn<-as.numeric(snps[,1]==12033)
tits<-c("Weight 1, C","Weight 2, C","Survival, C",
"Weight 1, C, no BCTURN","Weight 2, C, no BCTURN","Survival, C, no BCTURN",
"Weight 1, RW","Weight 2, RW","Survival, RW",
"Weight 1, RW, no BCTURN","Weight 2, RW, no BCTURN","Survival, RW, no BCTURN")
pdf("TknulBSLMM_QTL.pdf",width=9,height=9)
par(mfrow=c(2,2))
par(mar=c(4,5,2,1))
n<-table(snps[,1])
for(i in 1:L){
qtl<-tapply(X=dat[[i]],INDEX=snps[,1],sum)
cs<-rep("darkgray",length(qtl))
cs[which(names(qtl)==12033)]
plot(as.numeric(n),qtl,pch=19,col=c("darkgray","cadetblue")[csn+1],xlab="SNP",ylab="PIP",cex.lab=1.4)
title(main=tits[i])
}
dev.off()
##########
files<-list.files(pattern="pip_")
L<-12
dat<-vector("list",L)
for(i in 1:L){
dat[[i]]<-scan(files[i])
}
snps<-read.table("snpList.txt",header=F)
csn<-as.numeric(snps[,1]==12033)
tits<-c("Weight 1, C","Weight 2, C","Survival, C",
"Weight 1, C, no BCTURN","Weight 2, C, no BCTURN","Survival, C, no BCTURN",
"Weight 1, RW","Weight 2, RW","Survival, RW",
"Weight 1, RW, no BCTURN","Weight 2, RW, no BCTURN","Survival, RW, no BCTURN")
pdf("TknulBSLMM_pip.pdf",width=8,height=8)
par(mfrow=c(2,1))
par(mar=c(4,5,2,1))
for(i in 1:L){
plot(dat[[i]],pch=19,col=c("darkgray","cadetblue")[csn+1],xlab="SNP",ylab="PIP.",cex.lab=1.4)
title(main=tits[i])
}
dev.off()
The main take-home thus far is that survival and weight on RW doesn't show much of a signal, however, there is a visually clear peak of association on LG11 for weight 2 = 21 day weight on Ceanothus. Not quite what we expected, but interesting and still suggests LG11 is associated with host-specific performance (just of the host that is "easier" to use).
2. As a comparison, I also fit standard LMMs with gemma.
perl RunGemmaLmmFork.pl mod_geno_*
which runs (note this uses centered but not standardized genotypes for the kinship matrix):
#!/usr/bin/perl
#
# run gemma jobs
#
use Parallel::ForkManager;
my $max = 12;
my $pm = Parallel::ForkManager->new($max);
foreach $g (@ARGV){
$ph = $g;
$ph =~ s/mod_g/ph/;
$base = $g;
$base =~ s/mod_geno_//;
$base =~ s/\.txt//;
foreach $trait (1..3){
$pm->start and next; ## fork
$out1 = "k_$base"."_ph$trait";
$out2 = "o_$base"."_ph$trait";
system "gemma -g $g -p $ph -gk -o $out1\n";
system "gemma -g $g -p $ph -lmm 4 -k output/$out1".".cXX.txt -n $trait -maf 0.0 -o $out2\n";
$pm->finish;
}
}
$pm->wait_all_children;
The results are in output. Summaries were produced with the local version of plots.R
files<-list.files(pattern="assoc")
L<-12
dat<-vector("list",L)
for(i in 1:L){
dat[[i]]<-read.table(files[i],header=TRUE)
}
tits<-c("Weight 1, C","Weight 2, C","Survival, C",
"Weight 1, C, no BCTURN","Weight 2, C, no BCTURN","Survival, C, no BCTURN",
"Weight 1, RW","Weight 2, RW","Survival, RW",
"Weight 1, RW, no BCTURN","Weight 2, RW, no BCTURN","Survival, RW, no BCTURN")
pdf("Tknul_LMM_p.pdf",width=8,height=8)
par(mfrow=c(2,1))
par(mar=c(4,5,2,1))
for(i in 1:L){
LG<-as.numeric(matrix(unlist(strsplit(as.character(dat[[i]]$rs),"_")),ncol=3,byrow=T)[,2])
csn<-as.numeric(LG==12033)
plot(-10 *log10(dat[[i]]$p_lrt),pch=19,col=c("darkgray","cadetblue")[csn+1],xlab="SNP",ylab="-10 log10(p)",cex.lab=1.4)
title(main=tits[i])
}
dev.off()
Here we still see the signal for weight, but now we also see a (stronger even?) signal of association of LG11 with survival on Ceanothus. This too is interesting, but not quite expected.
3. I also played with PCAs by LG. See the first bit of formatPhenoGeno.R in redwood_gwa. LG11 stands out as having discrete clusters that point to a structural variant. These show up in BCTURN and the BC* populations (we also see the geographic BCTURN signal but on PC2). Thus, the SV? appears to be segregating across populations. A second PCA on 100 SNP sliding windows shows that it spans most of but not the ends of LG11.