Post date: Sep 07, 2017 2:46:23 AM
6ix17.
We are now mapping within the three populations that have decent sample sizes (almost 100 but not quite): SIN, GNP, YBG.
We ran the BSLMM in gemma on the three populations with the largest
sample sizes: SIN, GNP, and YBG.
Here is what we did.
1. Split the phenotype and genotype files to generate one for each of
these populations. This uses the mkPops.pl script. The infiles are here
king:/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_wings/Gemma/infiles/
2. Run the gemma BSLMM. To do this we executed
../scripts/wrap_qsub_slurm_gemma_pop.pl, once per phenotype set (areas
or centroids) (from infiles sub-directory):
perl ../scripts/wrap_qsub_slurm_gemma_pop.pl GNP_mod_geno_size.txt
SIN_mod_geno_size.txt YBG_mod_geno_size.txt
perl ../scripts/wrap_qsub_slurm_gemma_pop.pl GNP_mod_geno_coord.txt
SIN_mod_geno_coord.txt YBG_mod_geno_coord.txt
This runs a new version of a fork script that runs gemma
(forkRunGemmaPop.pl). Here is the code
###
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(12); ## 12 concurrent processes
$g = shift(@ARGV); ## genotype file
$p = shift(@ARGV); ## phenotype file
$ch = shift(@ARGV); ## chain number number
$nph = shift(@ARGV); ## number of phenotypes
$out = $p;
$out =~ s/\.txt// or die "failed sub $p\n";
TRAITS:
foreach $n (1..$nph){
$pm->start and next TRAITS; ## fork
$outch = "o_$n"."_$out"."_$ch";
system "gemma-0.94.1 -g $g -p $p -bslmm 1 -n $n -o $outch -maf
0.001 -w 200000 -s 1000000\n";
$pm->finish;
}
$pm->wait_all_children;
###
The only really imporant thing is that we are running 5 chains with a
200,000 burnin and 1 million iterations each. The output/results are in
/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_wings/Gemma/output_pops/.
3. Summarize the results.
Summarize the posteriors for the genetic architecture parameters:
perl ../scripts/calpostPop.pl
Take mean breeding value across replicate chains:
perl ../scripts/getBvsPop.pl
Summarize effect size estimates:
perl ../scripts/grabCalsEffectsPop.pl
4. Here is what you have (compare this to your existing notes); these
are all in
/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_wings/Gemma/output_pops/:
- Files with the MCMC samples for PVE, PGE and no. SNPs (in that order,
those are the columns). There is one file for populaiton = genarch_*.txt
- Files giving the posterior inclusion probs. (pips_*txt) for size and
coord traits, and similarly files given the model-averaged effect
estimates (effects_*txt). hey both have the same general format. There
is one row per SNP. The first two columns provide the scaffold and SNP
position. The remaining columns give the PIPs or effect estimates for
each trait (one trait per column).
- Files with the breeding values for size and coord. traits for each
population, bv_*txt. Each has one individual per row (hence the need for
two files as sample sizes differ) and one column per phenotype (in
order). You can treat these the same as your phenotypic matrixes.