BSLMM for each pop

Post date: Sep 07, 2017 2:46:23 AM

6ix17.

We are now mapping within the three populations that have decent sample sizes (almost 100 but not quite): SIN, GNP, YBG.

We ran the BSLMM in gemma on the three populations with the largest

sample sizes: SIN, GNP, and YBG.

Here is what we did.

1. Split the phenotype and genotype files to generate one for each of

these populations. This uses the mkPops.pl script. The infiles are here

king:/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_wings/Gemma/infiles/

2. Run the gemma BSLMM. To do this we executed

../scripts/wrap_qsub_slurm_gemma_pop.pl, once per phenotype set (areas

or centroids) (from infiles sub-directory):

perl ../scripts/wrap_qsub_slurm_gemma_pop.pl GNP_mod_geno_size.txt

SIN_mod_geno_size.txt YBG_mod_geno_size.txt

perl ../scripts/wrap_qsub_slurm_gemma_pop.pl GNP_mod_geno_coord.txt

SIN_mod_geno_coord.txt YBG_mod_geno_coord.txt

This runs a new version of a fork script that runs gemma

(forkRunGemmaPop.pl). Here is the code

###

use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(12); ## 12 concurrent processes

$g = shift(@ARGV); ## genotype file

$p = shift(@ARGV); ## phenotype file

$ch = shift(@ARGV); ## chain number number

$nph = shift(@ARGV); ## number of phenotypes

$out = $p;

$out =~ s/\.txt// or die "failed sub $p\n";

TRAITS:

foreach $n (1..$nph){

$pm->start and next TRAITS; ## fork

$outch = "o_$n"."_$out"."_$ch";

system "gemma-0.94.1 -g $g -p $p -bslmm 1 -n $n -o $outch -maf

0.001 -w 200000 -s 1000000\n";

$pm->finish;

}

$pm->wait_all_children;

###

The only really imporant thing is that we are running 5 chains with a

200,000 burnin and 1 million iterations each. The output/results are in

/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_wings/Gemma/output_pops/.

3. Summarize the results.

Summarize the posteriors for the genetic architecture parameters:

perl ../scripts/calpostPop.pl

Take mean breeding value across replicate chains:

perl ../scripts/getBvsPop.pl

Summarize effect size estimates:

perl ../scripts/grabCalsEffectsPop.pl

4. Here is what you have (compare this to your existing notes); these

are all in

/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_wings/Gemma/output_pops/:

- Files with the MCMC samples for PVE, PGE and no. SNPs (in that order,

those are the columns). There is one file for populaiton = genarch_*.txt

- Files giving the posterior inclusion probs. (pips_*txt) for size and

coord traits, and similarly files given the model-averaged effect

estimates (effects_*txt). hey both have the same general format. There

is one row per SNP. The first two columns provide the scaffold and SNP

position. The remaining columns give the PIPs or effect estimates for

each trait (one trait per column).

- Files with the breeding values for size and coord. traits for each

population, bv_*txt. Each has one individual per row (hence the need for

two files as sample sizes differ) and one column per phenotype (in

order). You can treat these the same as your phenotypic matrixes.

Page updated

Google Sites

Report abuse