gemma analyses: multiple chains and combined data sets

Post date: Nov 19, 2013 10:08:02 PM

I ran three more chains, each with 5 million steps, a 1 million step burn-in, and a thinning interval of 40. Effective sample sizes vary considerably among parameters: generally quite good for PVE, not as good for the number of SNPs, and quite poor for gamma with SLA on Ac. I am running five more chains (10 total), which should provide good enough sampling of the parameters. With that said, I think the model is working relatively well for estimating the key polygenic parameter (PVE), but perhaps not so well for identifying individual SNPs for downstream analyses. It might be better to do that with single-SNP analyses (although PIPs for genetic regions could be useful in this context too). I still need to look at the posterior predictive distribution with test and training data sets. In other words, there is more to do and think about.
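For reference, here is a minimal sketch of one common way to estimate the effective sample size of a single thinned chain: divide the number of samples by 1 + 2 times the sum of the autocorrelations, truncating the sum at the first non-positive lag. This is not necessarily the estimator I used for the numbers above (packages such as coda use a spectral estimate instead), and the function names are made up for illustration.

```python
def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x, max_lag=None):
    """Rough ESS: n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first lag whose autocorrelation is not positive,
    which is a simple (assumed) truncation rule, not the only option.
    """
    n = len(x)
    if max_lag is None:
        max_lag = n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        rho = autocorr(x, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)
```

A strongly autocorrelated chain (as gamma seems to be here) gives an ESS far below the number of retained samples, while a nearly independent chain gives an ESS close to it.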

On a related note, I am fitting the polygenic gemma model to the survival and adult weight data for combinations of experimental treatments and populations: GLA (either host), SLA (either host), Ms (either population), Ac (either population), and all individuals. In all cases I am working with the residuals for survival after fitting a linear model of host, population, or host by population combination. This should remove the effect of these factors on survival. There isn't really a relationship between these factors and adult weight, so I used the data rather than the residuals. This is not surprising, as these data have already been normal quantile transformed (duh!). I wrote an R script (combineSets.R) to combine and calculate residuals for the phenotypic data, and a perl script (combineGeno.pl) to combine the genotype data similarly. Both are in projects/lycaeides_hostplant/melGemma.
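To illustrate the residual step (this is not the contents of combineSets.R, just a sketch of the idea): when the linear model has a single categorical predictor, or treats each host by population combination as its own category, the least-squares residuals are simply each individual's deviation from its group mean. The function name and the toy values below are hypothetical, and missing phenotypes are not handled.

```python
from collections import defaultdict


def group_mean_residuals(values, groups):
    """Residuals from a one-factor linear model: value minus its group mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [v - means[g] for v, g in zip(values, groups)]


# Toy example (made-up numbers): survival coded 0/1, grouped by host plant.
survival = [1, 0, 1, 1, 0, 0]
host = ["Ms", "Ms", "Ms", "Ac", "Ac", "Ac"]
residuals = group_mean_residuals(survival, host)
```

By construction the residuals average to zero within each group, so any mean difference in survival among hosts or populations is removed before the values go into gemma.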

I am now running 5 chains, each with 5 million steps, a 1 million step burn-in, and a thinning interval of 40, to infer the PVE for these combined data sets.
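As a quick check on how many posterior samples these settings yield, here is the arithmetic as a small sketch. It covers both readings of "5 million steps": either the burn-in is counted within the 5 million, or the 5 million sampling steps follow the burn-in. Which one applies depends on how the sampler was invoked, so the flag below is an assumption, and the function name is hypothetical.

```python
def retained_samples(total_steps, burn_in, thin, burn_in_included=True):
    """Number of samples kept per chain after burn-in and thinning."""
    kept_steps = total_steps - burn_in if burn_in_included else total_steps
    return kept_steps // thin


n_chains = 5
per_chain_a = retained_samples(5_000_000, 1_000_000, 40, burn_in_included=True)
per_chain_b = retained_samples(5_000_000, 1_000_000, 40, burn_in_included=False)
total_a = n_chains * per_chain_a  # burn-in counted within the 5 million steps
total_b = n_chains * per_chain_b  # 5 million sampling steps after burn-in
```

That works out to 100,000 or 125,000 retained samples per chain, so 500,000 or 625,000 across the 5 chains, before accounting for autocorrelation.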
