SNP-set based Genomic Prediction

Determine Proper Hyperparameters

Last update: 22 November 2020

Setting hyper-parameters properly is crucial, because inappropriate values can degrade both model fitting and prediction accuracy. This page provides a guide to choosing them.

Variance explained by one SNP

$$V_{SNP} = \frac{V_P h^2}{\sum_{j=1}^{m} 2 p_j q_j} = \frac{V_P h^2}{m \bar{H}},$$

where $V_{SNP}$ is the variance explained by one SNP, $V_P$ is the phenotypic variance, $h^2$ is the heritability, $p_j$ ($q_j = 1 - p_j$) is the frequency of one allele (the other allele) at SNP $j$, $m$ is the number of SNPs, and $\bar{H} = \frac{1}{m}\sum_{j=1}^{m} 2 p_j q_j$ is the average heterozygosity (usually around 0.3). This formula gives a rough estimate of the variance explained by one SNP.
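The formula can be evaluated either from observed allele frequencies or from an assumed average heterozygosity. The sketch below assumes genotypes coded as 0/1/2 counts of one allele per SNP (one list per individual) and uses plain Python; the function names are hypothetical and not part of any particular software.

```python
def allele_freqs(genotypes):
    """Frequency of the counted allele at each SNP.

    genotypes: list of individuals, each a list of 0/1/2 allele counts per SNP.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]


def snp_variance(v_p, h2, freqs):
    """V_SNP = V_P * h^2 / sum_j 2 p_j q_j, using observed allele frequencies."""
    sum_2pq = sum(2.0 * p * (1.0 - p) for p in freqs)
    return v_p * h2 / sum_2pq


def snp_variance_approx(v_p, h2, m, mean_het=0.3):
    """V_SNP = V_P * h^2 / (m * H_bar), with an assumed average heterozygosity."""
    return v_p * h2 / (m * mean_het)


# Illustrative values only: V_P = 100, h^2 = 0.5, 50,000 SNPs, H_bar = 0.3
print(snp_variance_approx(100.0, 0.5, 50000))
```

With these illustrative numbers, $V_{SNP} = 50 / 15000 \approx 0.0033$.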

Inverse-gamma prior of variance

The shape parameter usually has little effect on prediction accuracy, so it can safely be given a small value. Model fitting, however, may be sensitive to the scale parameter. In practice, set the scale to $b = V_{SNP}$ and then search for the optimal scale within a small range around $V_{SNP}$ by cross-validation.
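The scale search can be organized as a grid search with k-fold cross-validation. This is a minimal sketch under stated assumptions: the grid spans a factor of 4 on either side of $V_{SNP}$ on a log scale (an arbitrary choice for "a small range"), and `fit_and_score` is a hypothetical user-supplied callable that fits the model with a given inverse-gamma scale on the training individuals and returns prediction accuracy on the held-out ones. It is not an interface of any specific program.

```python
import math
import random


def log_grid(center, fold=4.0, n=7):
    """n log-spaced candidate values from center/fold to center*fold (assumed range)."""
    lo = math.log(center / fold)
    step = 2.0 * math.log(fold) / (n - 1)
    return [math.exp(lo + i * step) for i in range(n)]


def kfold_indices(n_samples, k=5, seed=1):
    """Randomly partition sample indices 0..n_samples-1 into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_score(n_samples, fit_and_score, scale, k=5, seed=1):
    """Mean validation accuracy over k folds for one candidate scale.

    fit_and_score(train_idx, test_idx, scale) is hypothetical: it should fit
    the model on train_idx with this scale and return accuracy on test_idx.
    """
    scores = []
    for test in kfold_indices(n_samples, k, seed):
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        scores.append(fit_and_score(train, test, scale))
    return sum(scores) / len(scores)


def select_scale(v_snp, n_samples, fit_and_score, fold=4.0, n=7, k=5):
    """Grid-search the scale around b = V_SNP; return (best_scale, all_scores)."""
    scores = {
        b: cv_score(n_samples, fit_and_score, b, k)
        for b in log_grid(v_snp, fold, n)
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Because the same seed is used for every candidate, all scales are evaluated on identical folds, which makes the comparison between them paired rather than confounded by different data splits.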

When hyper-parameter optimization is turned on with the --hyper_opt option, the rate parameter of the exponential hyper-prior must also be set (with --snp_hyper_exp_rate). Its optimal value can be determined in the same way as the scale parameter of the inverse-gamma prior.

Half-Cauchy prior of variance

The squared scale parameter of the half-Cauchy prior should be set to a value slightly larger than $V_{SNP}$, which gives a weakly informative prior. Because some QTLs may have relatively large effects, a squared scale of $A^2 = 10V_{SNP}$ or even $A^2 = 100V_{SNP}$ is usually a safer choice. In fact, the half-Cauchy model still works well when an unreasonably large scale is used (approaching a non-informative prior on the variance parameter). This robustness is a major advantage of the half-Cauchy prior over the inverse-gamma prior.
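The choice of $A$ can be made concrete with a few lines of arithmetic. This sketch assumes the half-Cauchy prior is placed on the standard deviation $\sigma$ of SNP effects with scale $A$, so that $A^2$ lives on the variance scale and can be compared directly with $V_{SNP}$; the helper names and the illustrative $V_{SNP}$ are hypothetical. Under that assumption the prior median of $\sigma$ is $A$, and the heavy tail leaves noticeable mass far beyond it, which is why a generous $A^2$ stays only weakly informative.

```python
import math


def half_cauchy_scale(v_snp, multiplier=10.0):
    """Scale A chosen so that A**2 = multiplier * V_SNP."""
    return math.sqrt(multiplier * v_snp)


def half_cauchy_cdf(x, a):
    """P(sigma <= x) for sigma ~ half-Cauchy(0, a), x >= 0."""
    return (2.0 / math.pi) * math.atan(x / a)


# Illustrative V_SNP: V_P = 100, h^2 = 0.5, m = 50,000, H_bar = 0.3
v_snp = 100.0 * 0.5 / (50000 * 0.3)
for mult in (1.0, 10.0, 100.0):
    a = half_cauchy_scale(v_snp, mult)
    # The median of half-Cauchy(0, A) is A, so the prior median variance is A**2.
    # Tail mass P(sigma > 10A) = 1 - (2/pi) * atan(10), about 0.063 for any A.
    tail = 1.0 - half_cauchy_cdf(10.0 * a, a)
    print(f"A^2 = {mult:g} * V_SNP: A = {a:.4f}, P(sigma > 10A) = {tail:.3f}")
```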

Generally, it is not necessary to turn on hyper-parameter optimization for the half-Cauchy model.
