In genetic association studies (GAS), as well as in genome-wide association studies (GWAS), the model of inheritance (dominant, additive or recessive) is usually not known a priori. Assuming an incorrect model of inheritance may lead to substantial loss of power, whereas testing all possible models may inflate the type I error rate. The situation is even more complicated in meta-analysis of GAS or GWAS, in which individual studies are synthesized to derive an overall estimate. Meta-analysis is widely used to increase the power to detect weak genotype effects, but heterogeneity and incompatibility between the included studies complicate matters further.
For both practical and theoretical reasons, meta-analysis of GWAS is usually performed using only summary estimates (effect sizes or Z-values) from the included studies. Along these lines, using the theoretical results described by Zhou et al. (2011), we show that a simple approach for robust meta-analysis is to analyze each study separately using a robust method of choice (e.g. MAX, MERT or MIN2) and afterwards combine the individual results by pooling the Z-values or the effect sizes in a fixed- or random-effects model. This approach is closer to the spirit of meta-analysis of GWAS described by de Bakker et al. (2008), and it is very easily implemented, provided that the summary estimates of the individual studies are available. As a proof of concept, a Stata program that implements the meta-analysis using the MERT statistic is given below.
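The per-study quantities used by the MERT-based approach can be written compactly. The expressions below are a restatement read off from the Stata program given in this section, not an independent derivation. The study index i is notation introduced here, and reading R_i and S_i as the numbers of cases and controls is an assumption; n_{0i}, n_{1i} and n_{2i} are the pooled genotype counts.

```latex
\[
\begin{aligned}
Z_{\mathrm{MERT},i} &= \frac{Z_{\mathrm{REC},i} + Z_{\mathrm{DOM},i}}{\sqrt{2\,(1+\rho_i)}}, &
\rho_i &= \frac{n_{0i}\,n_{2i}}{\sqrt{n_{0i}(n_{1i}+n_{2i})}\,\sqrt{n_{2i}(n_{0i}+n_{1i})}},\\[4pt]
I_i &= \left(\frac{1}{R_i}+\frac{1}{S_i}\right)^{-1}, &
\hat\beta_i &= \frac{Z_{\mathrm{MERT},i}}{\sqrt{I_i}}, \qquad
\mathrm{SE}(\hat\beta_i) = \frac{1}{\sqrt{I_i}} .
\end{aligned}
\]
```

The pairs (\hat\beta_i, SE(\hat\beta_i)) are what is passed to a standard fixed- or random-effects meta-analysis routine.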
A list of available software for robust analysis and meta-analysis of GAS and GWAS is also given. Moreover, Stata implementations of the MAX, MIN2 and GMS approaches are given for the first time. This is the first complete effort to implement procedures for robust analysis and selection of the appropriate genetic model in GAS or GWAS using Stata. Since only a few software implementations of the robust methods for meta-analysis of GAS or GWAS are available, our future goal is to extend our software to meta-analysis using Stata.
clear
set more off
input study aa0 ab0 bb0 aa1 ab1 bb1
1 3 34 44 2 20 83
2 4 30 49 3 17 62
3 17 84 48 3 30 31
4 5 32 63 5 23 52
5 12 62 119 8 39 133
6 9 48 47 6 34 68
7 18 134 363 20 214 483
end
* Continuity correction: add 0.5 to any empty genotype cell
replace aa1=aa1+0.5 if aa1==0
replace ab1=ab1+0.5 if ab1==0
replace bb1=bb1+0.5 if bb1==0
replace aa0=aa0+0.5 if aa0==0
replace ab0=ab0+0.5 if ab0==0
replace bb0=bb0+0.5 if bb0==0
* Group totals (R, S), genotype totals (n0, n1, n2) and overall sample size (N)
gen R=aa1+ab1+bb1
gen S=aa0+ab0+bb0
gen n0=aa1+aa0
gen n1=ab1+ab0
gen n2=bb1+bb0
gen N=R+S
* Numerator of the trend tests: 00 for REC, 05 for ADD, and 10 for DOM
gen u00=1/N*( S*(0*aa1+0.0*ab1+1*bb1)-R*(0*aa0+0.0*ab0+1*bb0) )
gen u05=1/N*( S*(0*aa1+0.5*ab1+1*bb1)-R*(0*aa0+0.5*ab0+1*bb0) )
gen u10=1/N*( S*(0*aa1+1.0*ab1+1*bb1)-R*(0*aa0+1.0*ab0+1*bb0) )
* Denominator (variance) of the trend tests
gen vu00=(R*S/N)*( (0*n0+0.0^2*n1+1*n2)/N - ((0*n0+0.0*n1+1*n2)/N)^2 )
gen vu05=(R*S/N)*( (0*n0+0.5^2*n1+1*n2)/N - ((0*n0+0.5*n1+1*n2)/N)^2 )
gen vu10=(R*S/N)*( (0*n0+1.0^2*n1+1*n2)/N - ((0*n0+1.0*n1+1*n2)/N)^2 )
* Three Cochran-Armitage trend tests (CATT): z1 = REC, z2 = ADD, z3 = DOM
gen z1=u00/sqrt(vu00)
gen z2=u05/sqrt(vu05)
gen z3=u10/sqrt(vu10)
* Correlations between the trend tests (only r13 is needed for MERT)
gen r23=(n0*(n1+2*n2))/(sqrt(n0*(n1+n2))*sqrt(n0*(n1+2*n2)+n2*(n1+2*n0)))
gen r12= (n2*(n1+2*n0))/(sqrt(n2*(n1+n0))*sqrt(n0*(n1+2*n2)+n2*(n1+2*n0)))
gen r13=(n0*n2)/(sqrt(n0*(n2+n1))*sqrt(n2*(n0+n1)))
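* MERT: combine the two extreme tests (REC and DOM) using their correlation r13
gen z_mert=(z1+z3)/(sqrt(2*(1+r13)))
* Convert the Z-value to an effect size and its standard error via Ib
gen Ib=(1/(1/S +1/R))
gen beta_mert=z_mert/sqrt(Ib)
gen sebeta=sqrt(1/Ib)
* Random-effects meta-analysis (requires the user-written metan command)
metan beta_mert sebeta,randomi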
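The text also mentions pooling the Z-values directly instead of the effect sizes. A minimal sketch of that alternative, continuing from the variables created above (z_mert and Ib), is a weighted sum of Z-values. The choice of sqrt(Ib) as the weight and the names w, wz, w2, z_pooled and p_pooled are assumptions made for illustration and are not part of the original program.

```stata
* Hypothetical continuation: fixed-effect pooling of the per-study MERT Z-values
* Assumed weight: square root of the per-study information Ib
gen w=sqrt(Ib)
gen wz=w*z_mert
gen w2=w^2
* Sum the weighted Z-values and the squared weights over studies
quietly summarize wz
scalar sum_wz=r(sum)
quietly summarize w2
scalar sum_w2=r(sum)
* Weighted Z statistic and its two-sided p-value under the standard normal
scalar z_pooled=sum_wz/sqrt(sum_w2)
scalar p_pooled=2*normal(-abs(z_pooled))
display "Pooled Z = " z_pooled "   two-sided p = " p_pooled
```

With these weights the pooled Z equals the fixed-effect inverse-variance estimate of beta_mert divided by its standard error, so the sketch should agree with a fixed-effect analysis of beta_mert and sebeta.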
coin: http://cran.r-project.org/web/packages/coin/index.html (Hothorn and Hothorn 2009)
SNPassoc: http://cran.r-project.org/web/packages/SNPassoc/index.html (Gonzalez et al. 2007)
RobustSNP: https://sites.google.com/site/honcheongso/software/robustsnp (So and Sham 2011)
Rassoc: http://cran.r-project.org/web/packages/Rassoc/index.html (Zang et al. 2010)
R macro: http://www.biometrics.tibs.org/datasets/080703P_Rcode.txt (Joo et al. 2010b)
SAS macro: http://biostatistics.oxfordjournals.org/content/9/3/391/suppl/DC1 (Zheng and Ng 2008)
coin: http://cran.r-project.org/web/packages/coin/index.html (Hothorn and Hothorn 2009)
metagen: http://fmwww.bc.edu/repec/bocode/m/metagen.ado (Bagos and Nikolopoulos 2007)
Stata macro: http://www.compgen.org/tools/multivariate-genetic/ (Bagos 2008)
Bagos PG (2008) A unification of multivariate methods for meta-analysis of genetic association studies. Stat Appl Genet Mol Biol 7: Article31
Bagos PG, Nikolopoulos GK (2007) A method for meta-analysis of case-control genetic association studies using logistic regression. Stat Appl Genet Mol Biol 6: Article17
de Bakker PI, Ferreira MA, Jia X, Neale BM, Raychaudhuri S, Voight BF (2008) Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet 17: R122-8
Gonzalez JR, Armengol L, Sole X, Guino E, Mercader JM, Estivill X, Moreno V (2007) SNPassoc: an R package to perform whole genome association studies. Bioinformatics 23: 644-5
Hothorn LA, Hothorn T (2009) Order-restricted scores test for the evaluation of population-based case-control studies when the genetic model is unknown. Biom J 51: 659-69
Joo J, Kwak M, Zheng G (2010b) Improving power for testing genetic association in case-control studies by reducing the alternative space. Biometrics 66: 266-76
So HC, Sham PC (2011) Robust association tests under different genetic models, allowing for binary or quantitative traits and covariates. Behav Genet 41: 768-75
Zang Y, Fung WK, Zheng G (2010) Simple algorithms to calculate asymptotic null distribution for robust tests in case-control genetic association studies in R. Journal of Statistical Software 33
Zheng G, Ng HK (2008) Genetic model selection in two-phase analysis for case-control association studies. Biostatistics 9: 391-9
Zhou B, Shi J, Whittemore AS (2011) Optimal methods for meta-analysis of genome-wide association studies. Genet Epidemiol 35: 581-91