1. MTG is a computer program that provides Genomic Residual Maximum Likelihood (GREML) estimates of genetic and environmental variances and covariances across multiple traits. The program implements a multivariate linear mixed model and can fit complex covariance structures derived from genomic information, i.e. it is a multivariate version of GCTA GREML. The program also provides best linear unbiased prediction (BLUP) of additive genetic effects, i.e. either breeding values or predictions of genetic risk. MTG uses the direct average information algorithm (Lee and van der Werf; Genet Sel Evol 2006; 38:25-43). For more details of GREML and GBLUP, please see

Maier, R., et al. (2015) Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder and major depressive disorder. The American Journal of Human Genetics 96, 283-294
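As a rough illustration of the BLUP step described in item 1, the following Python sketch computes predicted additive genetic effects for a single trait when the variance components are already known. It is not the MTG code: the function names gblup and simulate, the toy data and the use of NumPy are assumptions made only for this example, and the model is univariate rather than multivariate.

```python
import numpy as np

def gblup(y, X, G, var_g, var_e):
    """BLUP of genetic effects for y = X b + g + e, with var(g) = var_g * G.

    Illustrative only: variance components are taken as given here,
    whereas a GREML program would estimate them first.
    """
    n = len(y)
    V = var_g * G + var_e * np.eye(n)           # phenotypic covariance matrix
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)  # GLS fixed effects
    resid = y - X @ beta_hat
    g_hat = var_g * G @ np.linalg.solve(V, resid)           # BLUP of g
    return beta_hat, g_hat

def simulate(n=200, m=500, var_g=0.5, var_e=0.5, seed=1):
    """Toy data: W is a random stand-in for standardised genotypes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, m))
    G = W @ W.T / m                              # genomic relationship matrix
    g = W @ (rng.standard_normal(m) * np.sqrt(var_g / m))
    X = np.ones((n, 1))                          # intercept only
    y = 2.0 + g + rng.standard_normal(n) * np.sqrt(var_e)
    return y, X, G, g

if __name__ == "__main__":
    y, X, G, g = simulate()
    beta_hat, g_hat = gblup(y, X, G, var_g=0.5, var_e=0.5)
    print("intercept estimate:", beta_hat[0])
    print("correlation of true and predicted g:", np.corrcoef(g, g_hat)[0, 1])
```

The sketch solves linear systems with V directly, which is only practical for small samples; it is meant to show the quantities involved, not an efficient algorithm.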

2. We combined the direct AI algorithm with an eigen-decomposition of the genomic relationship matrix, as first proposed by Thompson and Shaw (Biometrics 1990; 46:399-413). The procedure can be applied to the analysis of real data with univariate, multivariate and random regression linear mixed models with a single genetic covariance structure, and we demonstrate that computational efficiency can increase more than 1,000-fold compared with standard REML software based on Mixed Model Equations. The details of the procedure and its application are in

Lee, SH and van der Werf, JHJ (2016) MTG2: An efficient algorithm for multivariate linear mixed model analysis based on genomic information. Bioinformatics 32, 1420-1422
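The idea behind the eigen-decomposition in item 2 can be sketched for the simplest case, a univariate model with one genetic covariance structure. Rotating the data by the eigenvectors of the genomic relationship matrix makes the phenotypic covariance diagonal, so each likelihood evaluation needs only element-wise operations. The Python code below is an assumption-laden illustration (function names, toy data and the plain REML log-likelihood without constants are chosen for this example) and does not reproduce the direct AI algorithm itself.

```python
import numpy as np

def reml_loglik_dense(y, X, G, var_g, var_e):
    """REML log-likelihood (up to a constant) using the full n x n matrix V."""
    n = len(y)
    V = var_g * G + var_e * np.eye(n)
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    beta = np.linalg.solve(XtVinvX, X.T @ Vinv @ y)
    r = y - X @ beta
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    return -0.5 * (logdet_V + logdet_XVX + r @ Vinv @ r)

def reml_loglik_eigen(y_rot, X_rot, eigvals, var_g, var_e):
    """Same quantity after rotating y and X by the eigenvectors of G.

    In the rotated space V is diagonal with entries var_g * eigvals + var_e.
    """
    v = var_g * eigvals + var_e
    w = 1.0 / v
    XtWX = X_rot.T @ (w[:, None] * X_rot)
    beta = np.linalg.solve(XtWX, X_rot.T @ (w * y_rot))
    r = y_rot - X_rot @ beta
    _, logdet_XVX = np.linalg.slogdet(XtWX)
    return -0.5 * (np.sum(np.log(v)) + logdet_XVX + np.sum(w * r * r))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 100, 300
    W = rng.standard_normal((n, m))      # stand-in for standardised genotypes
    G = W @ W.T / m
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = rng.standard_normal(n)

    # One-off decomposition; reused for every set of variance components tried.
    eigvals, U = np.linalg.eigh(G)
    y_rot, X_rot = U.T @ y, U.T @ X

    for var_g, var_e in [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]:
        dense = reml_loglik_dense(y, X, G, var_g, var_e)
        fast = reml_loglik_eigen(y_rot, X_rot, eigvals, var_g, var_e)
        print(var_g, var_e, dense, fast)
```

The decomposition is done once, after which every iteration over candidate variance components avoids inverting an n x n matrix. This is where the saving comes from in this simplified setting.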

3. We theoretically derived the relationship between genomic prediction accuracy and population parameters, e.g. effective population size (Ne). We used stochastic coalescence simulations and real data analyses to verify the theory. This study shows that the area under the receiver operating characteristic curve (AUC) increased exponentially with decreasing Ne, e.g. from 0.6 with Ne = 10,000 to 0.9 with Ne = 100. It also shows that the top percentile of the estimated genetic profile scores had a 23 times higher proportion of cases than the general population with Ne = 100, compared with a 2 times higher proportion with Ne = 10,000 (also see sections 7, 8, 9 and 10 in the manual).

Lee, S.H. et al. (2017) Using information of relatives in genomic prediction to apply effective stratified medicine. Scientific Reports 7: 42091.

4. We present a theoretical framework for genomic prediction accuracy when the reference data consists of information sources with varying degrees of relationship to the target individuals. A reference set can contain both close and distant relatives as well as ‘unrelated’ individuals from the wider population. The various sources of information were modeled as different populations with different effective population sizes (Ne). With a similar amount of data available for each source, we show that close relatives can have a substantially larger effect on genomic prediction accuracy than less closely related individuals. When using multiple reference populations that have different degrees of relationship and/or an imperfect genetic correlation (< 1) between them, MTG2 can calculate a weighted prediction accuracy (see section 9.1 in the manual).

Lee et al. (2017) Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship. PLoS ONE 12(12): e0189775.

5. We have developed a multivariate reaction norm model (MRNM) to tackle genotype–environment (G–E) correlation and interaction problems. It is well known that G–E correlation causes spurious G–E interaction signals, although there are few statistical tools to correct this bias. MRNM, implemented in mtg2 (section 1.4), can estimate G–E interaction without bias in the presence of G–E correlation and even has higher power to detect the interaction than existing methods. It is also notable that MRNM can efficiently detect significant heterogeneity in the estimated residual variances across different environmental or covariate levels. For more detail, please see the following paper.

Ni et al. (2019) Genotype–covariate correlation and interaction disentangled by a whole-genome multivariate reaction norm model. Nature Communications 10: 2239.

6. CORE GREML (see chapter 15 in the manual and example 12) can estimate the correlation between two random effects in a phenotypic analysis where the covariance structure between the random effects is not pre-defined, e.g. the genome-transcriptome correlation in the phenotypic analysis of a complex trait.

Zhou, Im and Lee (2020) CORE GREML: Estimating covariance between random effects in linear mixed models for genomic analyses of complex traits. Nature Communications 11: 4208.

7. GxEsum (GxEsum script, README and example)

Genetic variation in response to the environment is fundamental in biology and has been described as genotype-by-environment interaction (GxE), reaction norm or phenotypic plasticity. In the genomic era, there has been increasing interest in estimating GxE using genome-wide SNPs, e.g. with a whole-genome reaction norm model (RNM) that can provide unbiased estimates of genome-wide GxE. However, the existing approach is computationally demanding and infeasible for large-scale biobank data. Here we introduce GxEsum, a model for estimating GxE based on GWAS summary statistics, which can be applied to a large sample size. In simulations, we show that GxEsum can control the type I error rate and produce unbiased estimates in general. We apply GxEsum to UK Biobank to estimate genome-wide GxE for BMI and hypertension, and find that the computational efficiency of GxEsum is thousands of times higher than that of existing whole-genome GxE methods such as RNM. Because of its computational efficiency, GxEsum can achieve higher precision (i.e. power) from a larger sample size. As the scale of available resources has increased, GxEsum may be an efficient tool for estimating GxE that can be applied to large-scale data across multiple complex traits and diseases.

Shin and Lee (2020) GxEsum: genotype-by-environment interaction model based on summary statistics. bioRxiv preprint doi:

The algorithms, theory and coalescence simulation functions are implemented in the MTG2 software, which can be downloaded from the links below together with the manual and examples.

8. Integrative analysis of genomic and exposomic data (IGE) 

IGE is a whole-genome approach to the estimation of heritability and g x e interactions, which models the variances explained by additive effects of exposomic variables, by exposome x exposome interactions, and by exposome x covariate (such as demographics) interactions, as well as the covariance between genetic effects and exposomic effects (Table 3). Further, bivariate or multivariate IGE (i.e., simultaneously including two or more traits) can be performed using mtg2 version 2.18. Please see section 17 in the manual and exampleIGE below, which can also be found in the IGE GitHub.

Version 2.17 has been optimised for computing speed: fitting multivariate linear mixed models (REML) is more than 10 times faster than in earlier versions when there are many levels of covariates.

Version 2.18 now includes BLUP SNP (providing the SNP effect, its SE, reliability and Wald test p-value, i.e. GWAS summary statistics). It supports univariate as well as multivariate models (see section 16 in the manual and examples 13 and 13-2).
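To give a feel for what a SNP-level BLUP is, the sketch below back-solves SNP effects from a univariate genomic model in Python. It assumes a relationship matrix built as G = W W' / m from standardised genotypes W and known variance components. The function name snp_blup and the toy data are hypothetical, and the code does not reproduce how mtg2 computes SNP effects, nor their SE, reliability or Wald test p-values.

```python
import numpy as np

def snp_blup(y, X, W, var_g, var_e):
    """Return (g_hat, snp_hat) for y = X b + W u + e, with var(u) = (var_g / m) I.

    g_hat   : BLUP of the total genetic value of each individual
    snp_hat : BLUP of each SNP effect; W @ snp_hat reproduces g_hat
    Assumes G = W W' / m (an assumption of this sketch).
    """
    n, m = W.shape
    G = W @ W.T / m
    V = var_g * G + var_e * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)
    beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ np.linalg.solve(V, y))
    Vinv_r = np.linalg.solve(V, y - X @ beta_hat)
    g_hat = var_g * G @ Vinv_r
    snp_hat = (var_g / m) * (W.T @ Vinv_r)
    return g_hat, snp_hat

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n, m = 150, 400
    W = rng.standard_normal((n, m))      # stand-in for standardised genotypes
    u = rng.standard_normal(m) * np.sqrt(0.5 / m)
    X = np.ones((n, 1))
    y = 1.0 + W @ u + rng.standard_normal(n) * np.sqrt(0.5)

    g_hat, snp_hat = snp_blup(y, X, W, var_g=0.5, var_e=0.5)
    # Summing SNP effects over an individual's genotypes gives back its BLUP.
    print(np.allclose(W @ snp_hat, g_hat))
```

The final check reflects the equivalence between the individual-level and SNP-level models under this definition of G: the individual BLUPs are the genotype-weighted sums of the SNP BLUPs.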

mtg2 version 2.09 for linux

mtg2 version 2.09 for Windows

mtg2 version 2.08 for linux

mtg2 version 2.08 for Windows

mtg2 version 2.06 for linux

mtg2 version 2.06 for Windows

mtg2 version 2.05 for linux

mtg2 version 2.05 for Windows

mtg2 version 2.05 for mac

mtg2 version 2.04 for linux

mtg2 version 2.04 for Windows

mtg2 version 2.04 for mac

mtg2 version 2.02 for linux

mtg2 version 2.02 for Windows (thanks to Dr. Hawlader Al-mamun (Mamun) at UNE)

mtg2 version 2.02 for mac


example 0

example 1

example 1-3

example 1-3-2

example 1-4

example 2

example 2-2

example 3

example 4

example 5

example 6

example 7

example 7-2

example 9

example 10-2

example 12

example 13



mtg2 version 2.02 source code (fortran)

mtg2 version 2.05 source code 

mtg2 version 2.06 source code

mtg2 version 2.08 source code

mtg2 version 2.09 source code

mtg2 version 2.10 source code

mtg2 version 2.14 source code

mtg2 version 2.15 source code

mtg2 version 2.17 source code

mtg2 version 2.18 source code

The source code is released under the GNU General Public License v3.

Update details 


mtg2 version 2.01

Binary file for linux (Mar/16)

Delta function added (section 5) (Mar/16)

Product matrix for random variable to fit random effects (section 4) (Mar/16)

Spline, -spl with -eig and -rrme 1 (residual covariance) checked and confirmed (Mar/16)

Estimating GRM added (section 6) (Apr/16)

Fixed a bug when fitting class variables as fixed effects (Apr/16)

Multivariate random regression model (section 1.26, 1.27 and 1.28) (Apr/16)

Reliability for BLUP (section 2) (Apr/16)

Binary file for Windows (Apr/16)


mtg2 version 2.02

gz format GRM from GCTA or PLINK1.9 can be used (section 1.1, and 2) (May/16)

Search for better starting values in an initial iteration for MVLMM (May/16)

Effective number of chromosome segments (section 7) (May/16) 

Variance of relationship estimation (section 8) (May/16) 

Prediction accuracy theory (section 9) (May/16) 

Coalescence simulation and phenotype simulation based on given genotype data (section 10) (May/16) 

Transform h2 between observed scale and liability scale (section 2) (May/16)

Transform genetic correlation to co-heritability on the liability scale (section 2) (May/16)


mtg2 version 2.04

Constrain some parameters during REML (section 11) (Dec/16)

The number of knots in the spline function in univariate RRM can vary across different random effects (Jan/17)

When estimating prediction accuracy, the input parameters must now include the number of SNPs (section 9) (Jan/17)

mtg2 version 2.05

Section 12. H matrix added

mtg2 version 2.06

Section 9. Prediction accuracy revised

Section 6. Weighted GRM added

mtg2 version 2.08

Section 1.4. Reaction norm model

mtg2 version 2.09

Version 2.09 has fixed or improved a few things.

1. The ID order does not have to be the same between the fam file and the phenotypic data file. However, the ID order must still be the same between the phenotypic data file and the other covariate files.

2. Some memory allocation problems have been fixed, especially in the BLUP output part for the multivariate random regression model.

To do list

Reliability for BLUP (GPA) (when using -eig or -rrm)

Weighting residual structure

Snp_blup (considering multiple inputs, e.g. snpvn)

*.py output when using -rrm or -spl

Search for better starting values in an initial iteration for random regression

Spline function for multivariate random regression