The Multi-Phenotype GWAS Analysis For Pleiotropic Genetic Effects


CITATIONS



The manuscript describing the algorithms and simulation findings of statistical power comparison is under review now:

(1) Hsu YH and Chen X. Identifying Pleiotropic Genetic Effects: A Two-Stage Approach Using Genome-Wide Association Meta-Analysis Data.



The proposed approach has been presented at:

Methodology:

(2) Hsu YH and Chen X. A powerful multi-phenotype approach on genome-wide association studies to identify novel pleiotropic genes that affected multiple quantitative traits (19th annual meeting of the International Genetic Epidemiology Society, Boston, MA 2010 / Plenary Talk). Genet Epidemiol 2010;34:922-3.

(3) Hsu YH and Chen X. A Multivariate Approach on Genome-Wide Association Studies (GWAS) by Modeling multiple Traits Simultaneously to Identify Pleiotropic Genetic Effects/Plenary Talk. 2011 The Joint Statistical Meetings (American Statistical Association), Miami Beach, FL, USA, 2011.


If you use our tools, please cite the above abstracts and the paper (when it is published).
 
CONTACT: Yi-Hsiang Hsu (yihsianghsu@hsl.harvard.edu)
                    Xing Chen (dr.xingchen@gmail.com)




Applications to several GWAS meta-analyses:

Bone mineral density and T2DM (as well as glucose homeostasis)
(4) Hsu YH, Chen X, Meigs J, Karasik D, Kiel D. Multi-Phenotype Genome-Wide Association Analysis (GWAS) on both BMD and Glycemic Traits Identified Novel Pleiotropic Genes that Affected Bone Metabolism and Glucose Homeostasis in Caucasian Populations. (ASBMR Annual Meeting Presentation) J Bone Miner Res, 2010; 25 (Suppl 1). Available at http://www.asbmr.org/Meetings/AnnualMeeting/

Bone mineral density and lean body mass (as well as fat mass)
(5) Hsu YH, Chen X, Zillikens C, Estrada K, Demissie S, Liu CT, Zhou Y, Karasik D, Murabito J, Uitterlinden A, Cupples LA, Rivadeneira F, Kiel D. Multi-Phenotype Genome-Wide Association Meta-Analysis on both Lean Body Mass and BMD Identified Novel Pleiotropic Genes that Affected Skeletal Muscle and Bone Metabolism in European Descent Caucasian Populations. (ASBMR Annual Meeting Presentation) J Bone Miner Res, 2011; 26 (Suppl 1). Available at http://www.asbmr.org/Meetings/AnnualMeeting/

Age at Natural Menopause and Menarche
(6) Hsu YH, Chen X, Elks C, Murabito JM, Estrada K, Harris TB, Lunetta KL, Murray A, et al. Genome-Wide Bivariate Association Analysis Identifies Novel Candidate Genes for Ages at Menarche and Natural Menopause and for Bone Mineral Density: The ReproGen and GEFOS Consortia. Endocr Rev 2011;32:OR17-4.

Fat mass and muscle mass (as well as muscle function)
(7) Hsu YH, Mclean R, Newton E, Hanna M, Cupples LA, Kiel D. Muscle, Fat and Bone Connections: Genetic Risk Factors of Sarcopenic-Obesity and Dynapenic-Obesity and Their Consequent Risks of Osteoporotic Fractures. (ASBMR Annual Meeting Presentation) J Bone Miner Res, 2012; 27 (Suppl 1). Available at http://www.asbmr.org/Meetings/AnnualMeeting/

Metabolic syndrome (including metabolic syndrome risk factors) and Bone density
(8)

Leg strength and Grip Strength
(9)

Aorta Calcification and Bone mineral density
(10)

Glucose metabolism and C-reactive protein
(11)

C-reactive protein and Lipids
(12)

Inflammatory markers and bone health
(13)

Bone health and neurological disorders
(14)




BACKGROUND


In recent years, genome-wide association studies (GWAS) have been increasingly performed to identify genetic determinants underlying complex human diseases [1]. Conventionally, univariate analytic approaches were used to study associations between numerous genetic variants and a single phenotype, one at a time, although multiple correlated phenotypes were often available for joint examination [2-5]. Pleiotropic effects, in which genetic variants influence more than one phenotype or disease, have been widely observed in recent GWAS findings [6]. For example, a genetic variant in the glucokinase regulator gene (GCKR) in Europeans is associated with increased concentrations of plasma triglycerides but reduced fasting glucose levels [7]. Other notable findings include two single-nucleotide polymorphisms (SNPs) in intron 1 of SRY-box 6 (SOX6) that are associated with both body-mass index and hip bone mineral density in male Caucasians [8], and a region on chromosome 8q24 associated with prostate, breast, and colorectal cancers and Crohn’s disease in different populations [9-12]. Despite these examples, detecting pleiotropic genetic effects remains difficult using only univariate approaches [13-15], suggesting that alternative statistical approaches may improve the detection of novel pleiotropic effects.

Compared to univariate approaches, multivariate methods provide statistical advantages in increasing power and accuracy of parameter estimation in linkage studies [16-19]. Several multivariate methods have been developed and applied to analyze correlated traits jointly on a genome-wide scale [17,20-24]. When individual-level data are available, classic multivariate methods can be directly applied to population-based genetic association studies; these include principal component analysis (PCA) [25], multivariate analysis of variance (MANOVA), generalized estimating equations (GEE) [26-29], and linear mixed effect models (LME) [27]. Additionally, extended GEE has been incorporated in family-based tests (FBAT) [23] for analyzing correlated subjects.

With the increasing abundance of GWAS data, meta-analyses have become popular for pooling results from multiple cohorts to increase the sample size and the power to identify genetic determinants that underlie the etiology of complex disease [30]. However, most multivariate methods require access to individual-level data for further joint analyses. In contrast, an approach that combines test statistics integrates univariate meta-analysis results into a global test, thereby significantly improving statistical power. Fisher first proposed the combined probability test, which combines results from several independent tests of the same overall hypothesis (H0) [31]. However, Fisher’s method is likely to produce inflated type 1 error rates when the tests are dependent [32]. O’Brien and Wei separately proposed a better approach [33-36] for combining multiple correlated traits. Although it demonstrates greater power when individual test statistics are homogeneous, this method usually does not achieve the desired power when the effect directions differ [37]. Yang et al. [37] extended O’Brien’s approach by using sample splitting and cross-validation methods to gain power when heterogeneous genetic variants exist. Importantly, individual-level data are required to estimate the optimal weight in this approach, and only part of a population is used in inferring the final test statistics. Another approach, “TATES”, was recently proposed to combine p-values from univariate GWAS while correcting for the observed phenotypic correlations [38]. This approach cannot be applied to individual-level data and was not explicitly examined for traits that are negatively correlated. Province et al. [39] also provide a powerful tool to conduct multivariate analyses on correlated univariate GWAS summary results, but they only evaluated the validity of their proposed tetrachoric correlation estimation; it remains unclear whether their method achieves better power than others, especially when mixed genetic effects exist.

Notably, significant findings from multivariate analyses do not always indicate pleiotropic effects. Such findings could be introduced by correlations between two phenotypes, rather than by independent effects of a genetic variant [40]. Several methods have been proposed to differentiate between causal genetic effects and associations caused by correlations between phenotypes [40,41]. However, few methods have been studied and applied in the context of pleiotropy testing.

To overcome these limitations, we propose a two-stage approach (Figure 1). In the first stage, two genome-wide screening approaches, the direct linear combining approach (dLC) and the empirical linear combining approach (eLC), were developed using aggregated results from meta-analysis. In the second stage, pleiotropy was identified in markers selected from the first stage using an approach called cPLT. We further demonstrated the proposed two-stage approach by identifying pleiotropy in the Genetic Analysis Workshop 16 (GAW16) Problem 3 simulation data sets.


APPROACHES



Direct linear combination of correlated test statistics (dLC)

In general, let $T = (T_1, T_2, \ldots, T_K)^T$ denote a vector of K correlated test statistics, each obtained from a univariate analysis of a specific trait against a genetic marker. Under most circumstances in current GWAS, T asymptotically follows a multivariate normal distribution with mean $\mu = (\mu_1, \mu_2, \ldots, \mu_K)^T$ and known or estimated covariance Σ, where Σ is a K × K symmetric matrix. The null hypothesis we want to test in the multivariate analysis is $H_0: \mu = 0$; in other words, the genetic variant is not associated with any phenotype. The general alternative hypothesis $H_1$ is that at least one $|\mu_k| > 0$, k = 1,…,K. Extending O’Brien’s theory [33], we propose a new approach for combining correlated traits [40], called direct linear combination of test statistics (dLC). The dLC test statistic can be written as:

    $S_{dLC} = T^T \Sigma^{-1} T$           (1)

Under the null hypothesis, $S_{dLC}$ follows a $\chi^2$ distribution with K degrees of freedom and can be used to test the joint significance of dependent univariate test statistics.
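As a rough illustration of the dLC idea for two traits, the sketch below evaluates the quadratic form $T^T \Sigma^{-1} T$, which is the form consistent with a chi-square null distribution with K degrees of freedom. It assumes unit-variance z-scores with correlation r, so that Σ = [[1, r], [r, 1]]; the function name and this parameterisation are illustrative choices, not part of the eLX package.

```python
import math


def dlc_bivariate(t1, t2, r):
    """Return (statistic, p_value) for two z-scores with correlation r.

    Assumption: Sigma = [[1, r], [r, 1]], so the quadratic form
    T' Sigma^{-1} T has the closed form used below.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    stat = (t1 * t1 - 2.0 * r * t1 * t2 + t2 * t2) / (1.0 - r * r)
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    # Same-direction versus opposite-direction z-scores under a negative correlation.
    print(dlc_bivariate(2.0, 2.0, -0.5))
    print(dlc_bivariate(2.0, -2.0, -0.5))
```

For more than two traits the same quadratic form applies, but the matrix inverse and the chi-square tail probability would need a numerical library.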


Empirical linear combination of correlated test statistics (eLC)

As illustrated in Xu et al. [33], dLC may not have optimal power against specific alternatives because of the heavy tail of the $\chi^2$ distribution. Therefore, we further propose a data-driven empirical approach to combine correlated test statistics as:

    $S_{eLC} = \sum_{k=1}^{K} [\max(|T_k|, c) \cdot |T_k|]$           (2)

where c is a given non-negative constant. The weights in this test statistic are determined by the specific data structure. For instance, when c=0, the test statistic reduces to the sum of squares of the Tk. When c is relatively large, equal weight is assigned to each Tk. Ideally, we would like to find an optimal value of c such that $S_{eLC}$ behaves as a linear combination of the Tk under H0 but gives more weight to the larger true Tk under the alternative H1. The bona fide p-value for $S_{eLC}$ can then be estimated by applying permutation or perturbation techniques (see Supplementary).
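The behaviour described for c can be sketched in a few lines. The statistic below, the sum over traits of max(|Tk|, c)·|Tk|, is an assumed form chosen because it reduces to the sum of squares at c=0 and to an equally weighted sum for large c; it is not taken from the eLX source. The p-value is approximated by drawing null statistics from a bivariate normal distribution, a simple stand-in for the permutation or perturbation scheme described in the Supplementary.

```python
import math
import random


def elc_stat(t, c):
    """Assumed eLC form: sum over traits of max(|T_k|, c) * |T_k|."""
    return sum(max(abs(x), c) * abs(x) for x in t)


def elc_null_pvalue(t_obs, r, c, n_draws=20000, seed=1):
    """Monte Carlo p-value for two traits with null T ~ N(0, [[1, r], [r, 1]])."""
    rng = random.Random(seed)
    observed = elc_stat(t_obs, c)
    s = math.sqrt(1.0 - r * r)
    exceed = 0
    for _ in range(n_draws):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        # (a, r*a + s*b) has unit variances and correlation r.
        null_t = (a, r * a + s * b)
        if elc_stat(null_t, c) >= observed:
            exceed += 1
    # Add-one estimator keeps the estimate strictly positive.
    return (exceed + 1) / (n_draws + 1)


if __name__ == "__main__":
    print(elc_stat((3.0, -4.0), 0.0))   # sum of squares when c = 0
    print(elc_stat((3.0, -4.0), 10.0))  # c * (|T1| + |T2|) when c is large
    print(elc_null_pvalue((3.0, -3.5), r=-0.3, c=1.0))
```

The number of null draws bounds the smallest p-value that can be reported, which is why genome-wide use requires very large draw counts.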

Estimation of the Variance-Covariance Matrix Σ

Several methods can be used to estimate the covariance matrix Σ of univariate test statistics. For simplicity, we demonstrate two estimation approaches in a bivariate scenario. Following the method used in Yang et al. [37], we first use the sample covariance matrix of the test statistics of all SNPs from univariate GWAS analyses as an approximation. The covariance matrix Σ can thus be estimated as:

    $\hat{\Sigma} = \begin{pmatrix} \mathrm{Var}(Z_1) & \mathrm{Cov}(Z_1, Z_2) \\ \mathrm{Cov}(Z_1, Z_2) & \mathrm{Var}(Z_2) \end{pmatrix}$

where Z1 is the vector of unbiased univariate test statistics of all SNPs genome-wide for the first trait, and Z2 is the corresponding vector for the second trait. Alternatively, Σ can be estimated by using generalized least squares from individual-level data, as suggested by O’Brien [34]. A similar approach is also demonstrated by Liu et al. [29].
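The summary-data estimate can be sketched as a plain sample covariance of the two genome-wide z-score vectors. The function name is hypothetical, and the n-1 divisor is an assumption; with genome-wide SNP counts the choice of divisor makes no practical difference.

```python
def sample_cov_matrix(z1, z2):
    """2x2 sample covariance matrix of two equally long z-score lists."""
    if len(z1) != len(z2) or len(z1) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(z1)
    m1 = sum(z1) / n
    m2 = sum(z2) / n
    v1 = sum((a - m1) ** 2 for a in z1) / (n - 1)
    v2 = sum((b - m2) ** 2 for b in z2) / (n - 1)
    c12 = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / (n - 1)
    return [[v1, c12], [c12, v2]]


if __name__ == "__main__":
    # Toy z-scores; real input would hold one value per SNP for each trait.
    print(sample_cov_matrix([1.2, -0.4, 0.3, 2.1], [-0.9, 0.5, 0.1, -1.7]))
```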

In this paper, the combining tests utilizing a covariance matrix approximated from the summary GWAS test statistics will be denoted by OBz, dLCz, and eLCz. Similarly, those with a covariance matrix calculated from individual-level data will be referred to as OBi, dLCi, and eLCi.

Simulation Study



Monte Carlo simulations were employed to generate data for evaluating the validity and performance of all the multivariate methods, especially the proposed dLC and eLC approaches. The main scenarios and key parameters for simulating genotype data are shown in Table 1. Various conditions were considered, including equal contributions of the genetic marker to both traits (Scenario A) and unequal genetic effects (Scenario B). To evaluate the proposed pleiotropy test strategy in Stage 2, a special Scenario C was also introduced. In particular, a single continuous trait was first generated with an assigned effect size. The second trait was then simulated by adding random noise to the first generated trait. Therefore, the genetic variant is directly associated with the first trait but only indirectly linked with the second trait, as illustrated in R2 and R3 in Figure 2. In all simulations, only quantitative traits and unrelated subjects were generated. A sample size of 1000 subjects was simulated in each of 1000 replicates.

The effect size of the SNP on each trait is estimated from $h^2 = 2p(1-p)\beta^2 / \sigma_Y^2$, where p is the minor allele frequency (MAF) at the locus, $\sigma_Y^2$ is the total phenotypic variance, and $h^2$, the heritability, is the proportion of phenotypic variance explained by the SNP. The heritability was set at 1% and 2% in Simulation Series I, and at 0.5% and 0.1% in Simulation Series II, for each trait, respectively. The variance of the environmental effects was fixed at 1 in all simulation studies. The genotypes were then generated under Hardy-Weinberg equilibrium with a specified MAF of 10%.
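To make the link between heritability, allele frequency and effect size concrete, here is a minimal sketch. It assumes an additive model in which the SNP contributes variance 2p(1-p)β² on top of an environmental variance of 1, and solves for β; the exact formula used in the original simulations is not recoverable from this page, so treat the function names and the formula as illustrative.

```python
import math
import random


def effect_size(h2, maf, env_var=1.0):
    """Solve 2p(1-p)b^2 / (2p(1-p)b^2 + env_var) = h2 for b >= 0."""
    if not 0.0 <= h2 < 1.0 or not 0.0 < maf < 1.0:
        raise ValueError("need 0 <= h2 < 1 and 0 < maf < 1")
    return math.sqrt(h2 * env_var / (2.0 * maf * (1.0 - maf) * (1.0 - h2)))


def draw_genotypes(n, maf, rng):
    """Minor-allele counts (0, 1 or 2) from two independent allele draws (HWE)."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2013)
    beta = effect_size(h2=0.01, maf=0.10)
    genotypes = draw_genotypes(1000, 0.10, rng)
    print(beta, sum(genotypes) / len(genotypes))
```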

Bivariate quantitative phenotypes were randomly drawn from a bivariate normal distribution to represent the pleiotropic relationship R1 in Figure 1:

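    $Y = (Y_1, Y_2)^T \sim N\left( (\beta_1 X, \beta_2 X)^T, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$

where $\beta_i$ is the additive genetic effect size for trait i, X is the additive score of the coded allele, and ρ is the residual correlation between $Y_1$ and $Y_2$.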
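The bivariate phenotype model can be sketched as follows, assuming unit residual variances (matching the environmental variance of 1 stated above) and a residual correlation ρ. The function name and argument order are illustrative only.

```python
import math
import random


def simulate_traits(genotypes, beta1, beta2, rho, rng):
    """Draw (Y1, Y2) per subject with means (beta1*X, beta2*X) and residual correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    s = math.sqrt(1.0 - rho * rho)
    y1, y2 = [], []
    for x in genotypes:
        e1 = rng.gauss(0.0, 1.0)
        # e2 has unit variance and correlation rho with e1.
        e2 = rho * e1 + s * rng.gauss(0.0, 1.0)
        y1.append(beta1 * x + e1)
        y2.append(beta2 * x + e2)
    return y1, y2


if __name__ == "__main__":
    rng = random.Random(16)
    toy_genotypes = [0, 1, 0, 2, 0, 0, 1, 0]
    # Opposite-direction effects with a negative residual correlation.
    print(simulate_traits(toy_genotypes, 0.24, -0.24, -0.25, rng))
```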

Various simulation scenarios with respect to ρ were further generated. Briefly, ρ was set at -0.75, -0.25, 0.01, 0.25, and 0.75 in Simulation Series I to mimic the correlations we have observed in real GWAS data analyses. A smaller range of ρ, at -0.5, -0.25, 0.01, 0.25, and 0.5, was employed in Simulation Series II with lower genetic heritability. Notably, a mixture of protective and deleterious genetic effects was introduced when a negative ρ was used in generating data. For instance, we assigned β1=-β2 as the alternative hypothesis when ρ was -0.25.

In addition to bivariate scenarios, we also generated three moderately correlated quantitative traits to evaluate the performance of various approaches when more than two traits were combined. For simplicity, the genetic heritability was set at 0.5% for each of the three traits. The pairwise phenotypic residual correlations were set at 0.25, -0.15, and -0.20. The data were then generated under the null hypothesis H0: β1=β2=β3=0 and two alternative hypotheses, H1: β1=-β2=-β3>0 and H1: β1=β2=-β3>0, respectively.

Each replicate was analyzed by using the OB, dLC, and eLC approaches individually. We estimated the covariance matrix Σ through approximation from summary data and from individual-level data. Other multivariate methods were also compared, including LME with a random effect accounting for phenotypic correlations, GEE, MANOVA, and PCA, in which the first component was used as a dependent variable in the subsequent linear model. All analyses were conducted in R software (http://r-project.org/). Power and type I error rates of each approach were calculated as the proportion of replicates with a p-value less than a given significance threshold in the corresponding scenarios. Specifically, power was derived from 1000 replicates with the significance level at a p-value of 10^-4 for each true scenario in Simulation Series I, and 10^-2 for those in Simulation Series II. Type I error rates were estimated in the settings where β1=β2=0 with 1000 replicates at nominal significance levels of 0.05 and 0.01. To adjust for multiple testing across the two univariate association tests of the two phenotypes, standard Bonferroni corrections were applied, with the significance level at 5*10^-5 and 5*10^-3 for Simulation Series I and II, respectively.
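The power and type I error calculation described above amounts to counting replicates below a threshold. A minimal sketch, with hypothetical function names and made-up p-values for illustration:

```python
def rejection_rate(p_values, alpha):
    """Proportion of replicates whose p-value falls below alpha."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(1 for p in p_values if p < alpha) / len(p_values)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Made-up replicate p-values, not simulation results.
    replicate_p = [2e-5, 0.31, 7e-5, 0.004, 9e-6]
    print(rejection_rate(replicate_p, 1e-4))
    # Two univariate tests at an overall level of 10^-4 gives 5*10^-5 per test.
    print(bonferroni_threshold(1e-4, 2))
```

Under the null scenario the same proportion is the empirical type I error rate; under an alternative scenario it is the empirical power.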

 


SIMULATIONS


Part I.  Performance of multivariate methods

Valid type I error rates, as expected, were consistently observed across all simulation scenarios at different nominal levels for all proposed and existing multivariate approaches using individual-level data (Table 2). In particular, our proposed approaches, direct linear combining (dLC) and empirical combining (eLC), demonstrate robustness under the null hypothesis, regardless of the directions of phenotypic residual correlation and approaches employed to estimate the covariance matrix Σ.



The estimated power using individual-level data in this series is presented in Table 3. Simulation Series I assessed the performance of various multivariate approaches under relevant hypotheses, in which relatively large genetic effects were simulated. The results are presented with respect to two alternative hypotheses, H1: β1=|β2|>0 and H1: β1>|β2|>0. In particular, we assigned a mixture of protective and deleterious genetic effects when a negative phenotypic residual correlation ρ was adopted in the simulation. All the methods performed well when test statistics were homogeneous, but their power varied considerably under the hypothesis of mixed genetic effects. For instance, our proposed approaches showed comparable power to MANOVA under the first alternative hypothesis, whereas GEE, LME, and the O’Brien method (OB) had low power. In addition, principal component analysis (PCA) outperformed other methods when traits were highly correlated and showed less power than our proposed approaches when traits were barely related (ρ=0.01). The results under the second alternative hypothesis produced similar conclusions across various simulation scenarios. In general, our proposed approaches are substantially superior to univariate analysis approaches under all alternative hypotheses. Additionally, eLC outperformed dLC marginally in this simulation series.



We next examined to what extent power changes when our proposed approach is applied to aggregated data under the same alternative hypotheses. MANOVA, PCA, GEE, and LME are unable to analyze aggregated results. Thus, we compared power only among OB, dLC, and eLC (Table 4). OB was inferior when the univariate test statistics were heterogeneous. dLC and eLC were not only superior to OB when directions of effects were opposite, but nearly as powerful as OB when genetic effects were homogeneous. Moreover, similar power was consistently observed with the two distinct methods of estimating the covariance matrix Σ. Again, eLC and dLC showed significant advantages over univariate methods in detecting pleiotropic effects. Similar conclusions could be drawn from the results of Simulation Series II, where smaller genetic effects were simulated (see Supplementary Tables S1, S2). These results demonstrate that our proposed methods have comparable power to other approaches when all effects are in the same direction and much greater power when genetic effects are mixed. They can also be directly and effectively applied to aggregated data, thus efficiently increasing the sample size and power.



We extended our proposed approaches from combining two to combining three correlated traits. We generated data in the same manner as for the bivariate analysis under the null hypothesis, H0: β1=β2=β3=0, and two alternative hypotheses separately, H1: β1=-β2=-β3>0 and H1: β1=β2=-β3>0. We excluded PCA in this series; because two principal components are typically needed to explain at least 80% of the total phenotypic variation, an increased number of subsequent regression tests and adjustments for multiple comparisons are necessary. For individual-level data, among classical methods MANOVA had better power compared to other conservative approaches, including GEE, LME, and OB (Table 5). Our proposed approaches (dLC and eLC) demonstrated comparable power to MANOVA under both alternative hypotheses. However, dLC had less power than eLC in this scenario, likely due to the increased degrees of freedom of its underlying chi-square distribution. Additionally, our methods outperform OB using aggregated data when individual-level data are not available.


Part II.  Stage 2 Testing of Pleiotropy

We utilized the Simulation Series II dataset (low effect size described in Table 1) to act as positive and negative controls in evaluating the performance of our proposed stage 2 method of testing pleiotropy under relevant hypotheses.

A specific null hypothesis, “β1>0, β2=β1+ε”, was simulated and used as the negative control. Table 6 presents the estimated type I error rates and power of our proposed approach (conditional testing of pleiotropy, cPLT) in comparison with the performance of the two conditional models strategy. Under the null hypothesis, cPLT demonstrated reasonable type I error rates, regardless of phenotypic residual correlations and effect directions.

Two alternative hypotheses, “β1= |β2|>0” and “β1> |β2|>0”, served as positive controls for detecting pleiotropy. cPLT demonstrated better performance than typical conditional strategies under these two alternative hypotheses (Table 6). However, the power was low for almost all methods when the degree of phenotypic correlation was high. Superior power was consistently found using cPLT under the first alternative hypothesis, in which the genetic variant contributed to phenotypes equally and the phenotypic correlation was modest. In general, statistical power was highest when phenotypic residual correlation was very small, e.g., ρ =0.01. Further, power of cPLT decreased as the degree of phenotypic residual correlation increased. Interestingly, power did not seem to be as strongly influenced by the mixture of protective and deleterious genetic effects in this stage as it was in the first stage.



These findings show that our proposed cPLT method performs well under the intended hypothesized pleiotropic relationship between genetic variants and correlated phenotypes, but its power can vary significantly as the correlation of phenotypes changes.


Joint Stage 1 and Stage 2 testing reduces type 1 error in identifying pleiotropic effects:

We further examined the robustness of our proposed two-stage strategy under the null hypothesis, β1>0, β2=β1+ε. Specifically, we present type 1 error rates for separate and combined strategies with respect to various alpha levels (Table 7). Type 1 error rates were inflated for dLC applied alone in Stage 1, as expected. However, applying the proposed cPLT approach in Stage 2 efficiently reduced the false-positive results introduced in Stage 1. Notably, the overall false-positive rate was better controlled at the level of 0.05 when the screening threshold at Stage 1 was set at 0.025. Thus, this two-stage strategy allows us to use less-stringent cut-off p-values in the screening stage to increase power while effectively maintaining the overall false-positive rate through Stage 2.



Part III. An Application to GAW16 simulation data

To validate our novel two-stage approach, we applied the method to the first replicate of the GAW16 Problem 3 simulation dataset. We sought to identify pleiotropic variants (SNPs) associated with two blood lipids, high-density lipoprotein (HDL) and triglycerides (TG), known to influence cardiovascular disease and measured in the Framingham Heart Study, from which the dataset originates. The phenotypic correlation for HDL and TG was -0.28, and the estimated correlation of univariate test statistics of HDL and TG was -0.32. A unified genome-wide significance level was defined by false discovery rate (FDR) at p-value = 3*10^-6, equivalent to q=0.05, for both bivariate and univariate analyses to screen potential pleiotropic SNPs. Because dLC and eLC performed similarly and dLC is computationally more efficient, we used dLC to analyze the combined correlated univariate GWAS results. The genomic control parameter was 1.01 for this bivariate GWAS analysis. The Q-Q plot of the bivariate GWAS results (Supplementary Figure S1) showed no inflation beyond that expected by chance alone. A further comparison of bivariate results from dLC between individual-level data and aggregated data (Supplementary Figure S2) revealed no substantial differences.
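A genomic control parameter such as the 1.01 reported above is commonly computed as the median of the observed chi-square statistics divided by the median of the null distribution. The sketch below follows that common definition; whether the original analysis computed it exactly this way is an assumption, and the function name is hypothetical.

```python
import math
import statistics


def genomic_control_lambda(chi2_stats, df=2):
    """Median of observed statistics divided by the null chi-square median."""
    if df == 2:
        expected_median = 2.0 * math.log(2.0)  # exact median for 2 df
    elif df == 1:
        expected_median = 0.4549364  # rounded median for 1 df
    else:
        raise ValueError("this sketch only covers df = 1 or df = 2")
    return statistics.median(chi2_stats) / expected_median


if __name__ == "__main__":
    # Toy statistics; real input would be one bivariate statistic per SNP.
    print(genomic_control_lambda([0.3, 1.1, 1.5, 2.9, 4.2], df=2))
```

A value close to 1 indicates that the bulk of the test statistics follows the expected null distribution.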

The 25 leading SNPs in each genome-wide significant locus from the first stage were used in the second stage to infer their independent effects on HDL and TG by applying our proposed cPLT method with 10,000 permutations (Table 8). The significance level in this stage was 0.002 after Bonferroni correction. The two-stage analysis identified 2 SNPs, rs3200218 in the coding region of LPL and rs8192719 on the exon/intron boundary of CYP2B6. These loci served as positive controls in the GAW16 simulation dataset, and were validated as pleiotropic genes using our strategy. The simulated heritability of rs3200218 for HDL and TG was 0.3% and 0.4%, respectively. For rs8192719, the heritability for HDL and TG was reported as 0.3% for both [42]. Additionally, we detected a negative control SNP, rs7031748. Although it demonstrated a significant p-value from bivariate analysis in the first stage, it failed to reject the null hypothesis in our proposed method in the second stage. Therefore, the observed significant bivariate association of this genetic variant in the first stage likely resulted from indirect correlations, rather than a causal pleiotropic relationship.
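The Stage 2 threshold and the permutation count quoted above can be checked with simple arithmetic. The add-one permutation p-value estimator used here is a common convention and an assumption about cPLT, not something stated on this page.

```python
def permutation_p_value(n_exceed, n_perm):
    """Add-one estimate: (exceedances + 1) / (permutations + 1)."""
    if n_perm <= 0 or not 0 <= n_exceed <= n_perm:
        raise ValueError("need n_perm > 0 and 0 <= n_exceed <= n_perm")
    return (n_exceed + 1) / (n_perm + 1)


# Bonferroni correction for the 25 SNPs carried into Stage 2.
alpha_stage2 = 0.05 / 25

# Smallest p-value attainable with 10,000 permutations.
smallest_p = permutation_p_value(0, 10_000)

if __name__ == "__main__":
    print(alpha_stage2, smallest_p, smallest_p < alpha_stage2)
```

With 10,000 permutations the smallest attainable p-value is about 10^-4, which is below the corrected threshold of 0.002, so the permutation count is sufficient to declare significance at this level.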



Type 1 errors when filtering out SNPs with bivariate p-value > univariate p-value

Commonly, the multivariate p-value of a true pleiotropic marker is proposed to be smaller than that of both its univariate results. We investigated the plausibility of this hypothesis by comparing the distributions of p-values from both combining methods using our simulation data under the alternative hypotheses. The proportions of SNPs with smaller p-values in dLC under various scenarios are provided in Supplementary Table S3. In general, most p-values from dLC were smaller when the marker equally contributed to both traits and the phenotypic correlation was modest. On the other hand, p-values from dLC did not tend to be smaller when the phenotypic correlation was high or unequal genetic effect sizes existed. Adopting an additional significance threshold added little improvement. We further investigated the impact on statistical power by using this filter under relevant alternative hypotheses (Table 9). Applying the small multivariate p-value filtering criteria would introduce modest to significant power loss, largely depending on the magnitudes of phenotypic correlations and genetic effect sizes.
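The filter examined here can be expressed as a single comparison per SNP. The sketch below shows how requiring the multivariate p-value to be smaller than both univariate p-values can only lower the rejection count; the data layout and function names are hypothetical.

```python
def passes_filter(p_multi, p_uni):
    """True if the multivariate p-value is smaller than every univariate p-value."""
    return p_multi < min(p_uni)


def power_with_and_without_filter(results, alpha):
    """results: list of (p_multi, (p_trait1, p_trait2)) per replicate."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    unfiltered = sum(1 for pm, _ in results if pm < alpha) / n
    filtered = sum(1 for pm, pu in results if pm < alpha and passes_filter(pm, pu)) / n
    return unfiltered, filtered


if __name__ == "__main__":
    # Made-up replicate p-values, not simulation results.
    toy = [(1e-5, (3e-4, 2e-3)), (4e-5, (1e-5, 0.2)), (0.3, (0.4, 0.5))]
    print(power_with_and_without_filter(toy, 1e-4))
```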

 




DOWNLOAD



 1. eLX v1 C++ package (January, 2013), Download

 2. Read Me document (January, 2013), Download


PROGRAM: eLX

DESCRIPTION:

This C++ code implements the "empirical linear combining of test statistics" multivariate analysis algorithms that were presented at:
(1) 2011 The Joint Statistical Meetings (American Statistical Association), Miami Beach, FL, USA, 2011.
Hsu YH and Chen X. A Multivariate Approach on Genome-Wide Association Studies (GWAS) by Modeling multiple Traits Simultaneously to Identify Pleiotropic Genetic Effects/Plenary Talk.
(2) 19th annual meeting of the International Genetic Epidemiology Society, Boston, MA 2010.
Hsu YH, Chen X, Gupta M, et al. A powerful multi-phenotype approach on genome-wide association studies to identify novel pleiotropic genes that affected multiple quantitative traits/Plenary Talk. Genet Epidemiol 2010;34:922-3.

The manuscript to describe the algorithms and simulation findings of statistical power comparison is under review now:
Hsu YH and Chen X. Identifying Pleiotropic Genetic Effects of Correlated Traits and Phenotypes: A Two-Stage Approach by Utilizing GWAS Meta-Analysis Results.

If you use our tools, please cite the above abstracts and the paper (when it is published).
 
CONTACT: Yi-Hsiang Hsu (yihsianghsu@hsl.harvard.edu)
                    Xing Chen (dr.xingchen@gmail.com)




YEAR: 2012

LICENSE: Released under GNU General Public License, v2 (see COPYING.txt)

DOCUMENTATION:

INSTALLATION: If you have downloaded a zip or gzipped archive with an executable binary, no installation is necessary (although you may want to place the executable in your path). Please contact the author for source code and see notes on compilation below.

COMPILATION: You will need a standard C/C++ compiler such as GNU gcc (version 3). This is likely available on all Linux/Unix platforms. For MS-DOS, DJGPP or MinGW are appropriate choices. For help with compiling, see the documentation (basically, just be sure to select the correct Makefile and type make -f Makefile.*)

USAGE: Type "eLX" or "./eLX" from the command line followed by the following options of choice


Required:

"-i": input dataset
 
        *** important ***
        Cleaned univariate GWAS summary data in tab-delimited format, as follows (the first line is the header):
     
        SNPname    Trait1.Z    Trait2.Z    .....    TraitN.Z
     
        rs10000    2.121        3.121        .....    -0.4567


     
"-o": eLX output filename

"-n": number of permutations = 10^n



Optional:

"-s": skip # lines

        skip # lines from the beginning of input file in eLX

"-e":  # lines for analysis
   
        # lines preferred to run in eLX   




EXAMPLE CMD:

     eLX -i input.data -o out.data -n 8
     
     eLX -i input.data -o out.data -n 8 -s 1 -e 1000
    
     etc...
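For users preparing input from another pipeline, the sketch below writes z-scores in the tab-delimited layout described under "-i" and shows what the "-n" exponent implies. It is a helper written for illustration, not part of eLX; the ".Z" column suffix simply mirrors the header shown above, and whether eLX requires those exact column names is an assumption.

```python
import csv
import io


def format_elx_input(snp_rows, trait_names):
    """Return tab-delimited text: a header line, then one line per SNP."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["SNPname"] + [f"{name}.Z" for name in trait_names])
    for snp, z_scores in snp_rows:
        if len(z_scores) != len(trait_names):
            raise ValueError(f"{snp}: expected {len(trait_names)} z-scores")
        writer.writerow([snp] + [f"{z:g}" for z in z_scores])
    return buf.getvalue()


def permutations_for(n):
    """Number of permutations implied by the -n option (10^n)."""
    return 10 ** n


if __name__ == "__main__":
    text = format_elx_input([("rs10000", [2.121, 3.121, -0.4567])],
                            ["Trait1", "Trait2", "Trait3"])
    print(text, end="")
    print(permutations_for(8))
```

Saving the returned text as input.data would give a file that can be passed to the first example command above; note that "-n 8" requests 10^8 permutations.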