The EBLASSO package
Brief description of the EBLASSO algorithm
In regression models, EBLASSO assigns a hierarchical prior distribution to the regression coefficients and estimates their marginal posterior distributions. Two priors are currently available: a scale mixture of Normal and Exponential distributions (NE), or of Normal, Exponential and Gamma distributions (NEG).
The precision of each regression coefficient is either infinite or finite, which controls whether the variable is excluded from (infinite precision) or included in (finite precision) the selected model.
Marginal posterior distributions are inferred only for the variables with finite precisions, which results in sparse model learning.
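The scale-mixture idea behind the NE prior can be illustrated with a small simulation. This is a minimal sketch, not the package's code: it assumes the parameterization in which each coefficient is Normal with its own variance, and that variance is Exponential with rate lambda. Under that assumption the marginal prior on a coefficient is a Laplace distribution with variance 1/lambda, which is what gives the LASSO-like shrinkage. The function name sample_ne_prior is hypothetical. The NEG prior would add a Gamma prior on lambda, which is not simulated here.

```python
import random
import statistics


def sample_ne_prior(lam, n, seed=0):
    """Draw n coefficients from an assumed Normal-Exponential hierarchy."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Assumed parameterization: sigma_j^2 ~ Exponential(rate=lam)
        variance = rng.expovariate(lam)
        # beta_j | sigma_j^2 ~ Normal(0, sigma_j^2)
        draws.append(rng.gauss(0.0, variance ** 0.5))
    return draws


lam = 2.0
betas = sample_ne_prior(lam, 200_000)
# The sample variance should be close to the theoretical value 1 / lam.
prior_variance = statistics.pvariance(betas)
```

A larger lambda concentrates the prior more tightly around zero, so more coefficients are shrunk out of the model.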
EBLASSO and state of the art algorithms
Speed: Superior computational speed is achieved through a closed-form solution for the precisions of the regression coefficients, and because marginal posterior distributions are inferred only for the variables with finite precisions. (By comparison, EM algorithms must infer the posterior distributions of all regression coefficients to decide which ones to include in the final sparse model.)
Accuracy: Robust variable selection and estimation are achieved by estimating both the variance components and the maximum a posteriori (MAP) values. (By comparison, LASSO and HyperLasso provide only MAP estimates.)
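To illustrate what a closed-form precision update with an infinite/finite outcome looks like, the sketch below uses the rule from the relevance vector machine's fast marginal-likelihood algorithm (Tipping and Faul). This is a swapped-in, related technique shown for illustration only; the EBLASSO update formulas under the NE and NEG priors differ and are not reproduced here. The arguments s (sparsity factor) and q (quality factor) follow the RVM notation, and the function names are hypothetical.

```python
import math


def rvm_precision_update(s, q):
    """Closed-form precision for one candidate variable (RVM rule, not EBLASSO's).

    Returns a finite precision when the variable should be in the model,
    and math.inf when it should be excluded.
    """
    if q * q > s:
        return (s * s) / (q * q - s)
    return math.inf


def is_included(precision):
    """A variable is in the sparse model only if its precision is finite."""
    return math.isfinite(precision)


kept = rvm_precision_update(2.0, 3.0)      # q^2 = 9 > s = 2, finite precision
dropped = rvm_precision_update(2.0, 1.0)   # q^2 = 1 <= s = 2, infinite precision
```

Because each update is a single arithmetic test, the expensive posterior computation only has to be carried out for the variables whose precision stays finite.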
Software User Interface
A user manual is available along with the software package. The software is freely available upon request.
Input and output of the basic functions:
Input:
Feature matrix: each column is a feature vector.
Response vector: a column vector of the response variable.
Hyperparameter(s): (a,b) for EBLASSO-NEG, or \lambda for EBLASSO-NE
Epis: "no" = no interactions considered; "yes" = two-way interactions considered
Verbose: level of information output
Output:
Weight: the regression coefficients of the selected sparse model; columns 1 and 2: feature indices; column 3: regression coefficients; column 4: variances
Wald score: Wald score for the association test
Intercept: regression intercept
Hyperparameter(s): same as the input
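As a reading aid for the output, the sketch below computes a standard per-coefficient Wald statistic from a coefficient (column 3) and its variance (column 4). This is an assumption about how the columns can be used, not a description of the package's internals: the page does not say whether the reported Wald score is per coefficient or a joint statistic over all selected variables. The rows in weight_rows are hypothetical values, not results from the package.

```python
import math


def wald_statistic(coefficient, variance):
    """Single-coefficient Wald statistic: squared estimate over its variance."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    return coefficient * coefficient / variance


def wald_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical rows in the weight layout: (index 1, index 2, coefficient, variance).
weight_rows = [
    (3, 3, 0.60, 0.04),
    (7, 12, -0.25, 0.01),
]

scores = [wald_statistic(coef, var) for _, _, coef, var in weight_rows]
p_values = [wald_p_value(w) for w in scores]
```

A large statistic, and therefore a small p-value, indicates that the coefficient is unlikely to be zero.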
Example:
This software package is implemented in C and interfaced with R.
The package contains two datasets: a Binomial dataset and a Gaussian dataset. An example of analyzing the Gaussian dataset is:
With parameters (a, b), EBLASSO selected 7 of the 481 variables in this dataset. By setting Epis = "yes", the model considers two-way interactions among all available variables:
With parameters (a, b), EBLASSO selected 31 of the 115921 candidate variables (main effects plus two-way interactions) in this dataset.
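The jump from 481 to 115921 variables follows from counting every main effect plus every pair of distinct variables: 481 + 481 × 480 / 2 = 115921. The sketch below reproduces that count and shows one way such an expanded feature matrix could be built. It assumes that interactions are products of two distinct columns with no squared terms, which is consistent with the count above; the function names are hypothetical and this is not the package's own code.

```python
from itertools import combinations
from math import comb


def count_candidate_effects(p):
    """Number of main effects plus two-way interactions among p variables."""
    return p + comb(p, 2)


def expand_with_interactions(rows):
    """Append the product of every pair of distinct columns to each row."""
    p = len(rows[0])
    pairs = list(combinations(range(p), 2))
    return [list(row) + [row[i] * row[j] for i, j in pairs] for row in rows]


total = count_candidate_effects(481)
small = expand_with_interactions([[1, 2, 3]])
```

Because the candidate set grows quadratically with the number of variables, inferring posteriors only for the selected variables matters most in this setting.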
The SML software
Parallel computing is under development.