Code
“In theory, theory and practice are the same. In practice, they are not.” - Einstein
We are gradually moving all code and packages to GitHub under StatsLE.
- Please consider citing our papers if you find any of these packages helpful in your projects.
[R Package] This R package performs robust convex clustering as described in Liu, Sun and Tan (2019). Classical approaches to convex clustering solve a convex optimization problem whose cost function is a squared loss plus a fusion penalty that encourages the estimated centroids of observations in the same cluster to be identical. These approaches are not resistant to adversarial samples. This package implements a robust convex clustering algorithm that performs well even in the presence of arbitrary outliers.
References:
Liu C, Sun Q, and Tan KM. Robust convex clustering: How does fusion penalty enhance robustness? preprint.
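For intuition about the formulation, here is a minimal Python sketch of the classical convex clustering objective (squared loss plus fusion penalty); the robust variant of Liu, Sun and Tan replaces the squared loss with an outlier-resistant one. This illustrates only the underlying cost function, not the package's API or solver:

```python
import numpy as np

def convex_clustering_objective(X, U, gamma):
    """Classical convex clustering cost for data X (n x p), candidate
    centroids U (n x p, one per observation) and fusion strength gamma."""
    n = X.shape[0]
    fit = 0.5 * np.sum((X - U) ** 2)            # squared-loss data fit
    fusion = sum(np.linalg.norm(U[i] - U[j])    # pairwise fusion penalty
                 for i in range(n) for j in range(i + 1, n))
    return fit + gamma * fusion
```

As gamma grows, minimizing this objective drives rows of U together, which is what produces the clustering.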
[R and Python Package] This R package/Python package implements adaptive capped least squares (ACLS) for linear regression models with possible outliers. ACLS is robust against outliers in both the predictor and response spaces: it asymptotically achieves the optimal breakdown point. Statistically, it also yields fully efficient regression estimators.
References:
Sun Q, Mao R, and Zhou WX. Adaptive capped least squares, preprint.
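The capping idea can be sketched as follows. This is a generic capped quadratic loss for illustration only, not the package's exact loss, adaptive calibration of tau, or API:

```python
import numpy as np

def capped_ls_loss(r, tau):
    """Capped least squares loss: quadratic for residuals within tau,
    constant beyond it, so gross outliers get bounded influence."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= tau, 0.5 * r ** 2, 0.5 * tau ** 2)
```

Because the loss is flat beyond tau, a single arbitrarily bad observation cannot dominate the fit, which is the mechanism behind the robustness claim above.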
Auto-ARR: Auto-adaptive robust regression
[Python Package] This Python package implements the auto-adaptive robust regression algorithm of Sun (2021).
References:
Sun Q. Do we need to estimate the variance in robust mean estimation?, preprint.
[R Package] This R package implements the I-LAMM algorithm and uses it to solve regularized adaptive Huber regression. The choice of penalty functions includes the $\ell_1$-norm, the smoothly clipped absolute deviation (SCAD) and the minimax concave penalty (MCP). The two tuning parameters, lambda and tau (for the adaptive Huber loss), are calibrated by cross-validation. As a by-product, this package also produces regularized least squares estimators, including the Lasso, SCAD and MCP.
References:
Sun Q, Zhou WX, and Fan J (2020). Adaptive Huber regression, Journal of the American Statistical Association, 115, 254–265.
Fan J, Liu H, Sun Q, and Zhang T (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error, The Annals of Statistics, 46, 814–841.
Pan X, Sun Q, and Zhou WX. Iteratively reweighted $\ell_1$-penalized robust regression, Electronic Journal of Statistics, in press.
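The SCAD and MCP penalties mentioned above have standard closed forms from the literature, sketched below; the default constants a = 3.7 and gamma = 3 are common illustrative choices, not necessarily the package defaults:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at t = |beta| (requires a > 2): linear near
    zero, quadratic transition, then constant, so large coefficients are
    not shrunk."""
    t = np.abs(np.asarray(t, dtype=float))
    small = lam * t
    mid = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    large = lam ** 2 * (a + 1) / 2
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))

def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated at t = |beta| (requires gamma > 1):
    tapers from lasso-like shrinkage at zero to no shrinkage beyond
    gamma * lam."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2 * gamma),
                    gamma * lam ** 2 / 2)
```

Both penalties agree with the $\ell_1$-norm near zero but flatten out for large coefficients, which is what reduces the bias of the Lasso on strong signals.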
[Python Package] This Python package implements the Huber mean estimator, adaptive Huber regression and $\ell_1$-regularized Huber regression (Huber-Lasso) estimators efficiently. For all these methods, the robustification parameter tau is calibrated by a tuning-free principle.
[R Package] This R package implements the Huber mean estimator, adaptive Huber regression and $\ell_1$-regularized Huber regression (Huber-Lasso) estimators efficiently. For all these methods, the robustification parameter tau is calibrated by a tuning-free principle.
References:
Sun Q, Zhou WX, and Fan J (2020). Adaptive Huber regression, Journal of the American Statistical Association, 115, 254–265.
Wang L, Zheng C, Zhou W, and Zhou WX. A new principle for tuning-free Huber regression, Statistica Sinica, in press.
Ke Y, Minsker S, Ren Z, Sun Q, and Zhou WX (2019). User-friendly covariance estimation for heavy-tailed distributions, Statistical Science, 34, 454-471.
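As a toy illustration of the Huber mean estimator, here is an iteratively reweighted version with a fixed robustification parameter tau. The packages above calibrate tau by the tuning-free principle; this sketch does not, and is not their API:

```python
import numpy as np

def huber_mean(x, tau, n_iter=50):
    """Huber mean via iterative reweighting: observations within tau of the
    current estimate get full weight, others are downweighted in
    proportion to tau / |residual|."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                          # robust starting point
    for _ in range(n_iter):
        r = x - mu
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        mu = np.sum(w * x) / np.sum(w)         # weighted average update
    return mu
```

On data with a gross outlier, the result stays near the bulk of the sample rather than being dragged toward the outlier like the ordinary mean.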
[R Package] This R package performs robust and large-scale multiple testing for millions of possibly dependent tests. The dependence is specified using a latent factor model. It implements a robust procedure to estimate distribution parameters using the adaptive Huber loss (Sun et al., 2020) and accounts for strong dependence among test statistics via an approximate latent factor model (Fan et al., 2019). This method is tailored to heavy-tailed data whose distributions deviate far from Gaussian. Besides hypothesis testing, the software also outputs the estimated underlying factors and diagnostic plots.
References:
Sun Q, Zhou WX, and Fan J (2020). Adaptive Huber regression, Journal of the American Statistical Association, 115, 254–265.
Fan J, Ke Y, Sun Q, and Zhou WX (2019). FarmTest: Factor-adjusted robust multiple testing with false discovery control, Journal of the American Statistical Association, 114, 1880–1893.
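Once test statistics have been robustly estimated and factor-adjusted, false discovery control proceeds along standard lines. As a generic illustration (not the package's implementation), here is the Benjamini-Hochberg step-up procedure:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean rejection
    mask controlling the false discovery rate at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m   # step-up thresholds i*alpha/m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest index passing the bound
        reject[order[: k + 1]] = True          # reject the k+1 smallest p-values
    return reject
```

The factor adjustment matters because step-up procedures like this can lose FDR control under strong dependence; removing the common factors first restores approximate validity.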
[R Package] This R package uses the orthogonality-constrained optimization algorithm of Wen and Yin (2013) to solve various semiparametric dimension reduction problems, such as those in Ma and Zhu (2012) and Sun et al. (2018). It also serves as a general-purpose R-based optimization solver for problems with orthogonality constraints. Parallel computing is enabled through the OpenMP API.
References:
Sun Q, Zhu R, Wang T, and Zeng D (2019). Counting process based dimension reduction methods for censored outcomes, Biometrika, 106, 181–196.
Zhu R, et al. (2019). orthoDr: Semiparametric dimension reduction via orthogonality constrained optimization, The R Journal, 11, 24–37.
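For intuition about optimizing under an orthogonality constraint $B^\top B = I$, here is a generic projected-gradient sketch that retracts each iterate back onto the constraint set via a QR decomposition. The package itself implements the cited feasible algorithm; this is only a simple illustrative scheme:

```python
import numpy as np

def qr_retract(B):
    """Map a matrix onto the Stiefel manifold (B^T B = I) via the Q factor
    of a thin QR decomposition, with a sign fix for determinism."""
    Q, R = np.linalg.qr(B)
    return Q * np.sign(np.diag(R))             # flip columns with negative R_ii

def projected_gradient_step(B, grad, lr=0.1):
    """One sketch iteration: Euclidean gradient step, then retraction back
    onto the orthogonality constraint."""
    return qr_retract(B - lr * grad)
```

After every step the columns of the iterate remain orthonormal, which is the feasibility property the solver maintains throughout.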