Publications

"My soul’s imprint lies in the art I create." - S., 2023

All code and packages are available at StatsLE, unless specified otherwise.
Please feel free to email me at qsunstats@gmail.com if you have any questions or comments. If you find any of the code helpful, please consider citing our papers :)

I am an advocate for OpenReview: https://openreview.net/.

- Articles distinguished by "with ..." have alphabetical author lists or co-first authors, as is the convention in mathematical statistics, statistical learning theory, and theoretical computer science.

- Authors are listed last name first, followed by first initials.

Overviews

Sun Q with Fan J, Zhou WX and Zhu Z (2018). Principal component analysis for big data, Wiley StatsRef: Statistics Reference Online, 1-13.

Selected Recent Papers

For the complete list, please see Google Scholar.

Yang SG and Sun Q (2024). Online generalized sparse regression: How does overparametrization help?

Wu JL, Yang MT, Wang D et al., Sun Q, Li ZX (2024). B vitamin dynamics during pregnancy and the risk of postpartum anemia, Nature Communications, under review.

Sun Q with Fang XH, Li J, Wang BY (2024). Rethinking the uniformity metric in self-supervised learning, ICLR 2024.
- Python Code: WassersteinSSL.

Sun Q with Su BX, Yang XC, Zhao BX. The exact risks of reference panel-based regularized estimators, [arXiv].

- 2024 ASA SLDS student paper award.

Wu M and Sun Q. Ensemble linear interpolators: The role of ensembling, [arXiv].

- Bagging achieves more stable performance in weak signal-to-noise ratio regimes while being consistent in large-sample regimes. I refer to this ability as "algorithmic adaptivity". This notion of adaptivity holds the promise of explaining why some algorithms always outperform other seemingly optimal algorithms.

Chen HC, Chen X, Elmasri M, and Sun Q (2024). Gradient descent in matrix factorization: Understanding large initialization, UAI 2024. arXiv | TLDR.

Sun Q (2023). Do we need to estimate the variance in robust mean estimation? (Self-tuned mean estimators), [arXiv].

- Python code "automean"

- The objective function proposed in this paper was referred to as the Sun-Huber objective by later works; see Holland (2023).

- Reviews: Open reviews, and a private one by Lee and Valiant.

Yang R, Yang YL, Zhou F, and Sun Q (2023). Directional diffusion models for graph representation learning, NeurIPS 2023, [arXiv]

- Python code "DDM"

Chen X, Zeng YC, Yang SY, and Sun Q (2023). Sketched ridgeless linear regression: The role of downsampling, ICML 2023.

- Python package "SRLR"

Li X and Sun Q. Variance-aware robust reinforcement learning with linear function approximation under heavy-tailed rewards, TMLR, [arXiv].

- This paper is the first to propose robust bandit and RL algorithms with tight variance-aware (instance-dependent) regret bounds.

Zhai Z, Chen H, and Sun Q. Quadratic matrix factorization with applications to manifold learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, [arXiv].

- Matlab code "QMF"

Sun Q, Mao R, and Zhou WX. Adaptive capped least squares, [arXiv].

- Python package "ACLS"

- R package "ACLS"

Neuman M, Xie Y, and Sun Q (2023). Restricted Riemannian geometry for positive semidefinite matrices, Linear Algebra and its Applications, 665, 153-195. [arXiv].

- This paper is motivated by the paper below. Our earlier idea for the full-rank case appeared in Lin (2019); see the acknowledgement therein.

Lin Z, Kong D, and Sun Q (2017). Modeling symmetric positive definite matrices with an application to functional brain connectivity, [arXiv].

Little A, Xie Y, and Sun Q (2023). An analysis of classical multidimensional scaling with applications to clustering, Information and Inference, 12, 72-112.

- Patel et al. (2023) pointed out that "Little et al. (2022), the first strong theoretical guarantee for CMDS in the literature, studies the performance of CMDS on the task of clustering under sub-Gaussian mixture models."

Jiang B, Sun Q, and Fan J. Bernstein's inequality for general Markov chains, [arXiv].

- This is the first sharp Bernstein's inequality for general Markov chains.

- Here is a long story about this paper: Originally submitted to the Electronic Journal of Probability (EJP) on August 7, 2020, it went through three separate rounds of peer review over the ensuing three years. The decisions at each stage were as follows: an initial reject and resubmit, a subsequent major revision, and, most recently, a final rejection.

Yu M, Sun Q, and Zhou WX. Low-rank matrix recovery under heavy-tailed errors, Bernoulli, in press.

Ju Y, Zhang Z, Liu M, Lin S, Sun Q et al. Integrated large-scale metagenome assembly and multi-kingdom network analyses recapitulate sex differences in the nasal microbiome, [bioRxiv].

Zhai Z, Chen H, and Sun Q. Bounded projection matrix approximation with applications to community detection, [arXiv], IEEE Signal Processing Letters, in press.

Tan KM, Sun Q, and Witten D (2023). Sparse reduced rank Huber regression in high dimensions, Journal of the American Statistical Association, 118, 2383–239.

- R Code

Sun Q with Fan J and Jiang B (2021). Hoeffding's inequality for general Markov chains and its applications to statistical learning, Journal of Machine Learning Research, 22, 1-35, [arXiv].

- This is the first sharp Hoeffding's inequality for general Markov chains.

Sun Q, Zhou WX, and Fan J (2020). Adaptive Huber regression, Journal of the American Statistical Association, 115, 254–265. [arXiv].

- A short commentary article by Eran Raviv about our paper: https://eranraviv.com/adaptive-huber-regression/.

- R package "I-LAMM"

- Implementations are also available in two other packages: R package "tfHuber" and Python package "tfHuber".

Zhu F, Guo R, Wang W, Ju Y, Wang Q, Ma Q, Sun Q et al. (2020). Transplantation of microbiota from drug-free patients with schizophrenia causes schizophrenia-like abnormal behaviors and dysregulated kynurenine metabolism in mice, Molecular Psychiatry, 25, 2905–2918.

Zhu F, Ju Y, Wang W, Wang Q, Guo R, Ma Q, Sun Q et al. (2020). Metagenome-wide association of gut microbiome features for schizophrenia, Nature Communications, 11, 1612.

Jiang B and Sun Q. Bayesian high-dimensional linear regression with generic spike-and-slab priors, [arXiv].

Sun Q with Liu C, Tan KM. Robust convex clustering: How does fusion penalty enhance robustness?, [arXiv].

- R package "Rcvxclustr"

Sun Q with Ke Y, Minsker S, Ren Z, and Zhou WX (2019). User-friendly covariance estimation for heavy-tailed distributions, Statistical Science, 34, 454-471.

- R package "tfHuber"

- Python package "tfHuber"

Sun Q, Zhu R, Wang T, and Zeng D (2019). Counting process based dimension reduction methods for censored outcome, Biometrika, 106, 181-196.

- R package "orthoDr"

Sun Q with Fan J, Ke Y, and Zhou WX (2019). FarmTest: Factor-adjusted robust multiple testing with false discovery control, Journal of the American Statistical Association, 114, 1880–1893.

- R package "FarmTest"

Sun Q with Fan J, Liu H, and Zhang T (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error, The Annals of Statistics, 46, 814-841.

- R package "I-LAMM"

Sun Q, Zhu HT, Liu Y, and Ibrahim JG (2015). SPReM: Sparse projection regression model for high-dimensional linear regression, Journal of the American Statistical Association, 110, 289-302.