I no longer have time to update this list. Please refer to Google Scholar for our most recent publications.
All codes/packages are available at NeXAIS (AGI × Statistics), unless specified otherwise.
Please feel free to email me at qsunstats@gmail.com if you have any questions or comments. If you find any of the codes helpful, please consider citing our paper :)
I am a strong advocate for openreview: https://openreview.net/.
I publish under the name Qiang Sun.
As an open-source effort, I am collecting all review reports, positive or negative, for my papers here.
- Articles distinguished by "with ..." have alphabetical author lists or co-first/corresponding authors.
- Authors are listed by last name followed by first initial.
Lin XH, et al. (2025). Statistics and AI: A fireside conversation, Harvard Data Science Review, 7(2).
- This is an outreach article based on A fireside chat about Stats and AI on 03/17.
- A commentary article by Professor Xiaoli Meng: What’s a Healthy Distance Between “Yes, We Can” and “No, You Can’t—Or Shouldn’t”.
Sun Q and Zhu HT. The future of the statistics discipline, In Chinese.
Sun Q. Possible future research directions in Statistics, In Chinese.
Sun Q with Fan J, Zhou WX, and Zhu Z (2018). Principal component analysis for big data, Wiley StatsRef: Statistics Reference Online, 1-13.
Zeng Y, Zhang G, Chen HC, and Sun Q. Multidimensional scaling with noisy data, preprint.
Wu M and Sun Q (2025). Ensemble linear interpolators: The role of ensembling, SIAM Journal on Mathematics of Data Science, 7, 438-467. | arXiv
- Bagging achieves more stable performance in weak signal-to-noise ratio (SNR) regimes while remaining consistent in large-sample regimes. I refer to this ability as "algorithmic adaptivity" to both strong and weak SNR regimes. This notion of adaptivity holds promise for explaining why some algorithms consistently outperform other seemingly optimal algorithms.
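The bagging behavior described above can be sketched with minimum-norm (ridgeless) interpolators averaged over bootstrap subsamples. This is an illustrative toy in a weak-SNR overparametrized setting, not the paper's exact setup; all names and constants here are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 60  # overparametrized (p > n): least squares interpolates the data
X = rng.normal(size=(n, p))
beta_star = rng.normal(size=p) * 0.05          # weak-SNR regime
y = X @ beta_star + rng.normal(size=n)

def min_norm_interpolator(X, y):
    # minimum-l2-norm solution of X beta = y (ridgeless least squares)
    return np.linalg.pinv(X) @ y

# a single interpolator, fitted on the full sample
beta_single = min_norm_interpolator(X, y)

# bagging: average interpolators fitted on bootstrap subsamples
K = 25
betas = []
for _ in range(K):
    idx = rng.choice(n, size=n, replace=True)  # bootstrap subsample
    betas.append(min_norm_interpolator(X[idx], y[idx]))
beta_bagged = np.mean(betas, axis=0)
```

Comparing the test risk of `beta_single` and `beta_bagged` across SNR levels is one way to see the stabilizing effect of the ensemble.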
Zhai Z, Zhang J, Wang H, Wu M, Yang K, Qiao X, Sun Q (2026). Rethinking softmax in incremental learning, Neural Networks, 193, 108017.
Liu SF, Luo SK, Ma YH, Zheng XD, and Sun Q. Feature-subsampled and shared-embedding ensemble networks for uplift modeling. | arXiv
- This paper continues our investigation of ensembling methods, now in the context of neural networks. Specifically, we find that sample-subsampling ensembling, aka bootstrap or bagging, performs well for parametric models but fails to yield benefits for neural networks, likely due to the compounded randomness from both sample subsampling and random initialization. In contrast, feature-subsampling ensembling proves highly effective. This is somewhat surprising, as sample and feature subsampling are equivalent in parametric models (at least in linear settings).
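The feature-subsampling idea above can be sketched in a toy linear setting: each ensemble member is fit on a random subset of features, and its coefficients are scattered back into the full coefficient vector before averaging. This is a schematic illustration under my own toy data-generating assumptions, not the paper's neural-network experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 200, 50, 10
X = rng.normal(size=(n, p))
y = X[:, :5].sum(axis=1) + rng.normal(size=n)  # 5 informative features

def least_squares(X, y):
    # ordinary least squares via the pseudoinverse
    return np.linalg.pinv(X) @ y

# feature-subsampling ensemble: each member sees a random half of the
# features; its coefficients are scattered back into the full vector
coefs = np.zeros((K, p))
for k in range(K):
    feats = rng.choice(p, size=p // 2, replace=False)
    coefs[k, feats] = least_squares(X[:, feats], y)
beta_ens = coefs.mean(axis=0)
pred = X @ beta_ens
```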
Sun Q. Self-tuned robust mean estimators. | arXiv
- The objective function proposed in this paper was referred to as the Sun-Huber objective by later works; see Holland (2023).
- Formerly titled "Do we need to estimate the variance in robust mean estimation?".
- A broader insight from this work is the importance of designing estimators that achieve both optimal non-asymptotic performance, in terms of rate of convergence, and optimal asymptotic performance, in terms of statistical efficiency. Achieving strong performance in both settings is what we refer to as a form of adaptivity: adaptivity to both finite-sample and asymptotic regimes. A counterexample is the class of median-of-means (MoM) estimators, which often achieve optimal non-asymptotic guarantees but tend to perform poorly in practice. This limitation arises precisely because MoM lacks this adaptivity: it is not statistically efficient in the large-sample regime.
- Github: automean
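A minimal sketch of a self-tuned Huber-type mean estimator in the spirit of this paper: alternate between a Huber-weighted mean at the current truncation level tau and a data-driven refresh of tau. The update rule below is a heuristic stand-in, not the paper's exact objective; see the automean repository for the real implementation.

```python
import numpy as np

def self_tuned_mean(x, n_iter=50):
    """Huber-type mean with a data-driven robustification parameter tau.

    Heuristic sketch: (i) update mu as a Huber-weighted mean at level tau,
    (ii) refresh tau from the truncated residual scale, and repeat.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu = np.median(x)                                 # robust initializer
    tau = np.std(x) * np.sqrt(n / np.log(n))          # initial truncation level
    for _ in range(n_iter):
        r = x - mu
        # Huber weights: 1 inside [-tau, tau], tau/|r| outside
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        mu = np.sum(w * x) / np.sum(w)
        tau = np.sqrt(np.mean(np.minimum(r**2, tau**2)) * n / np.log(n))
    return mu
```

On light-tailed data the truncation level grows large and the estimator behaves like the sample mean, which is the finite-sample/asymptotic adaptivity discussed above.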
Wang RY, Wang S, Zuo XX, and Sun Q. Lifelong learning with task-specific adaptation: Addressing the stability-plasticity dilemma. | arXiv
- Github: AdaLL
- Wang RY was an undergraduate student at the University of Toronto when the bulk of this work was done.
- What is the fundamental assumption that guarantees the potential success of continual learning? We believe it is that most tasks share a large portion of commonality while their differences are small. We therefore propose a two-block architecture to model this: the backbone models the shared module, while small adapters model the task-specific information.
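The two-block idea can be sketched with a shared linear backbone plus one small low-rank adapter per task; zero-initializing the adapter means every task starts from the shared solution. This is a schematic toy of the architecture, not the AdaLL implementation; dimensions and initialization are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, r = 16, 8, 2   # input dim, output dim, adapter rank (r << d)

# shared backbone: models what all tasks have in common
W = rng.normal(size=(h, d)) * 0.1

# one low-rank adapter (A @ B) per task models the task-specific deviation;
# A is zero-initialized so each task initially reduces to the backbone
adapters = {t: [np.zeros((h, r)), rng.normal(size=(r, d)) * 0.1]
            for t in range(3)}

def forward(x, task):
    A, B = adapters[task]
    return (W + A @ B) @ x  # backbone plus small task-specific correction
```

Training would update only the active task's adapter (and possibly the backbone with regularization), which is one way to address the stability-plasticity trade-off.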
Cao M, Tang H, Zhao H, Guo H, Liu J, Zhang G, Liu R, Sun Q, Reid I, Liang X. PhysGame: Uncovering physical commonsense violations in gameplay videos. | arXiv
- Github: PhysGame.
- A short version was accepted by the second workshop on computer vision for videogames (CV2) 2025.
Yu QY, Baek E, Li X, Sun Q (2025). Corruption-robust variance-aware algorithms for generalized linear bandits under heavy-tailed rewards, UAI 2025. | arXiv
- Yu QY was an undergraduate student in our group.
Yu F, Chen Y, Wei J, Mao J, Li W, Sun Q (2025). UltraTWD: Optimizing ultrametric trees for tree-Wasserstein distance, ICML 2025. | arXiv
Sun Q with Tang FL, et al. (2025). Intervening anchor token: Decoding strategy in alleviating hallucinations for MLLMs, ICLR 2025. | arXiv
Sun Q, Zhang A, Liu C, and Tan KM (2025). Resistant convex clustering: How does fusion penalty enhance robustness?, Electronic Journal of Statistics, 19, 1199–1230. | arXiv
- R package: Rcvxclustr
Chen GH, Wang XY, Sun Q, and Tang ZZ. (2025). Multidimensional scaling improves distance-based clustering for microbiome data, Bioinformatics, 2, btaf042. | arXiv
Chen HC and Sun Q. Decentralized online Riemannian optimization with dynamic environments. | arXiv
Sun Q with Su BX, Yang XC, Zhao BX. The exact risks of reference panel-based regularized estimators, Journal of the American Statistical Association, revision. | arXiv
- 2024 ASA SLDS student paper award.
Yang SG and Sun Q. Online generalized sparse regression: How does overparametrization help?, Journal of Machine Learning Research, revision. | arXiv
Wu JL, Yang MT, Wang D et al., Sun Q, Li ZX. B vitamin dynamics during pregnancy and the risk of postpartum anemia. | bioRxiv
- Github: KAN4PA.
Sun Q with Fang XH, Li J, Wang BY (2024). Rethinking the uniformity metric in self-supervised learning, ICLR 2024.
- Github: WassersteinSSL.
Chen HC, Chen X, Elmasri M, and Sun Q (2024). Gradient descent in matrix factorization: Understanding large initialization, UAI 2024. | arXiv
Zhai Z, Chen H, and Sun Q (2024). Quadratic matrix factorization with applications to manifold learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 6384-6401. | arXiv
- Github: QMF
Yang R, Yang YL, Zhou F, and Sun Q (2023). Directional diffusion models for graph representation learning, NeurIPS 2023. | arXiv
- Github: DDM
Chen X, Zeng YC, Yang SY, and Sun Q (2023). Sketched ridgeless linear regression: The role of downsampling, ICML 2023. | arXiv
- Github: SRLR
Li X and Sun Q (2023). Variance-aware robust decision making with linear function approximation under heavy-tailed rewards, TMLR; invited presentation at ICLR 2025. | arXiv
- This paper is the first to propose robust bandit and RL algorithms with tight variance-aware (instance-dependent) regret bounds.
Sun Q, Mao R, and Zhou WX. Adaptive capped least squares, preprint. | arXiv
- Github: ACLS-Python
- Github: ACLS-R
Ju Y, Zhang Z, Liu M, Lin S, Sun Q et al (2024). Integrated large-scale metagenome assembly and multi-kingdom network analyses recapitulate sex differences in the human nasal microbiome, Genome Biology, 25, 257.
Tan KM, Sun Q, and Witten D (2023). Sparse reduced rank Huber regression in high dimensions, Journal of the American Statistical Association, 118, 2383-239.
- R Code
Neuman M, Xie Y, and Sun Q (2023). Restricted Riemannian geometry for positive semidefinite matrices, Linear Algebra and its Applications, 665, 153-195. | arXiv
- This paper is motivated by the paper below. Our earlier idea on the full rank case appeared in Lin 2019; see the acknowledgement therein.
Lin Z, Kong D, and Sun Q (2017). Modeling symmetric positive definite matrices with an application to functional brain connectivity. | arXiv
Jiang B, Sun Q, and Fan J. Bernstein's inequality for general Markov chains, Annales de l'Institut Henri Poincaré, in press. | arXiv
- This is the first sharp Bernstein's inequality for general Markov chains.
- This paper and our paper on Hoeffding's inequality for general Markov chains represent our effort to study measure concentration for dependent (Markovian) data.
- Here is a long story about this paper: originally submitted to Electronic Journal of Probability (EJP) on August 7, 2020, it was subjected to three separate rounds of peer review over the ensuing three years. The decisions at each stage were as follows: an initial reject and resubmit, a subsequent major revision, and, most recently, a final reject.
Sun Q and Zhang H (2021). Targeted inference involving high-dimensional data using nuisance penalized regression, Journal of the American Statistical Association, 116, 1472-1486.
Sun Q with Fan J, and Jiang B (2021). Hoeffding's inequality for general Markov chains and its applications to statistical learning, Journal of Machine Learning Research, 22, 1-35. | arXiv
- This is the first sharp Hoeffding's inequality for general Markov chains.
Sun Q, Zhou WX, and Fan J (2020). Adaptive Huber regression, Journal of the American Statistical Association, 115, 254–265. | arXiv
- A short commentary article by Eran Raviv about our paper: https://eranraviv.com/adaptive-huber-regression/.
- Github: I-LAMM
- You can also find implementations in the following two packages:
- R package: tfHuber, Python package: tfHuber.
Zhu F, Guo R, Wang W, Ju Y, Wang Q, Ma Q, Sun Q et al. (2020). Transplantation of microbiota from drug-free patients with schizophrenia causes schizophrenia-like abnormal behaviors and dysregulated kynurenine metabolism in mice, Molecular Psychiatry, 25, 2905–2918.
Zhu F, Ju Y, Wang W, Wang Q, Guo R, Ma Q, Sun Q et al. (2020). Metagenome-wide association of gut microbiome features for schizophrenia, Nature Communications, 11, 1612.
Jiang B and Sun Q. Bayesian high-dimensional linear regression with generic spike-and-slab priors, preprint. | arXiv
Sun Q with Ke Y, Minsker S, Ren Z, and Zhou WX (2019). User-friendly covariance estimation for heavy-tailed distributions, Statistical Science, 34, 454-471.
- R package: tfHuber
Sun Q*, Zhu R*, Wang T, and Zeng D (2019). Counting process based dimension reduction methods for censored outcome, Biometrika, 106, 181-196.
- * marks equal contribution.
- R package: orthoDr
Sun Q with Fan J, Ke Y, and Zhou WX (2019). FarmTest: Factor-adjusted robust multiple testing with false discovery control, Journal of the American Statistical Association, 114, 1880–1893.
- R package: FarmTest
Sun Q with Fan J, Liu H, and Zhang T (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error, The Annals of Statistics, 46, 814-841.
- R package: I-LAMM
Sun Q, Zhu HT, Liu YF, Ibrahim JG (2015). SPReM: Sparse projection regression model for high-dimensional linear regression, Journal of the American Statistical Association, 110, 289-302.