I no longer have time to update this list. Please refer to Google Scholar for our most recent publications.
All code/packages are available at NeXAIS (AGI × Statistics), unless specified otherwise.
Please feel free to email me at qsunstats@gmail.com if you have any questions or comments. If you find any of the code helpful, please kindly consider citing our paper :)
I am a strong advocate for openreview: https://openreview.net/.
I publish under the name of Qiang Sun.
As an open-source effort, I am collecting all review reports, positive or negative, for my papers here.
- Articles distinguished by "with ..." have alphabetical author lists or co-first/corresponding authors.
- Authors are listed by last name followed by first-name initial(s).
Lin XH, et al. (2025). Statistics and AI: A fireside conversation, Harvard Data Science Review, 7(2).
- This is an outreach article based on A fireside chat about Stats and AI on 03/17.
- A commentary article by Professor Xiao-Li Meng: What’s a Healthy Distance Between “Yes, We Can” and “No, You Can’t—Or Shouldn’t”.
Sun Q and Zhu HT. The future of the statistics discipline, in Chinese.
Sun Q. Possible future research directions in statistics, in Chinese.
Sun Q with Fan J, Zhou WX, and Zhu Z (2018). Principal component analysis for big data, Wiley StatsRef: Statistics Reference Online, 1-13.
Zeng Y, Zhang G, Chen HC, and Sun Q. Multidimensional scaling with noisy data, preprint.
Wu M and Sun Q (2025). Ensemble linear interpolators: The role of ensembling, SIAM Journal on Mathematics of Data Science, 7, 438-467. | arXiv
- Bagging achieves more stable performance in weak signal-to-noise ratio (SNR) regimes while remaining consistent in large-sample regimes. I refer to this ability as "algorithmic adaptivity" to both strong and weak SNR regimes. This notion of adaptivity holds promise for explaining why some algorithms consistently outperform other, seemingly optimal algorithms; a toy sketch follows below.
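- A toy sketch of this phenomenon in the overparametrized regime, for illustration only: it assumes a Gaussian design and bootstrap resampling, a simplification of the ensembling schemes analyzed in the paper.

```python
# Toy sketch: a single minimum-norm (ridgeless) interpolator vs. a bagged
# ensemble of interpolators in a weak-SNR, overparametrized regime.
# The design below is illustrative, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 200, 20                  # d > n (overparametrized); k models
beta = rng.normal(size=d) / np.sqrt(d)  # weak signal
X = rng.normal(size=(n, d))
y = X @ beta + rng.normal(size=n)       # noise comparable to the signal

def min_norm_fit(X, y):
    return np.linalg.pinv(X) @ y        # minimum-l2-norm interpolator

beta_single = min_norm_fit(X, y)

fits = []
for _ in range(k):
    idx = rng.integers(0, n, size=n)    # bootstrap resample of the rows
    fits.append(min_norm_fit(X[idx], y[idx]))
beta_bagged = np.mean(fits, axis=0)     # average the ensemble

for name, b in [("single", beta_single), ("bagged", beta_bagged)]:
    print(name, np.sum((b - beta) ** 2))  # squared estimation error
```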
Jiang B, Sun Q, and Fan J (2026). Bernstein's inequality for general Markov chains, Annales de l'Institut Henri Poincaré, in press. | arXiv
- This is the first sharp Bernstein's inequality for general Markov chains.
- This paper, together with our paper on Hoeffding's inequality for general Markov chains, represents our effort to study concentration of measure for dependent (Markovian) data.
- Here is a long story about this paper: originally submitted to the Electronic Journal of Probability (EJP) on August 7, 2020, it went through three separate rounds of peer review over the ensuing three years. The decisions at each stage were: an initial reject-and-resubmit, a subsequent major revision, and, most recently, a final reject.
Sun Q with Fan J and Jiang B (2021). Hoeffding's inequality for general Markov chains and its applications to statistical learning, Journal of Machine Learning Research, 22, 1-35. | arXiv
- This is the first sharp Hoeffding's inequality for general Markov chains; the classical i.i.d. benchmark is recalled below for comparison.
- Erratum: Lemma 11 in the paper only holds for reversible chains, and thus Theorem 3 also only holds for reversible chains. The rest of the proof remains unchanged.
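- For comparison, the classical Hoeffding inequality for independent data is recalled below; our result establishes a tail bound of the same flavor for general Markov chains, with the sample size effectively discounted by the spectral gap of the chain (see the paper for the precise statement and constants).

```latex
% Classical Hoeffding inequality for independent X_1, ..., X_n with
% f_i(X_i) \in [a_i, b_i]; stated here only as the i.i.d. benchmark.
\[
  \mathbb{P}\left( \left| \sum_{i=1}^{n} \bigl( f_i(X_i) - \mathbb{E} f_i(X_i) \bigr) \right| \ge t \right)
  \le 2 \exp\left( - \frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right).
\]
```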
Zhai Z, Zhang J, Wang H, Wu M, Yang K, Qiao X, Sun Q (2026). Rethinking softmax in incremental learning, Neural Networks, 193, 108017.
Liu SF, Luo SK, Ma YH, Zheng XD, and Sun Q. Feature-subsampled and shared-embedding ensemble networks for uplift modeling. | arXiv
- This paper continues our investigation of ensembling methods, now in the context of neural networks. Specifically, we find that sample-subsampling ensembling, also known as the bootstrap or bagging, performs well for parametric models but fails to yield benefits for neural networks, likely due to the compounded randomness from both sample subsampling and random initialization. In contrast, feature-subsampling ensembling proves highly effective. This is somewhat surprising, as sample and feature subsampling are equivalent in parametric models (at least in linear settings). A toy contrast of the two strategies is sketched below.
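- A toy contrast of the two strategies, assuming a simple regression setup and small MLPs; the shared-embedding uplift networks in the paper are more involved than this sketch.

```python
# Toy contrast: sample-subsampling (bagging) vs. feature-subsampling
# ensembles of small neural networks. The data-generating process and
# network sizes are assumptions for illustration, not the paper's setup.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n, d, k = 500, 20, 10                        # samples, features, ensemble size
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
X_test = rng.normal(size=(200, d))

def fit_net(X, y):
    return MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)

def bagging_predict(X_test):
    # Resampling rows compounds with the networks' random initialization.
    preds = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)
        preds.append(fit_net(X[idx], y[idx]).predict(X_test))
    return np.mean(preds, axis=0)

def feature_subsample_predict(X_test):
    # Each member sees all rows but only a random subset of the columns.
    preds = []
    for _ in range(k):
        cols = rng.choice(d, size=d // 2, replace=False)
        preds.append(fit_net(X[:, cols], y).predict(X_test[:, cols]))
    return np.mean(preds, axis=0)
```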
Sun Q. Self-tuned robust mean estimators. | arXiv
- The objective function proposed in this paper was referred to as the Sun-Huber objective in later work; see Holland (2023).
- Formerly titled "Do we need to estimate the variance in robust mean estimation?".
- A broader insight from this work is the importance of designing estimators that achieve both optimal non-asymptotic performance, in terms of the rate of convergence, and optimal asymptotic performance, in terms of statistical efficiency. Achieving strong performance in both settings is what we refer to as a form of adaptivity: adaptivity to both finite-sample and asymptotic regimes. A counterexample is the class of median-of-means (MoM) estimators, which often achieve optimal non-asymptotic guarantees but tend to perform poorly in practice. This limitation arises precisely because MoM lacks this adaptivity: it is not statistically efficient in the large-sample regime. A minimal sketch of the self-tuning idea follows the repo link below.
- Github: automean
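- A minimal sketch of the idea, alternating an IRLS update of the mean with a data-driven update of the truncation level: the tau rule below (truncated second moment scaled by log n) is one common tuning-free recipe and an assumption of this sketch, not necessarily the exact objective in the paper; see automean for our implementation.

```python
# Sketch: self-tuned Huber-type mean estimation by alternating updates.
# The tau update here is an assumed tuning-free recipe for illustration.
import numpy as np

def self_tuned_huber_mean(x, n_iter=100, tol=1e-8):
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = np.log(n)                       # target deviation level
    mu = np.median(x)                   # robust initialization
    tau = np.std(x) + 1e-12
    for _ in range(n_iter):
        r = x - mu
        # Re-tune tau from the truncated second moment (assumed rule).
        tau = np.sqrt(np.sum(np.minimum(r**2, tau**2)) / z)
        # IRLS step: Huber weights w_i = min(1, tau / |r_i|).
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        done = abs(mu_new - mu) < tol
        mu = mu_new
        if done:
            break
    return mu

# Heavy-tailed sample (true mean 0): compare with the empirical mean.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=2, size=1000)
print(self_tuned_huber_mean(sample), np.mean(sample))
```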
Wang RY, Wang S, Zuo XX, and Sun Q. Lifelong learning with task-specific adaptation: Addressing the stability-plasticity dilemma. | arXiv
- Github: AdaLL
- Wang RY was an undergraduate student at the University of Toronto when the bulk of this work was done.
- What is the fundamental assumption that guarantees the potential success of continual learning? We believe it is that most tasks share a large portion of commonality while their differences are small. We therefore propose a two-block architecture to model this: the backbone models the shared structure, while small adapters model the task-specific information. A minimal sketch follows below.
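- A minimal sketch of this two-block design, assuming a residual bottleneck adapter (an illustrative choice; see AdaLL for the actual architecture):

```python
# Sketch: shared backbone + small task-specific adapters and heads.
# The residual bottleneck adapter below is an assumed design for
# illustration; see the AdaLL repo for the architecture used in the paper.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # few parameters per task
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))  # residual correction

class LifelongNet(nn.Module):
    def __init__(self, in_dim, feat_dim, n_classes, n_tasks):
        super().__init__()
        # Shared block: models the commonality across tasks.
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # Task-specific blocks: model the (small) differences between tasks.
        self.adapters = nn.ModuleList(Adapter(feat_dim) for _ in range(n_tasks))
        self.heads = nn.ModuleList(nn.Linear(feat_dim, n_classes) for _ in range(n_tasks))

    def forward(self, x, task_id):
        h = self.backbone(x)
        return self.heads[task_id](self.adapters[task_id](h))

net = LifelongNet(in_dim=784, feat_dim=128, n_classes=10, n_tasks=5)
logits = net(torch.randn(8, 784), task_id=0)  # per-task forward pass
```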
Cao M, Tang H, Zhao H, Guo H, Liu J, Zhang G, Liu R, Sun Q, Reid I, Liang X. PhysGame: Uncovering physical commonsense violations in gameplay videos, CV2 workshop @ CVPR 2025. | arXiv
- Github: PhysGame.
Yu QY, Baek E, Li X, Sun Q (2025). Corruption-robust variance-aware algorithms for generalized linear bandits under heavy-tailed rewards, UAI 2025. | arXiv
- Yu QY was an undergraduate student in our group.
Yu F, Chen Y, Wei J, Mao J, Li W, Sun Q (2025). UltraTWD: Optimizing ultrametric trees for tree-Wasserstein distance, ICML 2025. | arXiv
Sun Q with Tang FL, et al. (2025). Intervening anchor token: Decoding strategy in alleviating hallucinations for MLLMs, ICLR 2025. | arXiv
Sun Q, Zhang A, Liu C, and Tan KM (2025). Resistant convex clustering: How does fusion penalty enhance robustness?, Electronic Journal of Statistics, 19, 1199–1230. | arXiv
- R package: Rcvxclustr
Chen HC and Sun Q. Decentralized online Riemannian optimization with dynamic environments. | arXiv
Sun Q with Su BX, Yang XC, Zhao BX. The exact risks of reference panel-based regularized estimators, Journal of the American Statistical Association, revision. | arXiv
- 2024 ASA SLDS student paper award.
Yang SG and Sun Q. Online generalized sparse regression: How does overparametrization help?, Journal of Machine Learning Research, revision. | arXiv
Sun Q with Fang XH, Li J, Wang BY (2024). Rethinking the uniformity metric in self-supervised learning, ICLR 2024.
- Github: WassersteinSSL.
Chen HC, Chen X, Elmasri M, and Sun Q (2024). Gradient descent in matrix factorization: Understanding large initialization, UAI 2024. | arXiv
Zhai Z, Chen H, and Sun Q (2024). Quadratic matrix factorization with applications to manifold learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 6384-6401. | arXiv
- Github: QMF
Yang R, Yang YL, Zhou F, and Sun Q (2023). Directional diffusion models for graph representation learning, NeurIPS 2023. | arXiv
- Github: DDM
Chen X, Zeng YC, Yang SY, and Sun Q (2023). Sketched ridgeless linear regression: The role of downsampling, ICML 2023. | arXiv
- Github: SRLR
Li X and Sun Q (2023). Variance-aware robust decision making with linear function approximation under heavy-tailed rewards, TMLR; invited for presentation at ICLR 2025. | arXiv
- This paper is the first to propose robust bandit and RL algorithms with tight variance-aware (instance-dependent) regret bounds.
Sun Q, Mao R, and Zhou WX. Adaptive capped least squares. | arXiv
- Github: ACLS-Python
- Github: ACLS-R
Neuman M, Xie Y, and Sun Q (2023). Restricted Riemannian geometry for positive semidefinite matrices, Linear Algebra and its Applications, 665, 153-195. | arXiv
- This paper is motivated by the paper below. Our earlier idea for the full-rank case appeared in Lin (2019); see the acknowledgement therein.
Lin Z, Kong D, and Sun Q (2017). Modeling symmetric positive definite matrices with an application to functional brain connectivity. | arXiv