I no longer have time to update this list in a timely manner. Please refer to Google Scholar for our most recent publications.
All code/packages are available at NeXAIS (AGI × Statistics), unless specified otherwise.
Please feel free to email me at qsunstats@gmail.com if you have any questions or comments. If you find any of the code helpful, please kindly consider citing our paper :)
I am a strong advocate for openreview: https://openreview.net/.
I publish under the name of Qiang Sun.
As an open-science effort, I am collecting all review reports, positive or negative, for my papers here.
- Articles distinguished by "with ..." have alphabetical author lists or co-first/corresponding authors.
- Authors are listed with last name first, followed by first-name initials (e.g., Sun Q).
Lin XH, et al. (2025). Statistics and AI: A fireside conversation, Harvard Data Science Review, 7(2).
- This is an outreach article based on A fireside chat about Stats and AI on 03/17.
- A commentary article by Professor Xiao-Li Meng: What’s a Healthy Distance Between “Yes, We Can” and “No, You Can’t—Or Shouldn’t”.
Sun Q and Zhu HT. The future of the statistics discipline, In Chinese.
Sun Q. Possible future research directions in Statistics, In Chinese.
Sun Q with Fan J, Zhou WX, and Zhu Z (2018). Principal component analysis for big data, Wiley StatsRef: Statistics Reference Online, 1-13.
Zeng Y, Zhang G, Chen HC, and Sun Q. Multidimensional scaling with noisy data, preprint.
Wu M, Yang AY, and Sun Q. Why Self-Training Helps and Hurts: Denoising vs. Signal Forgetting. | arXiv
Ouyang YD, Hu P, Wan ZY, Wang Z, Xie LY, Bespalov D, Wu NY, Cheng G, Zha HY, Sun Q. Training-Free Self-Correction for Multimodal Masked Diffusion Models. | arXiv
Ye KJ, Shi ZH, Wan WL, Zhou YHZ, Yu YH, Zuo XX, Sun Q, Lu JW. CamDirector: Towards Long-Term Coherent Video Trajectory Editing, CVPR 2026. | arXiv
Wang SH, Li YS, Hu BH, Yao ZT, Li ZD, Li LS, Zhan HB, Liu MM, Dong JH, Qian RZ, Wu GX, Zhang H, Shen JF, Koniusz P, Sun Q (2026). DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection, ICLR 2026. | arXiv
Cao M, Tang HR, Zhao HZ, Han MF, Liu RY, Sun Q, Chang XJ, Reid I, Liang XD (2026). Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos, TMLR. | arXiv
Wu M and Sun Q (2025). Ensemble linear interpolators: The role of ensembling, SIAM Journal on Mathematics of Data Science, 7, 438-467. | arXiv
- Bagging achieves more stable performance in weak signal-to-noise ratio (SNR) regimes while remaining consistent in large-sample regimes. I refer to this ability as "algorithmic adaptivity" to both strong and weak SNR regimes. This notion of adaptivity holds promise for explaining why some algorithms consistently outperform other seemingly optimal ones.
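- As a toy illustration of the kind of ensembling studied here (a sketch under my own assumptions, not the paper's code; the function name and defaults are made up), one can bag minimum-norm least-squares "ridgeless" interpolators over random subsamples and average their predictions:

```python
import numpy as np

def bagged_interpolator_predict(X, y, X_new, n_bags=20, subsample=0.8, seed=0):
    """Average minimum-norm least-squares (ridgeless) fits over random
    subsamples of the data -- a toy version of ensembled interpolators."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(subsample * n)
    preds = []
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)   # random subsample
        # pinv gives the minimum-norm interpolating solution on the subsample
        coef = np.linalg.pinv(X[idx]) @ y[idx]
        preds.append(X_new @ coef)
    # The ensemble prediction is the average over bags
    return np.mean(preds, axis=0)
```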
Jiang B, Sun Q, and Fan J (2026). Bernstein's inequality for general Markov chains, Annales de l'Institut Henri Poincaré, in press. | arXiv
- This is the first sharp Bernstein's inequality for general Markov chains.
- This paper and our paper on Hoeffding's inequality for general Markov chains represent our effort to study measure concentration for dependent (Markovian) data.
Wu M, Yang AY, and Sun Q (2025). PCA++: How uniformity induces robustness to background noise in contrastive learning, NeurIPS. | arXiv | NeurIPS Spotlight Presentation
Liu SF, Luo SK, Ma YH, Zheng XD, and Sun Q. Feature-subsampled and shared-embedding ensemble networks for uplift modeling. | arXiv
- This paper continues our investigation of ensembling methods, now in the context of neural networks. Specifically, we find that sample-subsampling ensembling, aka bootstrap or bagging, performs well for parametric models but fails to yield benefits for neural networks, likely due to the compounded randomness from both sample subsampling and random initialization. In contrast, feature-subsampling ensembling proves highly effective. This is somewhat surprising, as sample and feature subsampling are equivalent in parametric models (at least in linear settings).
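- A minimal sketch of feature-subsampling ensembling (illustrative only; the members here are least-squares fits rather than the neural networks studied in the paper, and all names are made up):

```python
import numpy as np

def fit_feature_subsampled_ensemble(X, y, n_members=10, feature_frac=0.5, seed=0):
    """Fit an ensemble whose members each see a random subset of features."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = max(1, int(feature_frac * p))
    members = []
    for _ in range(n_members):
        idx = rng.choice(p, size=k, replace=False)           # feature subsample
        coef, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)  # member fit
        members.append((idx, coef))
    return members

def predict_ensemble(members, X):
    # Average the member predictions (the usual ensembling rule)
    return np.mean([X[:, idx] @ coef for idx, coef in members], axis=0)
```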
Sun Q. Self-tuned robust regression estimators. | arXiv
- The objective function proposed in this paper was referred to as the Sun-Huber objective by later works; see Holland (2023).
- A broader insight from this work is the importance of designing estimators that achieve both optimal non-asymptotic performance, in terms of rates of convergence, and optimal asymptotic performance, in terms of statistical efficiency. Achieving strong performance in both settings is what we refer to as a form of adaptivity: adaptivity to both finite-sample and asymptotic regimes. A counterexample is the class of median-of-means (MoM) estimators, which often achieve optimal non-asymptotic guarantees but tend to perform poorly in practice. This limitation arises precisely because MoM lacks this adaptivity: it is not statistically efficient in the large-sample regime.
- Github: automean
Wang RY, Wang S, Zuo XX, and Sun Q. Lifelong learning with task-specific adaptation: Addressing the stability-plasticity dilemma. | arXiv
- Github: AdaLL
- Wang RY was an undergraduate student at the University of Toronto when the bulk of this work was done.
- What is the fundamental assumption that guarantees the potential success of continual learning? We believe it is that most tasks share a large portion of commonality while their differences are small. We therefore propose a two-block architecture to model this: the backbone models the shared module, while small adapters model the task-specific information.
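- A toy sketch of the two-block idea (shared backbone plus small per-task adapters); this is my own illustrative code under assumed dimensions, not the AdaLL implementation:

```python
import numpy as np

class TwoBlockModel:
    """Shared linear backbone + small task-specific linear adapters."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        self._rng = np.random.default_rng(seed)
        # Backbone: shared across all tasks, carries most of the capacity
        self.backbone = 0.1 * self._rng.standard_normal((in_dim, hidden_dim))
        # One small adapter per task, created on demand
        self.adapters = {}
        self.out_dim = out_dim

    def add_task(self, task_id):
        # Adapter capacity is small relative to the backbone
        self.adapters[task_id] = 0.1 * self._rng.standard_normal(
            (self.backbone.shape[1], self.out_dim))

    def forward(self, x, task_id):
        h = np.tanh(x @ self.backbone)     # shared representation
        return h @ self.adapters[task_id]  # task-specific head
```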
Cao M, Tang H, Zhao H, Guo H, Liu J, Zhang G, Liu R, Sun Q, Reid I, Liang X. PhysGame: Uncovering physical commonsense violations in gameplay videos, CV2 workshop @ CVPR 2025. | arXiv
- Github: PhysGame.
Yu QY, Baek E, Li X, Sun Q (2025). Corruption-robust variance-aware algorithms for generalized linear bandits under heavy-tailed rewards, UAI 2025. | arXiv
- Yu QY was an undergraduate student in our group.
Yu F, Chen Y, Wei J, Mao J, Li W, Sun Q (2025). UltraTWD: Optimizing ultrametric trees for tree-Wasserstein distance, ICML 2025. | arXiv
Sun Q with Tang FL, et al. (2025). Intervening anchor token: Decoding strategy in alleviating hallucinations for MLLMs, ICLR 2025. | arXiv
Sun Q, Zhang A, Liu C, and Tan KM (2025). Resistant convex clustering: How does fusion penalty enhance robustness?, Electronic Journal of Statistics, 19, 1199–1230. | arXiv
- R package: Rcvxclustr
Chen HC and Sun Q. Decentralized online Riemannian optimization with dynamic environments. | arXiv
Sun Q with Su BX, Yang XC, and Zhao BX. The exact risks of reference panel-based regularized estimators. | arXiv
- 2024 ASA SLDS student paper award.
Yang SG and Sun Q. Online generalized sparse regression: How does overparametrization help?, Journal of Machine Learning Research. | arXiv
Sun Q with Fang XH, Li J, Wang BY (2024). Rethinking the uniformity metric in self-supervised learning, ICLR 2024.
- Github: WassersteinSSL.
Chen HC, Chen X, Elmasri M, and Sun Q (2024). Gradient descent in matrix factorization: Understanding large initialization, UAI 2024. | arXiv
Zhai Z, Chen H, and Sun Q (2024). Quadratic matrix factorization with applications to manifold learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 6384-6401. | arXiv
- Github: QMF
Yang R, Yang YL, Zhou F, and Sun Q (2023). Directional diffusion models for graph representation learning, NeurIPS 2023. | arXiv
- Github: DDM
Chen X, Zeng YC, Yang SY, and Sun Q (2023). Sketched ridgeless linear regression: The role of downsampling, ICML 2023. | arXiv
- Github: SRLR
Li X and Sun Q (2024). Variance-aware robust decision making with linear function approximation under heavy-tailed rewards, TMLR. | arXiv | J2C Certification (Presented at ICLR 2025)
- This paper is the first to propose robust bandit and RL algorithms with tight variance-aware (instance-dependent) regret bounds.
Neuman M, Xie Y, and Sun Q (2023). Restricted Riemannian geometry for positive semidefinite matrices, Linear Algebra and its Applications, 665, 153-195. | arXiv
- This paper is motivated by the paper below. Our earlier idea on the full rank case appeared in Lin 2019; see the acknowledgement therein.
- Lin Z, Kong D, and Sun Q (2017). Modeling symmetric positive definite matrices with an application to functional brain connectivity. | arXiv
Sun Q with Fan J, and Jiang B (2021). Hoeffding's inequality for general Markov chains and its applications to statistical learning, Journal of Machine Learning Research, 22, 1-35. | arXiv
- This is the first sharp Hoeffding's inequality for general Markov chains.
- Erratum: Lemma 11 in the paper only holds for reversible chains, and thus Theorem 3 only holds for reversible chains. The rest of the proof remains unchanged.