Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context. Xiang Cheng, Yuxin Chen, Suvrit Sra
Transformers learn to implement preconditioned gradient descent for in-context learning. Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra [NeurIPS 2023]
Linear attention is (maybe) all you need (to understand transformer optimization). Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra [preprint 2023]
Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions. Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu [NeurIPS 2023]
Restart Sampling for Improving Generative Processes. Yilun Xu, Mingyang Deng, Xiang Cheng, Yonglong Tian, Ziming Liu, Tommi Jaakkola [NeurIPS 2023]
Efficient Sampling on Riemannian Manifolds via Langevin MCMC. Xiang Cheng, Jingzhao Zhang, Suvrit Sra [NeurIPS 2022 (oral)]
Theory and Algorithms for Diffusion Processes on Riemannian Manifolds. Xiang Cheng, Jingzhao Zhang, Suvrit Sra [preprint 2022]
Optimal dimension dependence of the Metropolis-Adjusted Langevin Algorithm. Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, Philippe Rigollet [COLT 2021]
Is There an Analog of Nesterov Acceleration for MCMC? Yi-An Ma, Niladri S. Chatterji, Xiang Cheng, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan [Bernoulli 2021]
Stochastic Gradient and Langevin Processes. Xiang Cheng, Yin Dong, Peter L. Bartlett, Michael I. Jordan [ICML 2020]
Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting. Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan [preprint 2018]
Underdamped Langevin MCMC: A non-asymptotic analysis. Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan [COLT 2018]
Convergence of Langevin MCMC in KL-divergence. Xiang Cheng, Peter L. Bartlett [ALT 2018]
Variational perspective on local graph clustering. Kimon Fountoulakis, Farbod Roosta-Khorasani, Julian Shun, Xiang Cheng, Michael W. Mahoney [Math. Program. 174(1-2): 553-573 (2019)]
FLAG n’ FLARE: Fast Linearly-Coupled Adaptive Gradient Methods. Xiang Cheng, Fred (Farbod) Roosta, Stefan Palombo, Peter L. Bartlett, Michael W. Mahoney [AISTATS 2018]
Asymptotic behavior of ℓp-based Laplacian regularization in semi-supervised learning. Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J. Wainwright, Michael I. Jordan [COLT 2016]