Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context. Xiang Cheng, Yuxin Chen, Suvrit Sra

Transformers learn to implement preconditioned gradient descent for in-context learning. Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra [NeurIPS 2023]

Linear attention is (maybe) all you need (to understand transformer optimization). Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra [preprint 2023]

Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions. Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu [NeurIPS 2023]

Restart Sampling for Improving Generative Processes. Yilun Xu, Mingyang Deng, Xiang Cheng, Yonglong Tian, Ziming Liu, Tommi Jaakkola [NeurIPS 2023]

Efficient Sampling on Riemannian Manifolds via Langevin MCMC. Xiang Cheng, Jingzhao Zhang, Suvrit Sra [NeurIPS 2022 (oral)]

Theory and Algorithms for Diffusion Processes on Riemannian Manifolds. Xiang Cheng, Jingzhao Zhang, Suvrit Sra [preprint 2022]

Optimal dimension dependence of the Metropolis-Adjusted Langevin Algorithm. Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, Philippe Rigollet [COLT 2021]

Is There an Analog of Nesterov Acceleration for MCMC? Yi-An Ma, Niladri S. Chatterji, Xiang Cheng, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan [Bernoulli 2021]

Stochastic Gradient and Langevin Processes. Xiang Cheng, Yin Dong, Peter L. Bartlett, Michael I. Jordan [ICML 2020]

Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting. Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan [preprint 2018]

Underdamped Langevin MCMC: A non-asymptotic analysis. Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan [COLT 2018]

Convergence of Langevin MCMC in KL-divergence. Xiang Cheng, Peter L. Bartlett [ALT 2018]

Variational perspective on local graph clustering. Kimon Fountoulakis, Farbod Roosta-Khorasani, Julian Shun, Xiang Cheng, Michael W. Mahoney [Math. Program. 174(1-2): 553-573 (2019)]

FLAG n’ FLARE: Fast Linearly-Coupled Adaptive Gradient Methods. Xiang Cheng, Fred (Farbod) Roosta, Stefan Palombo, Peter L. Bartlett, Michael W. Mahoney [AISTATS 2018]

Asymptotic behavior of ℓp-based Laplacian regularization in semi-supervised learning. Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J. Wainwright, Michael I. Jordan [COLT 2016]