Preprints

(α-β order) denotes alphabetical ordering, * denotes equal contribution.

A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games [arXiv]


Learning a Universal Human Prior for Dexterous Manipulation from Human Preference [arXiv]

Publications

(α-β order) denotes alphabetical ordering, *,+ denote equal contribution.

Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift [arXiv]


Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning [arXiv]


On the Provable Advantage of Unsupervised Pretraining [arXiv]


V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL [arXiv]


Is RLHF More Difficult than Standard RL? [arXiv]


DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method [arXiv]


Context-lumpable Stochastic Bandits [arXiv]


Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL [arXiv]


Breaking the Curse of Multiagency: Provably Efficient Decentralized Multi-Agent RL with Function Approximation [arXiv]


Efficient displacement convex optimization with particle gradient descent [arXiv]


Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making [arXiv]


Learning Rationalizable Equilibria in Multiplayer Games [arXiv]


Faster Federated Optimization under Second-order Similarity [arXiv]


Representation Learning for Low-rank General-sum Markov Games [arXiv]


Provable Sim-to-real Transfer in Continuous Domain with Partial Observations [arXiv]


Sample-Efficient Reinforcement Learning of Partially Observable Markov Games [arXiv]


Efficient Φ-Regret Minimization in Extensive-Form Games via Online Mirror Descent [arXiv]


When Is Partially Observable Reinforcement Learning Not Scary? [arXiv]


Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits [arXiv]


Near-Optimal Learning of Extensive-Form Games with Imperfect Information [arXiv]


Provable Reinforcement Learning with a Short-Term Memory [arXiv]


A Simple Reward-free Approach to Constrained Reinforcement Learning [arXiv]


The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces [arXiv]


Understanding Domain Randomization for Sim-to-real Transfer [arXiv]


Minimax Optimization with Smooth Algorithmic Adversaries [arXiv]


Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms [arXiv]


Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [arXiv]


Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning [arXiv]


A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network [arXiv]


Near-optimal Representation Learning for Linear Bandits and Linear RL [arXiv]


A Sharp Analysis of Model-based Reinforcement Learning with Self-Play [arXiv]


Provable Meta-Learning of Linear Representations [arXiv]


On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points [arXiv]


Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [arXiv]


Near-Optimal Reinforcement Learning with Self-Play [arXiv]


On the Theory of Transfer Learning: The Importance of Task Diversity [arXiv]


On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [arXiv]


Provable Self-Play Algorithms for Competitive Reinforcement Learning [arXiv]


Reward-Free Exploration for Reinforcement Learning [arXiv]


Near-Optimal Algorithms for Minimax Optimization [arXiv]


Learning Adversarial MDPs with Bandit Feedback and Unknown Transition [arXiv]


Provably Efficient Exploration in Policy Optimization [arXiv]


Provably Efficient Reinforcement Learning with Linear Function Approximation [arXiv]


What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization? [arXiv]


On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [arXiv]


Sampling Can Be Faster Than Optimization [arXiv]


Is Q-learning Provably Efficient? [arXiv]


On the Local Minima of the Empirical Risk [arXiv]


Stochastic Cubic Regularization for Fast Nonconvex Optimization [arXiv]


Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent [arXiv]


Gradient Descent Can Take Exponential Time to Escape Saddle Points [arXiv]


How to Escape Saddle Points Efficiently [arXiv] [blog]


No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis [arXiv]


Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot [arXiv]


Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences [arXiv]


Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent [arXiv]


Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm [arXiv]


Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis [arXiv]


Faster Eigenvector Computation via Shift-and-Invert Preconditioning [arXiv]


Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition [arXiv]


Differentially Private Data Releasing for Smooth Queries [paper]


Dimensionality Dependent PAC-Bayes Margin Bound [paper]

Technical Notes

A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm [arXiv]