"Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning" - Aneesh Muppidi, Zhiyu Zhang, Heng Yang
"Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo" - Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli
"Can large language models explore in-context?" - Akshay Krishnamurthy, Keegan Harris, Dylan J Foster, Cyril Zhang, Aleksandrs Slivkins
"Adaptive Exploration for Data-Efficient General Value Function Evaluations" - Arushi Jain, Josiah P. Hanna, Doina Precup
"Pick up the PACE: A Parameter-Free Optimizer for Lifelong Reinforcement Learning" - Aneesh Muppidi, Zhiyu Zhang, Heng Yang
"Towards Zero-Shot Generalization in Offline Reinforcement Learning" - Zhiyong Wang, Chen Yang, John C.S. Lui, Dongruo Zhou
"Variance-Dependent Regret Bounds for Non-stationary Linear Bandits" - Zhiyong Wang, Jize Xie, Yi Chen, John C.S. Lui, Dongruo Zhou
"Optimistic Q-learning for average reward and episodic settings" - Priyank Agrawal, Shipra Agrawal
"Do LLM Agents Have Regret? A Case Study in Online Learning and Games" - Chanwoo Park, Xiangyu Liu, Asuman E. Ozdaglar, Kaiqing Zhang
"Minimax Bounds for Offline Decision Making with Function Approximation" - Thanh Nguyen-Tang, Raman Arora
"Reinforcement Learning Under Latent Dynamics: Toward Statistical and Algorithmic Modularity" - Philip Amortila, Dylan J Foster, Nan Jiang, Akshay Krishnamurthy, Zakaria Mhammedi
"Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality" - Audrey Huang, Nan Jiang
"Minimum Empirical Divergence for Sub-Gaussian Linear Bandits" - Kapilan Balagopalan, Kwang-Sung Jun
"HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning" - Tuan Nguyen, Kwang-Sung Jun
"A Model Selection Framework for Learning Rate-Free Reinforcement Learning" - Afshar, Aldo Pacchiano
"A theoretical framework for learning history-based policies for Markov Decision Processes" - Gandharv Patil, Aditya Mahajan, Doina Precup
"REBEL: Reinforcement Learning via Regressing Relative Rewards" - Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
"Bandits with Preference Feedback: A Stackelberg Game Perspective" - Barna Pásztor, Parnian Kassraie, Andreas Krause
"Multi-Agent Imitation Learning: Value is Easy, Regret is Hard" - Jingwu Tang, Gokul Swamy, Fei Fang, Steven Wu
"Understanding Preference Learning Through the Lens of Coverage" - Yuda Song, Gokul Swamy, Aarti Singh, Drew Bagnell, Wen Sun
"Efficient Inverse Reinforcement Learning without Compounding Errors" - Nicolas Espinosa Dice, Gokul Swamy, Sanjiban Choudhury, Wen Sun
"ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization" - Chen Bo Calvin Zhang, Zhang-Wei Hong, Aldo Pacchiano, Pulkit Agrawal
"Functional Acceleration for Policy Mirror Descent" - Veronica Chelu, Doina Precup
"Active Preference Optimization for Sample Efficient RLHF" - Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
"Transfer Q-star: Principled Decoding for LLM Alignment" - Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Bedi, Furong Huang