Contrastive Example-Based Control
Kyle Hatch Benjamin Eysenbach Rafael Rafailov Tianhe Yu
Ruslan Salakhutdinov Sergey Levine Chelsea Finn
Learning for Dynamics and Control (L4DC) 2023
Abstract
While many real-world problems might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples of the transition dynamics and examples of high-return states. These methods typically learn a reward function from the high-return states, use that reward function to label the transitions, and then apply an offline RL algorithm to these transitions. While these methods can achieve good results on many tasks, they can be complex, often requiring regularization and temporal difference updates. In this paper, we propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function. We show that this implicit model can represent the Q-values for the example-based control problem. Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions; additional experiments demonstrate improved robustness and scaling with dataset size.
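To make the idea concrete, the sketch below shows one way such an implicit multi-step model could be trained with a contrastive objective and then queried with success examples to recover Q-values. It is a minimal illustration, not the authors' implementation: the network names (phi_net, psi_net), the InfoNCE-style loss with in-batch negatives, and all hyperparameters are assumptions.

```python
# Minimal sketch of a contrastive critic for example-based control
# (illustrative only; names and hyperparameters are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, EMBED_DIM = 10, 4, 64

# phi embeds a (state, action) pair; psi embeds a candidate future state.
phi_net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
                        nn.Linear(256, EMBED_DIM))
psi_net = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                        nn.Linear(256, EMBED_DIM))
optimizer = torch.optim.Adam(
    list(phi_net.parameters()) + list(psi_net.parameters()), lr=3e-4)

def critic_loss(states, actions, future_states):
    """InfoNCE-style loss: the future state drawn from the same trajectory is
    the positive; future states from other rows of the batch are negatives."""
    phi = phi_net(torch.cat([states, actions], dim=-1))   # (B, E)
    psi = psi_net(future_states)                           # (B, E)
    logits = phi @ psi.T                                   # (B, B) pairwise scores
    labels = torch.arange(len(states))                     # diagonal = positives
    return F.cross_entropy(logits, labels)

def q_values(states, actions, success_examples):
    """The learned critic serves as an implicit multi-step model: averaging
    its score over success examples yields a (monotone transform of a)
    Q-value for the example-based control problem."""
    phi = phi_net(torch.cat([states, actions], dim=-1))    # (B, E)
    psi = psi_net(success_examples)                        # (K, E)
    return (phi @ psi.T).mean(dim=-1)                      # (B,)

# One gradient step on placeholder data:
s, a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
sf = torch.randn(32, STATE_DIM)
loss = critic_loss(s, a, sf)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Because the critic factorizes into an embedding of the (state, action) pair and an embedding of a future state, evaluating it against a new set of success examples requires no retraining, which is what makes the multitask reuse described below possible.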
Benchmark Tasks
LAEO learns to solve tasks from the Fetch environments [1] and the Meta-World benchmark [2] using only success examples and unlabeled offline data.
Fetch-Reach
Fetch-Push
Sawyer-Window Open
Sawyer-Drawer Close
Multitask Critic
A critic network trained only on data from a drawer-closing task learns a general dynamics model that can be used to solve a variety of different tasks (an illustrative sketch of this reuse follows the task list below).
Close
Open
Half-closed
Reach (near)
Reach (medium)
Reach (far)
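As a rough illustration of this reuse (an assumption consistent with the description above, not the exact procedure), action selection for a new task can keep the critic fixed and only swap the set of success examples. The helper below takes any critic-derived Q-value function, such as the q_values sketch from the abstract section; the function and variable names are hypothetical.

```python
# Sketch of multitask reuse: same critic, different success examples per task.
import torch

def select_action(q_fn, state, candidate_actions, success_examples):
    """Pick the candidate action whose critic score, averaged over the
    task's success examples, is highest. q_fn is a critic-based Q-value
    function (e.g., q_values from the previous sketch)."""
    with torch.no_grad():
        states = state.expand(len(candidate_actions), -1)   # (N, S)
        scores = q_fn(states, candidate_actions, success_examples)
        return candidate_actions[scores.argmax()]

# Different tasks = different success-example sets, same critic, e.g.:
# action = select_action(q_values, current_state, sampled_actions, reach_examples)
# action = select_action(q_values, current_state, sampled_actions, drawer_open_examples)
```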
[1] Matthias Plappert, Marcin Andrychowicz, Alex Ray, Bob McGrew, Bowen Baker, Glenn Powell, Jonas Schneider, Josh Tobin, Maciek Chociej, Peter Welinder, et al. Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464, 2018.
[2] Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning, pages 1094–1100. PMLR, 2020.