Mastering Stacking of Diverse Shapes with Large-Scale Iterative RL on Real Robots

Thomas Lampe*, Abbas Abdolmaleki*, Sarah Bechtle*, Sandy H. Huang*, Jost Tobias Springenberg*,
Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Markus Wulfmeier, Jingwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller

(* joint first authorship)

Reinforcement learning from an agent's own experience is often believed to be infeasible on real robots, due to the amount of data required. However, if done right, agents learning from real data can be surprisingly data-efficient by re-using previously collected sub-optimal data. In this paper we demonstrate how an improved understanding of off-policy learning methods, and their embedding in an iterative online/offline scheme ("collect and infer"), can drastically improve data efficiency by using all of the collected experience, enabling learning from real robot experience alone. Moreover, the resulting policy improves significantly over the state of the art on a recently proposed real-robot manipulation benchmark. Our approach learns end-to-end, directly from pixels, and does not rely on substantial additional human domain knowledge such as a simulator or demonstrations.
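The "collect and infer" scheme alternates between gathering experience with the current policy and running off-policy learning on the full replay buffer, never discarding earlier (sub-optimal) data. A minimal sketch of that loop, with placeholder names (`collect_episode`, `offline_update`) that are illustrative and not the paper's actual API:

```python
import random

def collect_episode(policy, length=5):
    # Stand-in for real-robot data collection: each transition is a
    # (observation, action, reward) tuple; here filled with random placeholders.
    return [(random.random(), policy(random.random()), random.random())
            for _ in range(length)]

def offline_update(policy_weight, replay_buffer, lr=0.01):
    # Placeholder for off-policy RL on ALL collected experience,
    # including sub-optimal data from earlier iterations.
    mean_reward = sum(r for _, _, r in replay_buffer) / len(replay_buffer)
    return policy_weight + lr * mean_reward  # dummy improvement step

replay_buffer = []   # grows across iterations; nothing is discarded
policy_weight = 0.0

for iteration in range(3):
    policy = lambda obs: policy_weight * obs
    replay_buffer.extend(collect_episode(policy))                  # collect
    policy_weight = offline_update(policy_weight, replay_buffer)   # infer
```

The key design choice this sketch captures is that the "infer" phase always trains on the entire accumulated buffer rather than only on fresh on-policy data, which is what makes re-using old experience possible.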


Hyperparameters

Architecture hyperparameters of the ResNet used to represent the critic and policy.
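As a point of reference for readers unfamiliar with residual networks, the sketch below shows the structure of a single residual block of the kind such a torso is built from. The layer sizes and block counts here are arbitrary placeholders, not the hyperparameters used in the paper:

```python
import numpy as np

def residual_block(x, w):
    # A residual block adds its input back onto a transformed version of it,
    # so gradients can flow unchanged through the identity shortcut.
    return x + np.maximum(0.0, x @ w)  # identity shortcut + ReLU branch

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))          # batch of 4 feature vectors
w = rng.standard_normal((16, 16)) * 0.1   # square weight preserves shape

y = residual_block(x, w)
```

Because each block preserves the feature shape, blocks can be stacked to whatever depth the architecture table specifies.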