Task-Agnostic Offline Reinforcement Learning (TACO-RL)
Generalist agent
Learns entirely from offline data
Uses visual observations
Can perform long-horizon tasks
Hierarchical self-supervised approach
Hierarchical policy learning involves learning a hierarchy of policies in which a low-level policy performs motor control actions and a high-level policy directs the low-level policy to solve a task. We extend a short-horizon, latent-skill-based policy (Play-LMP), trained with imitation learning, to temporally extended horizons by combining it with an offline reinforcement learning (CQL) high-level policy. Unlike prior work, we learn entirely from offline, diverse, and unstructured data: no expert trajectories need to be collected and no environment resets are required, which makes our approach highly scalable.
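The loop below is a minimal sketch of this two-level rollout in Python; the `high_level_policy`, `low_level_policy`, and environment interfaces, as well as the replanning interval, are illustrative assumptions rather than the released implementation.

```python
REPLAN_EVERY = 16  # assumed interval (in timesteps) at which the high-level policy replans


def hierarchical_rollout(env, high_level_policy, low_level_policy, goal_image, max_steps=300):
    """Roll out the two-level policy toward a goal image.

    The high-level policy (trained with offline RL, e.g. CQL) maps the current
    observation and the goal image to a latent plan; the low-level policy
    (trained with imitation learning, Play-LMP style) decodes that plan into
    motor commands at every timestep.
    """
    obs = env.reset()
    latent_plan = None
    for t in range(max_steps):
        if latent_plan is None or t % REPLAN_EVERY == 0:
            latent_plan = high_level_policy(obs, goal_image)  # latent plan toward the goal
        action = low_level_policy(obs, latent_plan)           # motor command for this timestep
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs
```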
We compare our model against state-of-the-art baselines:
Play-LMP: Imitation learning baseline; learns to reach a goal image by imitating latent behaviours.
CQL+HER: Offline reinforcement learning baseline; uses the same hindsight goal sampling approach as ours (sketched below), but the actor must make isolated decisions at every timestep.
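Both TACO-RL's high-level policy and the CQL+HER baseline are trained on goals relabeled in hindsight from the unstructured offline data. The snippet below is a minimal sketch of such hindsight goal sampling, assuming trajectories are stored as lists of per-timestep dictionaries with "obs" and "action" keys; the window bounds are illustrative assumptions.

```python
import random


def sample_hindsight_goal(trajectory, min_offset=16, max_offset=32):
    """Sample a (state, action, goal) tuple from an unlabeled trajectory.

    A future observation in the same trajectory is treated as the goal the
    agent was implicitly reaching for, so no task labels, expert
    demonstrations, or environment resets are needed.
    """
    t = random.randrange(len(trajectory) - min_offset)
    offset = random.randint(min_offset, min(max_offset, len(trajectory) - 1 - t))
    obs = trajectory[t]["obs"]
    action = trajectory[t]["action"]
    goal = trajectory[t + offset]["obs"]
    return obs, action, goal
```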
We evaluate our approach on single tasks in which the goal image does not show the end effector performing the action.
We also conduct experiments in which the goal image indicates that the robot must perform two tasks sequentially; the agent must infer both tasks from a single image.
To test our model's ability to chain tasks together, we instruct our robot to perform 5 tasks sequentially using challenging intermediate goal images.
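The sketch below shows one way such chaining could be driven at rollout time by feeding intermediate goal images one after another; it reuses the hypothetical interfaces and REPLAN_EVERY constant from the rollout sketch above, and the fixed per-goal time budget and `done` success signal are assumptions for illustration.

```python
def chain_goals(env, high_level_policy, low_level_policy, goal_images, steps_per_goal=150):
    """Execute a sequence of tasks by switching to the next goal image once the
    current sub-goal's time budget is spent or a success signal fires."""
    obs = env.reset()
    for goal_image in goal_images:
        latent_plan = None
        for t in range(steps_per_goal):
            if latent_plan is None or t % REPLAN_EVERY == 0:
                latent_plan = high_level_policy(obs, goal_image)
            action = low_level_policy(obs, latent_plan)
            obs, reward, done, info = env.step(action)
            if done:  # assumed per-subtask success signal
                break
    return obs
```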
We further validate TACO-RL by performing rollouts in the real world.