Task-Agnostic Offline Reinforcement Learning (TACO-RL)
Generalist agent
Learns entirely from offline data
Uses visual observations
Can perform long-horizon tasks
Hierarchical self-supervised approach
Hierarchical policy learning involves learning a hierarchy of policies in which a low-level policy performs motor control actions and a high-level policy directs the low-level policy to solve a task. We extend a short-horizon, latent-skill-based policy (Play-LMP), trained with imitation learning, to temporally extended horizons by combining it with an offline reinforcement learning (CQL) high-level policy. Unlike prior work, we learn entirely from offline, diverse, and unstructured data: no expert trajectories need to be collected and no environment resets are required, which makes our approach highly scalable.
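The loop below is a minimal sketch of this two-level rollout in Python; the `high_level_policy`, `low_level_policy`, and environment interfaces, as well as the replanning interval, are illustrative assumptions rather than the released implementation.

```python
REPLAN_EVERY = 16  # assumed interval (in timesteps) at which the high-level policy replans


def hierarchical_rollout(env, high_level_policy, low_level_policy, goal_image, max_steps=300):
    """Roll out the two-level policy toward a goal image.

    The high-level policy (trained with offline RL, e.g. CQL) maps the current
    observation and the goal image to a latent plan; the low-level policy
    (trained with imitation learning, Play-LMP style) decodes that plan into
    motor commands at every timestep.
    """
    obs = env.reset()
    latent_plan = None
    for t in range(max_steps):
        if latent_plan is None or t % REPLAN_EVERY == 0:
            latent_plan = high_level_policy(obs, goal_image)  # latent plan toward the goal
        action = low_level_policy(obs, latent_plan)           # motor command for this timestep
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs
```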
We compare our model against state-of-the-art baselines:
Play-LMP: Imitation learning baseline; learns to reach a goal image by imitating latent behaviours.
CQL+HER: Offline reinforcement learning baseline; uses the same hindsight goal sampling approach as ours (sketched below), but the actor must make isolated decisions at every timestep.
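Both TACO-RL's high-level policy and the CQL+HER baseline are trained on goals relabeled in hindsight from the unstructured offline data. The snippet below is a minimal sketch of such hindsight goal sampling, assuming trajectories are stored as lists of per-timestep dictionaries with "obs" and "action" keys; the window bounds are illustrative assumptions.

```python
import random


def sample_hindsight_goal(trajectory, min_offset=16, max_offset=32):
    """Sample a (state, action, goal) tuple from an unlabeled trajectory.

    A future observation in the same trajectory is treated as the goal the
    agent was implicitly reaching for, so no task labels, expert
    demonstrations, or environment resets are needed.
    """
    t = random.randrange(len(trajectory) - min_offset)
    offset = random.randint(min_offset, min(max_offset, len(trajectory) - 1 - t))
    obs = trajectory[t]["obs"]
    action = trajectory[t]["action"]
    goal = trajectory[t + offset]["obs"]
    return obs, action, goal
```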
We evaluate our approach on single tasks in which the goal image does not show the end effector performing the action.
We also conduct experiments in which the goal image indicates that the robot must perform two tasks sequentially; the agent must infer both tasks from a single image.
To test our model's ability to chain tasks together, we instruct our robot to perform 5 tasks sequentially using challenging intermediate goal images.
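The sketch below shows one way such chaining could be driven at rollout time by feeding intermediate goal images one after another; it reuses the hypothetical interfaces and REPLAN_EVERY constant from the rollout sketch above, and the fixed per-goal time budget and `done` success signal are assumptions for illustration.

```python
def chain_goals(env, high_level_policy, low_level_policy, goal_images, steps_per_goal=150):
    """Execute a sequence of tasks by switching to the next goal image once the
    current sub-goal's time budget is spent or a success signal fires."""
    obs = env.reset()
    for goal_image in goal_images:
        latent_plan = None
        for t in range(steps_per_goal):
            if latent_plan is None or t % REPLAN_EVERY == 0:
                latent_plan = high_level_policy(obs, goal_image)
            action = low_level_policy(obs, latent_plan)
            obs, reward, done, info = env.step(action)
            if done:  # assumed per-subtask success signal
                break
    return obs
```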
We further validate TACO-RL by performing rollouts in the real world.