HYDRA: Hybrid Robot Actions for Imitation Learning

Suneel Belkhale, Yuchen Cui, Dorsa Sadigh

Paper

Choosing the right action space is critical in imitation learning. HYDRA can dynamically switch between different action abstractions, which reduces state distribution shift at test time. 

HYDRA making coffee

HYDRA making toast

Imitation Learning (IL) is a sample-efficient paradigm for robot learning using expert demonstrations. However, policies learned through IL suffer from state distribution shift at test time, due to compounding errors in action prediction that lead to previously unseen states. Choosing an action representation for the policy that minimizes this distribution shift is critical in imitation learning. Prior works propose temporal action abstractions to reduce compounding errors, but they often sacrifice policy dexterity or require domain-specific knowledge.

To address these trade-offs, we introduce HYDRA, a method that leverages a hybrid action space with two levels of action abstraction: sparse high-level waypoints and dense low-level actions.

HYDRA dynamically switches between action abstractions at test time to enable both coarse and fine-grained control of a robot. In addition, HYDRA employs action relabeling to increase the consistency of actions in the dataset, further reducing distribution shift. HYDRA outperforms prior imitation learning methods by 30-40% on seven challenging simulation and real-world environments, including long-horizon real-world tasks such as making coffee and toasting bread.

HYDRA learns a hybrid action representation from demonstration data with additional mode labels. The history-conditioned policy learns to output not just a low-level action, but also a waypoint and the current mode. The action for each step is chosen based on the mode: if m=0, a waypoint-reaching behavior is executed, and if m=1, the low-level action is executed for a single step. HYDRA helps the policy stay in distribution by increasing action consistency and optimality in the dataset. At test time, HYDRA dynamically switches between waypoint reaching and dense actions, as shown on the right.
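To make this control flow concrete, below is a minimal Python sketch of the test-time switching, assuming a policy that returns a mode, a waypoint, and a dense action, plus a separate waypoint-reaching controller. All names here are illustrative assumptions, not taken from our released code.

```python
# Minimal sketch of HYDRA-style test-time action selection (names are
# illustrative, not from the released code). The policy predicts a mode m,
# a sparse waypoint, and a dense low-level action; the controller executes
# whichever abstraction the mode selects.
def hydra_step(policy, obs_history, reach_waypoint, env):
    """Run one decision step of a hybrid-action policy.

    policy          -- maps an observation history to (mode, waypoint, dense_action)
    obs_history     -- recent observations the policy conditions on
    reach_waypoint  -- low-level controller that tracks a 6-DoF waypoint
    env             -- environment exposing step(action) -> next observation
    """
    mode, waypoint, dense_action = policy(obs_history)

    if mode == 0:
        # Sparse mode: hand control to a waypoint-reaching controller until
        # the end-effector is close to the predicted waypoint.
        obs = reach_waypoint(env, waypoint)
    else:
        # Dense mode: execute the predicted low-level action for one step.
        obs = env.step(dense_action)
    return obs
```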

Mode Labeling and Waypoint Extraction

HYDRA uses human-provided mode labels to extract waypoints and dynamically switch between modes at test time. Above, we see an example of how we can automatically relabel our dataset to be more consistent and optimal by following waypoint-reaching motions. Through a button interface, the demonstrator provides single "clicks" for waypoints and sustained clicks for dense periods. We then label each state during a sparse period with the next single-click state as its desired waypoint, and keep dense periods unchanged. Optionally, we can also relabel actions during sparse periods to point toward the desired future waypoint, as shown on the right with white arrows.
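The sketch below illustrates this relabeling on a single demonstration, assuming each timestep has a state, action, and mode label, and that single-click timesteps are known. The function name and data layout are assumptions for illustration, not our exact implementation.

```python
# Illustrative sketch of the relabeling described above, assuming each demo is
# stored as per-timestep states, actions, and modes, where mode==0 marks sparse
# (waypoint) steps and mode==1 marks dense steps.
def relabel_demo(states, actions, modes, click_indices, action_toward=None):
    """Attach waypoint targets and (optionally) relabel sparse-period actions.

    states, actions, modes -- per-timestep lists of equal length
    click_indices          -- timesteps where the demonstrator gave a single click
    action_toward          -- optional fn(state, waypoint) -> relabeled action
    """
    waypoints = [None] * len(states)
    relabeled_actions = list(actions)

    for t in range(len(states)):
        if modes[t] == 1:
            continue  # dense periods keep their original actions
        # The waypoint target is the state at the next single click after t.
        next_clicks = [c for c in click_indices if c >= t]
        if not next_clicks:
            continue
        waypoints[t] = states[next_clicks[0]]
        if action_toward is not None:
            # Optionally make the sparse-period action point at the waypoint,
            # increasing action consistency across demonstrations.
            relabeled_actions[t] = action_toward(states[t], waypoints[t])
    return waypoints, relabeled_actions, modes
```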

Experiments

We evaluate HYDRA on seven challenging environments spanning many different object affordances, including robosuite tasks in simulation and manipulation tasks in the real world. All environments use a Franka Emika Panda arm, and actions and waypoints are represented as full 6-DoF poses.
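As one possible (assumed) concrete representation, a hybrid action can be packaged as below, with both the waypoint and the dense action expressed as 6-DoF end-effector targets plus a gripper command. Field names are illustrative, not from our released code.

```python
# One possible container for the hybrid action described above: both the dense
# action and the waypoint are full 6-DoF end-effector targets (position +
# orientation) plus a gripper command. Field names are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class HybridAction:
    mode: int                 # 0 = sparse waypoint, 1 = dense low-level action
    waypoint_pos: np.ndarray  # (3,) target end-effector position
    waypoint_rot: np.ndarray  # (3,) target orientation, e.g. axis-angle
    dense_delta: np.ndarray   # (6,) low-level delta pose for a single step
    gripper: float            # gripper open/close command
```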

Sample Videos

MakeCoffee Task

Sample rollouts of HYDRA and baselines


Additional rollouts of HYDRA (with visual distractions)


MakeToast Task

Sample rollouts of HYDRA and baselines


Additional rollouts of HYDRA


BibTeX:

@inproceedings{belkhale2023hydra,
  title={HYDRA: Hybrid Robot Actions for Imitation Learning},
  author={Belkhale, Suneel and Cui, Yuchen and Sadigh, Dorsa},
  booktitle={Proceedings of the 7th Conference on Robot Learning (CoRL)},
  year={2023}
}