Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real
Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Gu, and Vikash Kumar
Two D'Kitties coordinate to push a heavy block to the target (marked by + signs on the floor)
What's the Scoop?
We have developed Hierarchical Sim2Real, a paradigm for learning complex manipulation via locomotion behaviors on real-world robots.
I'm in a hurry - just show me the videos!
The Details
Hierarchical Sim2Real is a general paradigm for modular transfer of learned behaviors to the real-world.
First, a low-level goal-reaching policy is trained in simulation with appropriate domain randomizations (e.g., random terrain, as seen below). The learned policy then transfers zero-shot to real-world environments.
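To make the low-level training recipe concrete, here is a minimal sketch of episode-level domain randomization for a goal-reaching policy. Everything here is a toy stand-in, not the authors' implementation: the point-mass "simulator", the heightfield sampler, and the proportional policy are all hypothetical placeholders illustrating that the terrain is re-sampled at the start of every episode.

```python
import numpy as np

def randomize_terrain(rng, grid=(8, 8), max_height=0.05):
    """Sample a random heightfield -- a toy stand-in for terrain randomization."""
    return rng.uniform(0.0, max_height, size=grid)

def rollout_low_level(policy, goal, rng, horizon=50):
    """Roll out a goal-reaching policy in a toy 2D point-mass 'simulator'
    whose terrain is re-randomized at the start of every episode."""
    terrain = randomize_terrain(rng)            # fresh terrain each episode
    pos = np.zeros(2)
    for _ in range(horizon):
        obs = np.concatenate([pos, goal - pos])  # observation includes goal delta
        action = policy(obs)                     # e.g., desired displacement
        pos = pos + np.clip(action, -0.1, 0.1)   # bounded per-step motion
    # episode return: negative final distance to the goal
    return -np.linalg.norm(goal - pos), terrain

# Trivial proportional controller standing in for the learned network.
policy = lambda obs: 0.5 * obs[2:]

rng = np.random.default_rng(0)
ret, terrain = rollout_low_level(policy, goal=np.array([1.0, 0.5]), rng=rng)
```

In an actual pipeline, the rollout returns would feed an RL update for the policy; here the loop only illustrates where the randomization enters.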
A high-level goal-proposing policy is then learned on top of it to solve a more complex task. Training again happens in simulation with appropriate domain randomizations. Crucially, due to the hierarchical structure, these randomizations are different from, and often simpler than, those used during low-level training (e.g., simple random action noise, as shown below). The learned policy is transferred to real-world environments in a zero-shot fashion.
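The two-level structure above can be sketched as follows. Again, this is a hypothetical toy, not the paper's implementation: the frozen low-level goal-reacher is a noisy proportional controller, and the simpler high-level randomization is modeled as Gaussian action noise injected into the low level's steps.

```python
import numpy as np

def low_level_step(pos, goal, noise_scale, rng, steps=10):
    """Frozen low-level goal-reacher: move toward `goal` with random action
    noise injected -- the simpler randomization used for high-level training."""
    for _ in range(steps):
        step = np.clip(0.5 * (goal - pos), -0.1, 0.1)
        pos = pos + step + rng.normal(0.0, noise_scale, size=2)
    return pos

def rollout_high_level(high_policy, task_target, rng, k=5, noise_scale=0.01):
    """High level proposes intermediate goals; the low level executes each one."""
    pos = np.zeros(2)
    for _ in range(k):
        goal = high_policy(pos, task_target)     # propose a sub-goal
        pos = low_level_step(pos, goal, noise_scale, rng)
    # task return: negative final distance to the task target
    return -np.linalg.norm(task_target - pos)

# Trivial goal proposer standing in for the learned high-level network:
# step at most 0.5 per component toward the task target.
high_policy = lambda pos, target: pos + np.clip(target - pos, -0.5, 0.5)

rng = np.random.default_rng(0)
ret = rollout_high_level(high_policy, task_target=np.array([2.0, 1.0]), rng=rng)
```

The key point the sketch mirrors is that only the high-level policy is trained at this stage; the low level is treated as a fixed, noisy subroutine, so its terrain-level randomizations never need to be reproduced here.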