A high-level goal-proposing policy is then learned on top of it to solve a more complex task. Training again happens in simulation with appropriate domain randomizations. Crucially, due to the hierarchical nature, the domain randomizations are different and often simpler than those used during low-level training (e.g., simple random action noise as shown below). The learned policy is transferred to real-world environments in a zero-shot fashion.
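As a rough illustration of what such a "simple random action noise" randomization could look like (this is a minimal sketch, not the authors' implementation; the wrapper class, the environment interface, and the noise parameters are hypothetical placeholders), one could perturb each goal proposed by the high-level policy before the frozen low-level controller executes it:

```python
import numpy as np


class GoalNoiseWrapper:
    """Sketch of a domain randomization for high-level training: the environment
    already embeds a frozen low-level policy, so actions here are proposed goals,
    and we simply jitter each goal with Gaussian noise before execution."""

    def __init__(self, env, noise_std=0.05, seed=0):
        self.env = env
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, goal_action):
        # Add random action noise to the proposed goal; the low-level policy
        # inside `env` then has to reach the perturbed goal.
        noise = self.rng.normal(0.0, self.noise_std, size=np.shape(goal_action))
        return self.env.step(goal_action + noise)
```

The point of the sketch is that, once the low-level policy absorbs most of the physical variability, the high-level policy can be made robust with a much cruder perturbation such as this one, rather than the full set of physics randomizations used at the low level.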