We learn a policy in simulation that maps raw visual inputs to trajectory parameters, achieving near-optimal gap-crossing performance under fixed contact schedules.
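To make the input-output structure concrete, here is a minimal sketch of such a policy, assuming PyTorch, a single 64x64 depth image as the visual observation, and a small hypothetical set of trajectory parameters (step length, step height, body pitch); the actual observation format, network architecture, and parameter set used in our system may differ.

```python
# A minimal sketch, not our exact architecture: the 64x64 depth input and
# the three trajectory parameters below are illustrative assumptions.
import torch
import torch.nn as nn

class TrajectoryParamPolicy(nn.Module):
    """Maps a raw depth image to trajectory parameters consumed by a
    low-level controller running a fixed contact schedule."""

    def __init__(self, num_params: int = 3):
        super().__init__()
        # Small conv encoder for a 1-channel 64x64 depth image (assumed shape).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy forward pass.
        with torch.no_grad():
            feat_dim = self.encoder(torch.zeros(1, 1, 64, 64)).shape[1]
        # Regression head producing the trajectory parameters.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_params),
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(depth))

policy = TrajectoryParamPolicy()
# Hypothetical output ordering: [step_length, step_height, body_pitch].
params = policy(torch.zeros(1, 1, 64, 64))
```

Because the contact schedule is fixed, the learned policy only has to select these few continuous parameters per step rather than full joint commands, which keeps the action space small.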
We transfer our policy to the real system without domain randomization or detailed environment modeling.
View ten consecutive runs here: Trotting Transfer Experiments
We extend our learning framework to realistic, highly dynamic leaps across gaps up to 1.5x the quadruped's body length.
Our learned policy generalizes to novel environments in which the low-level controller's behavior remains robust.
View analysis of a relevant baseline here: Baselines