Baselines

On this page, we present results obtained for the gap crossing task in our simulation environment using a baseline learning method, Policies Modulating Trajectory Generators (PMTG).

[Video: rss_pmtg_gapcrossing_noDR_20cm.mp4]

Policies Modulating Trajectory Generators

[Video: rss_adaptive_pronk_graphic3.mp4]

Our Method

In the absence of detailed physical modeling, PMTG tends toward behaviors that exploit the inaccuracies of the simulator, such as irregular contact schedules, dragging of the feet, and unrealistically high velocities.


By contrast, our method learns to select the parameters of a well-behaved trajectory family (e.g., Raibert-heuristic gaits with regular contact schedules). Because our low-level controller handles the computation of forces and the conversion from Cartesian space to joint space, these trajectory families are simple to define and constrain.
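
To make the idea of a parameterized trajectory family concrete, here is a minimal sketch of a Raibert-heuristic swing trajectory whose parameters a high-level policy could choose once per gait cycle. All names here (PronkParams, raibert_touchdown_offset, swing_foot_trajectory, the gain k_velocity) and the numeric values are illustrative assumptions, not our actual controller.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PronkParams:
    """Hypothetical parameters a high-level policy might select per cycle."""
    stance_duration: float   # seconds the feet spend on the ground
    apex_height: float       # peak swing-foot height in meters
    desired_velocity: float  # forward body velocity target in m/s

def raibert_touchdown_offset(params: PronkParams, current_velocity: float,
                             k_velocity: float = 0.03) -> float:
    """Raibert heuristic: place the foot ahead of the hip by half the
    stance travel, plus a feedback term on the velocity error."""
    neutral = 0.5 * params.stance_duration * current_velocity
    correction = k_velocity * (current_velocity - params.desired_velocity)
    return neutral + correction

def swing_foot_trajectory(params: PronkParams, touchdown_offset: float,
                          num_samples: int = 20) -> np.ndarray:
    """Sample a Cartesian swing trajectory (x forward, z up) in the hip
    frame; a low-level controller would map these targets to joint space."""
    phase = np.linspace(0.0, 1.0, num_samples)
    x = -touchdown_offset + 2.0 * touchdown_offset * phase  # liftoff -> touchdown
    z = params.apex_height * np.sin(np.pi * phase)          # simple arc
    return np.stack([x, z], axis=1)

# Example: parameters chosen by the learned policy for one gait cycle.
params = PronkParams(stance_duration=0.2, apex_height=0.08, desired_velocity=1.0)
offset = raibert_touchdown_offset(params, current_velocity=0.8)
targets = swing_foot_trajectory(params, offset)
print(targets.shape)  # (20, 2) Cartesian foot targets for one swing phase
```

Because the policy only picks a handful of physically meaningful numbers (stance duration, apex height, desired velocity), regularity of the contact schedule is easy to enforce by simply bounding those parameters.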

[Video: rss_pmtg_gapcrossing_noDR_30cm.mp4]

PMTG, Trained with 30-Centimeter Gaps

PMTG explores diverse gaits notably less efficiently than our approach. Unlike with our method, we found that learning a successful gap crossing policy with PMTG required an explicit reward for gap crossing. Even with this reward, we had to decrease the maximum gap size from 30 cm to 20 cm before PMTG discovered gap crossing. When the reward is withheld, or when the gap size is increased, the robot tends to stop before the first gap, as shown above.
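
For illustration, the extra shaping for the PMTG baseline might look like the sketch below: a per-gap bonus added to a standard locomotion reward. The function name, the weights, and the assumption that the simulator reports gaps_crossed_this_step are all hypothetical placeholders, not the values used in our experiments.

```python
def pmtg_reward(forward_velocity: float, energy_cost: float,
                gaps_crossed_this_step: int,
                gap_bonus: float = 10.0) -> float:
    """Base locomotion reward plus an explicit bonus each time the robot
    clears a gap. Only the PMTG baseline needed this extra term."""
    base = forward_velocity - 0.001 * energy_cost
    return base + gap_bonus * gaps_crossed_this_step
```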


Our method learns to perform gap crossing with no gap-specific reward term and no curriculum.