Latent Action Priors from a Single Gait Cycle Demonstration
Oliver Hausdörfer, Alexander von Rohr, Eric Lefort, Angela P. Schoellig
Paper [arXiv]   Code [GitHub]


We propose to learn a latent action representation from expert demonstrations and to subsequently use it as a prior in deep reinforcement learning for locomotion tasks. The latent action prior can be learned from a single gait cycle of expert demonstration, consisting of only a few data points (5-106 frames), and the expert data can be generated by an open-loop controller. Data with so little diversity is typically difficult to learn from with imitation learning. We show that combining our latent action priors with style rewards is particularly effective for imitating the expert.
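The first stage fits a small autoencoder to the handful of demonstrated action frames. Below is a minimal sketch of that stage, assuming the demonstration is stored as a (frames x action dimensions) tensor; the layer sizes, ELU activations, and plain MSE reconstruction loss are illustrative assumptions, not necessarily the paper's exact choices.

```python
import torch
import torch.nn as nn

# One gait cycle of expert joint actions: T frames (5-106 in the paper) x action_dim.
T, ACTION_DIM, LATENT_DIM = 50, 12, 4        # illustrative sizes, e.g. a 12-joint quadruped
expert_actions = torch.randn(T, ACTION_DIM)  # placeholder; use the recorded demonstration here

encoder = nn.Sequential(nn.Linear(ACTION_DIM, 32), nn.ELU(), nn.Linear(32, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.ELU(), nn.Linear(32, ACTION_DIM))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(2000):
    recon = decoder(encoder(expert_actions))           # reconstruct actions through the bottleneck
    loss = nn.functional.mse_loss(recon, expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Only the trained decoder is kept; it defines the latent action prior for DRL.
```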

Fig. 1: Method. We learn a latent action representation using a simple autoencoder from a single gait cycle of expert demonstration. The latent actions are used as a prior in deep reinforcement learning (DRL) via the decoder. During DRL training, only the policy is optimized; the latent action decoder stays fixed. We combine our approach with style rewards for imitation.
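In the second stage, the policy acts in the latent space and the frozen decoder translates its outputs into joint-space actions before they are sent to the environment. A minimal sketch, with all dimensions and network shapes as illustrative assumptions (the PPO update itself is omitted):

```python
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM, OBS_DIM = 4, 12, 48  # illustrative sizes

# Frozen decoder from the autoencoder stage (in practice, load its trained weights).
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.ELU(), nn.Linear(32, ACTION_DIM))
for p in decoder.parameters():
    p.requires_grad_(False)

# The policy acts in the latent space; PPO updates only these parameters.
policy = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ELU(), nn.Linear(64, LATENT_DIM))

def act(obs: torch.Tensor) -> torch.Tensor:
    latent_action = policy(obs)    # optimized during DRL training
    return decoder(latent_action)  # fixed latent action prior -> joint-space action
```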

One gait cycle demonstration

Baseline PPO

PPO+latent action prior

PPO+latent action prior+style

Transfer tasks

For the following transfer tasks, we use the same gait cycle of expert demonstration as above. Interestingly, at 4x target speed we observe a transition to a galloping gait.

2x target speed

3x target speed

4x target speed

Any target direction

Other Environments

We use the following gait cycles of expert demonstrations for Half-Cheetah, Ant, Humanoid, and Unitree H1.

Results after deep reinforcement learning (PPO) with latent action priors and style rewards. For the Humanoid, we use only the latent action prior.

Two Unitree A1 task

Two Unitree A1s must jointly solve the task of transporting a rod to the target location. The task is solved once the rod is within 0.1 m of the target, and the target is randomly resampled every episode (a minimal sketch of this success check appears after the videos below). Only PPO+latent action prior+style solves the task, which shows that the prior information enables solving new tasks. The same single gait cycle demonstration for the Unitree A1 as above is used.

Baseline PPO

PPO+latent action prior+style
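For reference, a minimal sketch of the success check described above; the function name and exact episode bookkeeping are our assumptions, not the released implementation:

```python
import numpy as np

def rod_task_solved(rod_pos: np.ndarray, target_pos: np.ndarray, tol: float = 0.1) -> bool:
    """Solved once the rod is within 0.1 m of the randomly sampled target position."""
    return float(np.linalg.norm(rod_pos - target_pos)) <= tol
```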

Please refer to the paper for full results. Cite the project as: