Latent Action Priors for Locomotion with Deep Reinforcement Learning
Anonymous Author(s)
Code [TBA] Torque control deployment code [TBA] Paper [TBA]
Question: How to efficiently train DRL policies with minimal expert knowledge?
TL;DR. Latent action priors combined with a single style reward term accelerate locomotion learning without extensive reward tuning.
Abstract. Deep reinforcement learning (DRL) often leads to brittle, unnatural behaviors or requires extensive reward tuning. We propose a latent action space combined with a style reward term to accelerate locomotion learning without reward tuning. Both the latent action space and the imitation reward are derived from a demonstration. While the pre-trained latent action space alone already improves the learned task reward and the sample efficiency, its combination with a single imitation reward from the same demonstration is a particularly powerful tool for obtaining deployable locomotion behaviors.
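To make the idea of a latent action prior concrete, here is a minimal sketch of one possible realization: a linear subspace fit to demonstration actions, which the policy then acts in during RL. This is an illustration under assumptions, not the paper's actual architecture; the demonstration data, the latent dimension `k`, and the PCA-based decoder are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical demonstration data: joint-space actions from a single
# expert trajectory (T timesteps x D actuators). Random values stand in
# for real demonstration actions here.
rng = np.random.default_rng(0)
demo_actions = rng.normal(size=(500, 12))  # T=500, D=12

# Fit a linear latent action space via PCA on the demonstration actions:
# the top-k principal components serve as a fixed decoder from a
# low-dimensional latent action to full joint-space commands.
k = 4
mean = demo_actions.mean(axis=0)
_, _, Vt = np.linalg.svd(demo_actions - mean, full_matrices=False)
decoder = Vt[:k]  # shape (k, D): rows span the latent action subspace


def decode(latent_action):
    """Map a k-dim latent action (the policy output) to a D-dim joint command."""
    return mean + latent_action @ decoder


# During RL training, the policy outputs k-dim latent actions instead of
# D-dim joint targets; the environment receives the decoded command.
z = rng.normal(size=k)
joint_command = decode(z)
print(joint_command.shape)  # (12,)
```

Restricting the policy to this demonstration-derived subspace biases exploration toward coordinated, demonstration-like actuation patterns, which is the intuition behind the sample-efficiency gains described above.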