Learning Robust State Abstractions for Hidden-Parameter Block MDPs

Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau

Accepted at ICLR 2021. Code available here.

Abstract

Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better performance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.

[Figure: the typical multi-task setting with shared representations.]

[Figure: the HiP-BMDP setting.]

Method

Learning a HiP-BMDP approximation of a family of MDPs requires the following components:

  1. a state encoder φ : S → Z that maps observations to a learned latent representation,

  2. an environment encoder ψ that maps an environment identifier to a hidden parameter θ,

  3. a universal dynamics model T, shared across tasks and conditioned on the task parameter θ.


[Equation figure: the training objective, where red indicates that gradients are stopped.]
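The three components above can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the dimensions, linear maps, and function names (`phi`, `psi`, `T`) are assumptions chosen for clarity, where the actual method uses learned neural networks trained end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
OBS_DIM, LATENT_DIM, THETA_DIM, ACTION_DIM, NUM_TASKS = 8, 4, 2, 2, 3

# 1. State encoder phi : S -> Z. A single linear map stands in for
#    the learned neural encoder.
W_phi = rng.normal(size=(LATENT_DIM, OBS_DIM))

def phi(obs):
    return W_phi @ obs

# 2. Environment encoder psi : task id -> hidden parameter theta.
#    Here, an embedding table with one learnable row per training task.
theta_table = rng.normal(size=(NUM_TASKS, THETA_DIM))

def psi(task_id):
    return theta_table[task_id]

# 3. Universal dynamics model T(z, a, theta) -> next latent state,
#    shared across all tasks and conditioned on theta.
W_T = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM + THETA_DIM))

def T(z, a, theta):
    return W_T @ np.concatenate([z, a, theta])

# One forward pass: encode an observation, look up the task's theta,
# and predict the next latent state.
obs = rng.normal(size=OBS_DIM)
action = rng.normal(size=ACTION_DIM)
z_next = T(phi(obs), action, psi(task_id=1))
print(z_next.shape)  # (4,)
```

Because T is shared and only θ varies across tasks, transfer to a new task reduces to inferring its hidden parameter θ while reusing the encoder and dynamics model.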



Results

Multi-Task Setting. Top: performance on the training tasks. Bottom: zero-shot generalization performance on the extrapolation tasks. Our method, HiP-BMDP, outperforms all baselines across all environments.


Example Videos

Two policies learned with HiP-BMDP, evaluated on environments from Cheetah-Run-V0, Finger-Spin-V0, and Walker-Run-V1, all adapted from the DeepMind Control Suite.

[Videos: rollouts of the two policies (environments 2 and 9) at training step 950,000.]