Domain Adaptation in Reinforcement Learning Via Latent Unified State Representation
Jinwei Xing, Takashi Nagata, Kexin Chen, Xinyun Zou, Emre Neftci, Jeffrey L. Krichmar
University of California, Irvine
One challenge of applying reinforcement learning (RL) in real-world applications is generalization. Even minor visual changes can make a trained agent fail completely in a new task. To address this issue, we propose a two-stage RL agent that first learns a latent unified state representation (LUSR) that is consistent across multiple domains, and then performs RL training in one source domain based on LUSR in the second stage. LUSR separates the latent representation of image states into domain-general and domain-specific embeddings and keeps the domain-general embedding as the latent state representation. The cross-domain consistency of LUSR allows the policy acquired in the source domain to generalize to other target domains without extra training. The code is released at https://github.com/KarlXing/LUSR.
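The two-stage idea above can be sketched in a few lines. Everything here is an illustrative toy (a linear "encoder" in place of the paper's convolutional encoder, made-up dimensions, a 3-action policy head); it only shows the data flow: encode, split the latent vector into domain-general and domain-specific parts, and let the policy consume the domain-general part alone.

```python
import numpy as np

# Toy sketch of LUSR's two-stage pipeline. All dimensions, the linear
# "encoder", and the 3-action policy head are illustrative assumptions,
# not the paper's actual architecture.
rng = np.random.default_rng(0)

STATE_DIM = 16    # flattened image state (toy size)
GENERAL_DIM = 8   # domain-general embedding size
SPECIFIC_DIM = 4  # domain-specific embedding size

# Stage 1: an encoder maps a state to one latent vector, which is
# split into domain-general and domain-specific embeddings.
W_enc = rng.normal(size=(STATE_DIM, GENERAL_DIM + SPECIFIC_DIM))

def encode(state):
    z = state @ W_enc
    return z[:GENERAL_DIM], z[GENERAL_DIM:]

# Stage 2: the RL policy consumes only the domain-general embedding,
# so the same policy can run in any domain sharing that embedding.
W_pi = rng.normal(size=(GENERAL_DIM, 3))  # toy 3-action policy head

state = rng.normal(size=STATE_DIM)
z_general, z_specific = encode(state)
action = int(np.argmax(z_general @ W_pi))
print(z_general.shape, z_specific.shape, action)
```

Because the policy never sees the domain-specific embedding, a visual change that only affects that embedding leaves the policy's input unchanged, which is what enables zero-shot transfer.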
LUSR (subfigure c) learns domain-general and domain-specific embeddings in the first stage and performs reinforcement learning based on the domain-general embedding in the second stage.
The third row of images is reconstructed from the domain-specific embedding of the first row and the domain-general embedding of the second row, showing that domain-general and domain-specific features are cleanly encoded into their corresponding embeddings.
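The embedding swap behind this figure can be sketched as follows. The linear encoder/decoder and all sizes are illustrative assumptions standing in for the paper's convolutional autoencoder; the point is only the recombination of one state's domain-specific embedding with another state's domain-general embedding.

```python
import numpy as np

# Toy sketch of the embedding-swap reconstruction (linear encoder/
# decoder and all sizes are illustrative assumptions).
rng = np.random.default_rng(1)

STATE_DIM, GENERAL_DIM, SPECIFIC_DIM = 16, 8, 4
W_enc = rng.normal(size=(STATE_DIM, GENERAL_DIM + SPECIFIC_DIM))
W_dec = rng.normal(size=(GENERAL_DIM + SPECIFIC_DIM, STATE_DIM))

def encode(state):
    z = state @ W_enc
    return z[:GENERAL_DIM], z[GENERAL_DIM:]

def decode(z_general, z_specific):
    return np.concatenate([z_general, z_specific]) @ W_dec

# Two states from two different domains (rows one and two of the figure).
state_a = rng.normal(size=STATE_DIM)  # contributes domain-specific style
state_b = rng.normal(size=STATE_DIM)  # contributes domain-general content

g_a, s_a = encode(state_a)
g_b, s_b = encode(state_b)

# Row three: reconstruct from A's domain-specific embedding and
# B's domain-general embedding.
swapped = decode(g_b, s_a)
print(swapped.shape)
```

In a trained model, the swapped reconstruction keeps the content (road layout) of state B while adopting the visual style of state A's domain.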
LUSR achieves consistently strong zero-shot policy transfer performance across domains with varying degrees of difference.
LUSR maintains zero-loss transfer performance throughout training, while other latent embedding approaches either show gradually decreasing transfer performance or fail to generalize from the start.
LUSR has the most concentrated attention, focusing mainly on the road.
t-SNE of domain-general embeddings from different domains reveals close similarity.
t-SNE of domain-specific embeddings from different domains clusters by domain label.
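The contrast between these two t-SNE plots can be illustrated with synthetic embeddings (all numbers below are made up for illustration, not measured from the model): domain-specific vectors get a large per-domain mean offset while domain-general vectors share one distribution, so the average distance between domain centroids differs sharply.

```python
import numpy as np

# Synthetic illustration of the t-SNE clustering claim. The embeddings
# are fabricated toy data, not outputs of a trained LUSR model.
rng = np.random.default_rng(2)
N_DOMAINS, N_PER_DOMAIN, DIM = 3, 50, 4

# Domain-general vectors share one distribution across domains;
# domain-specific vectors get a large per-domain mean offset.
general = [rng.normal(0.0, 1.0, size=(N_PER_DOMAIN, DIM))
           for _ in range(N_DOMAINS)]
specific = [rng.normal(5.0 * d, 1.0, size=(N_PER_DOMAIN, DIM))
            for d in range(N_DOMAINS)]

def mean_centroid_gap(groups):
    """Average pairwise distance between per-domain centroids."""
    centroids = [g.mean(axis=0) for g in groups]
    dists = [np.linalg.norm(centroids[i] - centroids[j])
             for i in range(N_DOMAINS) for j in range(i + 1, N_DOMAINS)]
    return float(np.mean(dists))

gap_general = mean_centroid_gap(general)
gap_specific = mean_centroid_gap(specific)
print(gap_general, gap_specific)
# Domain-specific centroids sit far apart (clustered by domain);
# domain-general centroids nearly coincide (mixed across domains).
assert gap_specific > gap_general
```

The same separation is what t-SNE makes visible: well-separated clusters per domain label for domain-specific embeddings, and one mixed cloud for domain-general embeddings.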
LUSR enables RL agents to transfer to similar domains without performance loss.
Cognitive Anteater Robotics Laboratory (UC Irvine)
Neuromorphic Machine Intelligence Lab (UC Irvine)
DARPA: L2M
NSF: CHASE-CI