Domain Adaptation in Reinforcement Learning Via Latent Unified State Representation
Jinwei Xing, Takashi Nagata, Kexin Chen, Xinyun Zou, Emre Neftci, Jeffrey L. Krichmar
University of California, Irvine
One challenge of applying reinforcement learning (RL) in real-world applications is generalization. Even minor visual changes can make a trained agent fail completely in a new task. To address this issue, we propose a two-stage RL agent that first learns a latent unified state representation (LUSR) that is consistent across multiple domains, and then performs RL training in one source domain based on LUSR in the second stage. LUSR separates the latent representation of image states into domain-general and domain-specific embeddings and keeps the domain-general embedding as the latent state representation. The cross-domain consistency of LUSR allows the policy acquired in the source domain to generalize to other target domains without extra training. The code is released at https://github.com/KarlXing/LUSR.
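The two-stage idea above can be sketched in a few lines. Everything here is an illustrative toy (a linear "encoder" in place of the paper's convolutional encoder, made-up dimensions, a 3-action policy head); it only shows the data flow: encode, split the latent vector into domain-general and domain-specific parts, and let the policy consume the domain-general part alone.

```python
import numpy as np

# Toy sketch of LUSR's two-stage pipeline. All dimensions, the linear
# "encoder", and the 3-action policy head are illustrative assumptions,
# not the paper's actual architecture.
rng = np.random.default_rng(0)

STATE_DIM = 16    # flattened image state (toy size)
GENERAL_DIM = 8   # domain-general embedding size
SPECIFIC_DIM = 4  # domain-specific embedding size

# Stage 1: an encoder maps a state to one latent vector, which is
# split into domain-general and domain-specific embeddings.
W_enc = rng.normal(size=(STATE_DIM, GENERAL_DIM + SPECIFIC_DIM))

def encode(state):
    z = state @ W_enc
    return z[:GENERAL_DIM], z[GENERAL_DIM:]

# Stage 2: the RL policy consumes only the domain-general embedding,
# so the same policy can run in any domain sharing that embedding.
W_pi = rng.normal(size=(GENERAL_DIM, 3))  # toy 3-action policy head

state = rng.normal(size=STATE_DIM)
z_general, z_specific = encode(state)
action = int(np.argmax(z_general @ W_pi))
print(z_general.shape, z_specific.shape, action)
```

Because the policy never sees the domain-specific embedding, a visual change that only affects that embedding leaves the policy's input unchanged, which is what enables zero-shot transfer.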
LUSR (subfigure c) learns domain-general and domain-specific embeddings in the first stage and performs reinforcement learning based on the domain-general embedding in the second stage.
The third row of images is reconstructed from the domain-specific embedding of the first row and the domain-general embedding of the second row, showing that domain-general and domain-specific features are cleanly encoded into their corresponding embeddings.
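The embedding swap behind this figure can be sketched as follows. The linear encoder/decoder and all sizes are illustrative assumptions standing in for the paper's convolutional autoencoder; the point is only the recombination of one state's domain-specific embedding with another state's domain-general embedding.

```python
import numpy as np

# Toy sketch of the embedding-swap reconstruction (linear encoder/
# decoder and all sizes are illustrative assumptions).
rng = np.random.default_rng(1)

STATE_DIM, GENERAL_DIM, SPECIFIC_DIM = 16, 8, 4
W_enc = rng.normal(size=(STATE_DIM, GENERAL_DIM + SPECIFIC_DIM))
W_dec = rng.normal(size=(GENERAL_DIM + SPECIFIC_DIM, STATE_DIM))

def encode(state):
    z = state @ W_enc
    return z[:GENERAL_DIM], z[GENERAL_DIM:]

def decode(z_general, z_specific):
    return np.concatenate([z_general, z_specific]) @ W_dec

# Two states from two different domains (rows one and two of the figure).
state_a = rng.normal(size=STATE_DIM)  # contributes domain-specific style
state_b = rng.normal(size=STATE_DIM)  # contributes domain-general content

g_a, s_a = encode(state_a)
g_b, s_b = encode(state_b)

# Row three: reconstruct from A's domain-specific embedding and
# B's domain-general embedding.
swapped = decode(g_b, s_a)
print(swapped.shape)
```

In a trained model, the swapped reconstruction keeps the content (road layout) of state B while adopting the visual style of state A's domain.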
LUSR achieves consistently strong zero-shot policy transfer performance across domains with varying degrees of difference.
LUSR maintains zero-loss transfer performance throughout training, while other latent embedding approaches either show gradually decreasing transfer performance or fail to generalize from the start.
LUSR has the most concentrated attention, focusing mainly on the road.
t-SNE of domain-general embeddings from different domains reveals close similarity.
t-SNE of domain-specific embeddings from different domains clusters by domain label.
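The contrast between these two t-SNE plots can be illustrated with synthetic embeddings (all numbers below are made up for illustration, not measured from the model): domain-specific vectors get a large per-domain mean offset while domain-general vectors share one distribution, so the average distance between domain centroids differs sharply.

```python
import numpy as np

# Synthetic illustration of the t-SNE clustering claim. The embeddings
# are fabricated toy data, not outputs of a trained LUSR model.
rng = np.random.default_rng(2)
N_DOMAINS, N_PER_DOMAIN, DIM = 3, 50, 4

# Domain-general vectors share one distribution across domains;
# domain-specific vectors get a large per-domain mean offset.
general = [rng.normal(0.0, 1.0, size=(N_PER_DOMAIN, DIM))
           for _ in range(N_DOMAINS)]
specific = [rng.normal(5.0 * d, 1.0, size=(N_PER_DOMAIN, DIM))
            for d in range(N_DOMAINS)]

def mean_centroid_gap(groups):
    """Average pairwise distance between per-domain centroids."""
    centroids = [g.mean(axis=0) for g in groups]
    dists = [np.linalg.norm(centroids[i] - centroids[j])
             for i in range(N_DOMAINS) for j in range(i + 1, N_DOMAINS)]
    return float(np.mean(dists))

gap_general = mean_centroid_gap(general)
gap_specific = mean_centroid_gap(specific)
print(gap_general, gap_specific)
# Domain-specific centroids sit far apart (clustered by domain);
# domain-general centroids nearly coincide (mixed across domains).
assert gap_specific > gap_general
```

The same separation is what t-SNE makes visible: well-separated clusters per domain label for domain-specific embeddings, and one mixed cloud for domain-general embeddings.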
LUSR enables RL agents to transfer to similar domains without performance loss.
Cognitive Anteater Robotics Laboratory (UC Irvine)
Neuromorphic Machine Intelligence Lab (UC Irvine)
DARPA: L2M
NSF: CHASE-CI