DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning

Abstract

Adapting to dynamic change is essential in robotic applications. By learning a conditional policy with a compact context, context-aware meta-reinforcement learning provides a flexible way to adjust behavior according to dynamics changes. However, in real-world applications, the agent may encounter complex environmental changes: multiple confounders can jointly influence the transition dynamics, making it challenging to infer an accurate context for decision-making. This paper addresses this challenge with DecOmposed Mutual INformation Optimization (DOMINO) for context learning, which explicitly learns disentangled contexts that maximize the mutual information between the context and historical trajectories while minimizing the state-transition prediction error. The key benefit of DOMINO is that splitting the centralized context vector into disentangled components overcomes the underestimation of the mutual information and reduces the demand for negative samples. Extensive experiments show that the context learned by DOMINO benefits both model-based and model-free reinforcement learning algorithms for dynamics generalization, in terms of sample efficiency and performance in unseen environments.
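To make the decomposed objective concrete, the sketch below shows one way such a loss could be written in PyTorch: a context encoder outputs K disentangled context vectors, an InfoNCE-style lower bound on mutual information is summed over components (each component only contrasts within its own subspace, which eases the need for large numbers of negative samples), and a context-conditioned dynamics model supplies the state-transition prediction error. All module names, dimensions, and the specific contrastive estimator are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of a decomposed mutual-information objective (assumed, not the
# official DOMINO code). Two history segments from the same environment form a
# positive pair; other batch elements act as negatives for each context component.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedContextEncoder(nn.Module):
    """Encodes a history of transitions into K disentangled context vectors."""

    def __init__(self, traj_dim: int, n_contexts: int = 4, context_dim: int = 8):
        super().__init__()
        self.n_contexts, self.context_dim = n_contexts, context_dim
        self.backbone = nn.Sequential(
            nn.Linear(traj_dim, 128), nn.ReLU(),
            nn.Linear(128, n_contexts * context_dim),
        )

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (batch, traj_dim) -> (batch, K, context_dim)
        return self.backbone(traj).view(-1, self.n_contexts, self.context_dim)


def decomposed_infonce(ctx_a: torch.Tensor, ctx_b: torch.Tensor) -> torch.Tensor:
    """Average of per-component InfoNCE lower bounds between two encodings."""
    batch, n_contexts, _ = ctx_a.shape
    labels = torch.arange(batch, device=ctx_a.device)
    loss = ctx_a.new_zeros(())
    for k in range(n_contexts):
        a = F.normalize(ctx_a[:, k], dim=-1)
        b = F.normalize(ctx_b[:, k], dim=-1)
        logits = a @ b.t() / 0.1                       # (batch, batch) similarities
        loss = loss + F.cross_entropy(logits, labels)  # maximize the MI lower bound
    return loss / n_contexts


class DynamicsModel(nn.Module):
    """Predicts the next state from (state, action, disentangled contexts)."""

    def __init__(self, state_dim: int, action_dim: int, n_contexts: int, context_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + n_contexts * context_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, s, a, contexts):
        return self.net(torch.cat([s, a, contexts.flatten(1)], dim=-1))


if __name__ == "__main__":
    batch, traj_dim, state_dim, action_dim = 32, 64, 10, 4
    encoder = DecomposedContextEncoder(traj_dim)
    dynamics = DynamicsModel(state_dim, action_dim, encoder.n_contexts, encoder.context_dim)

    hist_a, hist_b = torch.randn(batch, traj_dim), torch.randn(batch, traj_dim)
    s, a = torch.randn(batch, state_dim), torch.randn(batch, action_dim)
    s_next = torch.randn(batch, state_dim)

    ctx_a, ctx_b = encoder(hist_a), encoder(hist_b)
    mi_loss = decomposed_infonce(ctx_a, ctx_b)             # maximize MI with history
    pred_loss = F.mse_loss(dynamics(s, a, ctx_a), s_next)  # minimize transition error
    total_loss = mi_loss + pred_loss
    print(float(total_loss))
```

In this sketch the two objectives are simply summed; in practice a weighting coefficient between the contrastive term and the prediction term would be a tunable hyperparameter.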

Example of the multi-confounded environments

Overall Framework

Performance

Visualization