Thomas A. Berrueta*, Allison Pinosky, Todd D. Murphey
Northwestern University
The experiences of embodied agents, such as robots and autonomous vehicles, exhibit correlations in space and time. These correlations pose challenges for learning algorithms, which often depend on the assumption that data are independent and identically distributed. These challenges are particularly evident in reinforcement learning (RL), where the sequential nature of agent experience is unavoidable. In this work, we address these limitations by leveraging the statistical physics of ergodic diffusion processes. Our approach, termed "maximum diffusion reinforcement learning" (MaxDiff RL), provides a theoretical framework for embodied learning in the face of intrinsically correlated data. We prove that MaxDiff RL generalizes maximum entropy RL and offers novel theoretical guarantees for robustness and single-shot learning, and we demonstrate that our model-based implementations surpass state-of-the-art performance on established benchmarks.
Decorrelation guarantees
Robustness guarantees
Single-shot learning guarantees
State-of-the-art performance
MaxDiff RL agents balance diffusive exploration and task exploitation through a temperature-like parameter. The value of this parameter determines the ergodic properties of the underlying process, as well as whether the guarantees provided by our framework hold. This video illustrates the relationship between temperature and agent performance.
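As a schematic illustration (not the paper's implementation), the sketch below shows how a temperature-like parameter can trade off task reward against a trajectory-entropy bonus when scoring candidate rollouts in a model-based planner. The function name, candidate values, and the way entropy is estimated are all hypothetical and chosen only to make the exploration/exploitation trade-off concrete.

```python
import numpy as np

def score_candidates(rollout_rewards, rollout_entropies, alpha):
    """Hypothetical candidate score: expected reward + alpha * path-entropy bonus.

    A large alpha (high temperature) favors diffusive exploration;
    a small alpha favors exploitation of the task reward.
    """
    return np.asarray(rollout_rewards) + alpha * np.asarray(rollout_entropies)

# Hypothetical per-candidate estimates from imagined model rollouts.
rewards = [1.0, 0.8, 0.2]    # estimated task reward of each candidate
entropies = [0.1, 0.5, 2.0]  # estimated path entropy of each candidate

exploit = score_candidates(rewards, entropies, alpha=0.0)
explore = score_candidates(rewards, entropies, alpha=5.0)

print(int(np.argmax(exploit)))  # → 0: zero temperature picks the highest-reward candidate
print(int(np.argmax(explore)))  # → 2: high temperature picks the most diffusive candidate
```

In this toy setting, sweeping `alpha` between the two extremes interpolates between purely reward-driven behavior and purely diffusive exploration, which is the role the temperature parameter plays in the video above.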
MaxDiff RL agents are robust to model and environmental randomizations. As long as MaxDiff RL agents remain ergodic, they are guaranteed to achieve their learning task regardless of initial conditions or random seed. This video illustrates the robustness of MaxDiff RL agents across random seeds.
MaxDiff RL policies map agent dynamics onto task-aware diffusion processes. In doing so, they minimize the influence of an agent's dynamics on its state trajectory statistics, suggesting that MaxDiff RL agents may exhibit favorable generalization properties. This video illustrates the zero-shot generalization of MaxDiff RL agents across different embodiments.
MaxDiff RL agents are capable of learning to solve tasks over the course of individual, continuous deployments, which we refer to as single-shot learning. As long as MaxDiff RL agents remain ergodic, they are guaranteed to be capable of single-shot learning. This video illustrates the single-shot learning capabilities of MaxDiff RL.