Proposed Method: Consistent Leader–Follower (CoLF)
In this paper, we introduce explicit inductive biases that induce a consistent leader–follower structure. Building on this design principle, we propose Consistent Leader–Follower (CoLF), a multi-agent reinforcement learning (MARL) framework for stable leader–follower role differentiation in a two-robot system.
1. Overview of the Proposed Framework
CoLF consists of two key components: (1) an asymmetric policy design that induces leader–follower role differentiation, and (2) a mutual-information-based training objective that encourages the follower to predict the leader’s action from its local observation. To maximize a variational lower bound on this mutual information, we introduce an auxiliary distribution that models the leader’s action given the follower’s local observation.
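For intuition, a bound of this kind can be sketched in its standard variational form. The notation is assumed for illustration and is not taken from the text above: a^L denotes the leader's action, o^F the follower's local observation, and q_\phi the auxiliary distribution with parameters \phi. The exact variables and conditioning used in CoLF may differ.

```latex
\begin{align}
I(a^{L}; o^{F})
  &= H(a^{L}) - H(a^{L} \mid o^{F}) \\
  &= H(a^{L}) + \mathbb{E}_{p(a^{L},\, o^{F})}\!\left[ \log p(a^{L} \mid o^{F}) \right] \\
  &= H(a^{L}) + \mathbb{E}_{p(a^{L},\, o^{F})}\!\left[ \log q_{\phi}(a^{L} \mid o^{F}) \right]
     + \mathbb{E}_{p(o^{F})}\!\left[ D_{\mathrm{KL}}\!\left( p(\cdot \mid o^{F}) \,\|\, q_{\phi}(\cdot \mid o^{F}) \right) \right] \\
  &\geq H(a^{L}) + \mathbb{E}_{p(a^{L},\, o^{F})}\!\left[ \log q_{\phi}(a^{L} \mid o^{F}) \right].
\end{align}
```

The inequality holds because the KL divergence is non-negative, and it is tight when q_\phi matches the true conditional p(a^L | o^F). Under these assumptions, raising the expected log-likelihood of the leader's action under q_\phi raises the lower bound, which is what training the follower to predict the leader's action from its local observation amounts to.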