Imitation Learning for Mean Field Games with Correlated Equilibrium
Conducting imitation learning (IL) on a population of agents is challenging because the number of agent interactions grows exponentially with the population size. Mean field games make the many-agent problem tractable by focusing on the interaction between each individual agent and the mean effect of the population. Even so, recovering a mean field Nash equilibrium (MFNE) from demonstrations is highly non-trivial. More importantly, many behaviours observed in reality cannot be explained by MFNE; for example, bird flocks and fish schools coordinate under a correlated signal. To accommodate this, we put forward a novel solution concept named mean field correlated equilibrium (MFCE). We prove the existence of MFCE in finite games and show that MFNE is a subclass of MFCE. To bypass the equilibrium selection problem, we further propose the maximum entropy MFCE. To recover an MFCE policy from demonstrations, we introduce an IL framework based on generative adversarial training. Our framework recovers not only the policy but also the correlation device. Since the mean-field dynamics may not be directly measurable in practice, we represent them with the signature from rough path theory. We illustrate the performance of our framework by comparing it against a state-of-the-art mean field imitation learning algorithm on several tasks.
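As an informal sketch of the MFCE condition (the notation below, including the signal space, the correlation device \rho, the signal-conditioned policy \pi, and the return J, is our own illustrative shorthand and may differ from the formal definition in the paper): a correlation device \rho over signals z, together with a policy \pi(a \mid s, z) conditioned on the signal, forms an MFCE if no representative agent can improve its expected return by unilaterally deviating after observing the signal,

\mathbb{E}_{z \sim \rho}\big[ J\big(\pi(\cdot \mid \cdot, z),\, \mu^{z}\big) \big] \;\ge\; \mathbb{E}_{z \sim \rho}\big[ J\big(\pi'(\cdot \mid \cdot, z),\, \mu^{z}\big) \big] \qquad \text{for every deviation policy } \pi',

where \mu^{z} is the mean-field flow induced when the whole population follows \pi(\cdot \mid \cdot, z) and J is the expected cumulative reward of a representative agent. When the signal space contains a single signal this reduces to the usual MFNE condition, consistent with MFNE being a subclass of MFCE.

Since the mean-field flow is a path over time that may not be directly observable, the abstract proposes representing it with its signature. Below is a minimal sketch of computing a truncated (level-2) signature of a discretised mean-field trajectory; the truncation depth and function names are illustrative assumptions, not the paper's implementation.

import numpy as np

def signature_level2(path):
    """Truncated signature (levels 1 and 2) of a piecewise-linear path.

    path: array of shape (T, d), e.g. the mean-field distribution at T time steps.
    Returns the level-1 term (shape (d,)) and the level-2 term (shape (d, d)).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)            # (T-1, d) segment increments
    level1 = path[-1] - path[0]                   # total displacement of the path
    # Running value at the start of each segment, relative to the starting point.
    running = np.cumsum(
        np.vstack([np.zeros_like(path[0]), increments]), axis=0)[:-1]
    # Level-2 iterated integrals for a piecewise-linear path:
    #   sum_k (x_{k-1} - x_0) ⊗ dx_k + 0.5 * dx_k ⊗ dx_k
    level2 = (running[:, :, None] * increments[:, None, :]).sum(axis=0)
    level2 += 0.5 * (increments[:, :, None] * increments[:, None, :]).sum(axis=0)
    return level1, level2

The flattened signature terms can then be fed to the discriminator in the adversarial training loop in place of the raw, possibly unmeasured, mean-field flow.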
This experiment is based on the collective movement of fish. In nature, fish spontaneously align their velocity with the overall movement of the school, so that the school eventually settles into a stable movement velocity.
We set the dimension of the signal space to 4. Under each signal, the fish head in a particular direction, and each direction corresponds to one policy.
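A minimal sketch of how such a correlation device and signal-conditioned behaviour could be encoded for this experiment is given below; the uniform signal distribution and the 90-degree headings are illustrative assumptions rather than the exact values used in the paper or the videos.

import numpy as np

# Hypothetical correlation device: a distribution over the 4 signals.
# Assumed uniform purely for illustration.
signal_probs = np.array([0.25, 0.25, 0.25, 0.25])

# One heading per signal (in radians); each heading corresponds to one policy.
headings = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])

def sample_signal(rng):
    """Draw one signal shared by the whole school, as the correlation device would."""
    return rng.choice(len(signal_probs), p=signal_probs)

def policy_velocity(signal, speed=1.0):
    """Signal-conditioned policy: head in the direction associated with the signal."""
    theta = headings[signal]
    return speed * np.array([np.cos(theta), np.sin(theta)])

rng = np.random.default_rng(0)
z = sample_signal(rng)
velocity = policy_velocity(z)  # shared velocity the school aligns to under signal z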
The left view shows the policy recovered by MFIL, the middle view shows the expert policy, and the right view shows the policy recovered by MFIRL.
As shown in the video, the policy recovered by MFIL matches the expert policy after training, whereas the policy recovered by MFIRL does not.
Sequential Squeeze is a multi-step game. We implement this game to verify the ability to recover the expert policy from demonstrations sampled from a multi-step game.
In this experiment, a flock of birds passes through a corridor separated into two passages by columns. Each bird has to decide whether to take the left passage or the right passage. The proportion of birds passing through the left and right passages reflects the policy adopted by the birds.
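As a small illustrative sketch (the function name and sample data below are made up), the displayed proportion can be read off a batch of demonstrations as an empirical two-action policy:

import numpy as np

def empirical_passage_policy(choices):
    """Estimate left/right passage probabilities from observed choices
    (0 = left passage, 1 = right passage)."""
    choices = np.asarray(choices, dtype=float)
    p_right = choices.mean()
    return {"left": 1.0 - p_right, "right": p_right}

# Example with made-up data: 7 of 10 birds took the left passage.
print(empirical_passage_policy([0, 0, 0, 1, 0, 0, 1, 0, 1, 0]))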
The left view shows the policy recovered by MFIL, the middle view shows the expert policy, and the right view shows the policy recovered by MFIRL.
As shown in the video, the policy recovered by MFIL matches the expert policy after training, whereas the policy recovered by MFIRL does not.