Imitating Opponent to Win: Adversarial Policy Imitation Learning in Two-player Competitive Games

The Viet Bui¹, Tien Mai¹, Thanh H. Nguyen²

¹ Singapore Management University, Singapore, Singapore

² University of Oregon, Eugene, Oregon, United States

tvbui@smu.edu.sg, atmai@smu.edu.sg, thanhhng@cs.uoregon.edu

AAMAS-2023

Abstract

Recent research has revealed a vulnerability in deep reinforcement learning (RL): in a multi-agent setting, an adversarial policy can degrade the performance of a target RL agent.

Existing methods for learning such adversarial policies struggle to generalize their knowledge to unexplored policy regions.

Our novel adversarial policy learning algorithm makes the following contributions:

Introducing an imitator that learns the victim agent's policy and provides feedback for adversarial policy training.
Applying imitation learning to capture the underlying characteristics of the victim's policy while adapting to the environment dynamics, which change during training.
Establishing a provable bound for a desired imitating policy once the adversary's policy stabilizes, which lends theoretical support to our approach.
Incorporating the adversary's value function into the imitation objective, so that the imitator both learns the victim policy and acts adversarially.

Experiments in four MuJoCo game environments show that our method outperforms state-of-the-art algorithms.
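
The idea of incorporating the adversary's value function into the imitation objective can be illustrated with a toy loss. This is only a minimal sketch under stated assumptions, not the formulation used in the paper: the function name `imitator_objective`, the weight `weight`, the additive form, and the sign of the value term are all hypothetical choices made for illustration.

```python
import math


def imitator_objective(victim_action_probs, adversary_values, weight=0.1):
    """Toy imitator loss (hypothetical, not the paper's objective).

    victim_action_probs: probabilities the imitator assigns to the actions
        the victim actually took, one per sampled state.
    adversary_values: the adversary's value estimates at the same states.
    weight: assumed trade-off coefficient between the two terms.
    """
    if not victim_action_probs:
        raise ValueError("need at least one sample")
    if len(victim_action_probs) != len(adversary_values):
        raise ValueError("inputs must have the same length")

    n = len(victim_action_probs)
    # Imitation term: average negative log-likelihood of the victim's actions.
    imitation_loss = -sum(math.log(p) for p in victim_action_probs) / n
    # Adversarial term (assumed sign): average value the adversary expects.
    value_term = sum(adversary_values) / n
    return imitation_loss + weight * value_term
```

With `weight=0` the sketch reduces to plain behavior cloning; a positive `weight` additionally penalizes states that the adversary values highly, which is one possible reading of an imitator that "acts adversarially".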

Illustrative snapshots of a victim (in blue) against normal and adversarial opponents (in red) in the SumoHumans simulator. With the baseline method, the two players approach each other and try to butt their opponent to win. APL, however, learns to kneel in order to stay in the ring, which may make it harder for the victim to knock it down. Our algorithm even learns to balance more stably on both knees and to dodge the victim's attacks.