Imitating Opponent to Win: Adversarial Policy Imitation Learning in Two-player Competitive Games
Abstract
Recent research has revealed a vulnerability of deep reinforcement learning (RL): in multi-agent settings, an adversary agent can adopt adversarial policies that cause a target RL agent (the victim) to perform poorly.
Existing approaches train adversarial policies directly from past interactions with the victim, and this knowledge may not generalize to unexplored regions of the victim's policy.
Our novel adversarial policy learning algorithm addresses this by:
Introducing an imitator that learns the victim agent's policy and provides feedback for adversarial policy training.
Applying imitation learning to capture the underlying characteristics of the victim's policy from sample trajectories, while adapting to environment dynamics that keep changing as the adversary's policy is trained.
Establishing a provable bound that guarantees a desired imitating policy once the adversary's policy stabilizes.
Incorporating the adversary's value function into the imitation objective, so that the imitator both learns the victim's policy and acts adversarially.
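The idea of mixing imitation with the adversary's value function can be illustrated on a toy problem. The sketch below is not the paper's implementation. It swaps in plain behavior cloning (negative log-likelihood of the victim's observed actions) as the imitation loss and uses a tabular softmax policy over discrete actions. It also assumes the adversary's value function is available as a table `adv_q[s][a]` (a hypothetical name) giving the adversary's value when the imitator takes action `a` in state `s`. The weight `lam` (also hypothetical) trades off imitation against lowering the adversary's expected value, which is one plausible reading of "acts adversarially".

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(logits, victim_actions, adv_q, lam):
    """Hypothetical imitator loss for one state and its gradient w.r.t. the logits.

    loss = -mean_i log pi(a_i)             (behavior cloning on the victim's actions)
           + lam * sum_a pi(a) * adv_q[a]  (adversary's expected value; the imitator
                                            wants this low)
    """
    probs = softmax(logits)
    n = len(victim_actions)
    freq = [0.0] * len(logits)
    for a, c in Counter(victim_actions).items():
        freq[a] = c / n  # empirical frequency of each victim action
    bc = -sum(f * math.log(p) for f, p in zip(freq, probs) if f > 0)
    v_bar = sum(p * q for p, q in zip(probs, adv_q))
    loss = bc + lam * v_bar
    # d(bc)/dz_k = p_k - freq_k ;  d(v_bar)/dz_k = p_k * (q_k - v_bar)
    grad = [p - f + lam * p * (q - v_bar) for p, f, q in zip(probs, freq, adv_q)]
    return loss, grad


def train_imitator(victim_data, adv_q, n_actions, lam=0.0, lr=0.5, steps=2000):
    """Plain gradient descent on per-state softmax logits (tabular imitator)."""
    logits = {s: [0.0] * n_actions for s in victim_data}
    for _ in range(steps):
        for s, actions in victim_data.items():
            _, grad = loss_and_grad(logits[s], actions, adv_q[s], lam)
            logits[s] = [z - lr * g for z, g in zip(logits[s], grad)]
    return {s: softmax(z) for s, z in logits.items()}


if __name__ == "__main__":
    # Hypothetical toy data: one state, three actions.
    victim_data = {"s0": [0, 0, 0, 1, 2]}  # victim mostly plays action 0
    adv_q = {"s0": [1.0, 0.0, 0.5]}        # adversary's value is lowest if action 1 is played
    pure_bc = train_imitator(victim_data, adv_q, 3, lam=0.0)
    mixed = train_imitator(victim_data, adv_q, 3, lam=1.0)
    print([round(p, 3) for p in pure_bc["s0"]])  # → [0.6, 0.2, 0.2]
    print([round(p, 3) for p in mixed["s0"]])    # → [0.4, 0.4, 0.2]
```

With `lam=0.0` the imitator recovers the victim's empirical action frequencies. With `lam=1.0` it shifts probability toward action 1, where the hypothetical adversary value is lowest, while the behavior-cloning term keeps it close to the victim's observed behavior.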
Experiments in four competitive MuJoCo game environments show that our algorithm outperforms state-of-the-art algorithms.
Illustrative snapshots of a victim (in blue) against normal and adversarial opponents (in red) in the SumoHumans environment. Under the baseline method, both players move toward each other and try to butt their opponent to win. In contrast, APL learns to kneel so that it stays in the ring, which may make it harder for its victim to knock it down. Our algorithm goes further: it learns to stand more stably on both knees and to dodge the victim's attacks.
Overview and Pseudo Code
Detailed Overview
Detailed Algorithm
Experimental Results
Citation
@inproceedings{bui2023imitating,
title={Imitating Opponent to Win: Adversarial Policy Imitation Learning in Two-player Competitive Games},
author={Bui, The Viet and Mai, Tien and Nguyen, Thanh H},
booktitle={Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems},
pages={1285--1293},
year={2023}
}