Intrinsic Motivation for Encouraging Synergistic Behavior (ICLR 2020)
Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta
MIT Computer Science and Artificial Intelligence Laboratory, Facebook Artificial Intelligence Research
Typical Learned Behavior: Our Method
We are able to learn policies for sparse-reward synergistic tasks, such as bimanual manipulation, via our formulations of intrinsic motivation.
Typical Failure Modes for "Extrinsic Reward Only" Baseline
When the system is trained without intrinsic reward, the typical failure mode is that the agents never manage to solve the task during training: they have not yet seen enough positive rewards for the policies to converge.
Typical Failure Modes for "Extrinsic Reward with Two-Arm Surprise" Baseline
When the system is trained with the standard formulation of intrinsic motivation, namely surprise with respect to a predictive model of the environment, we see emergent behavior that affects the world in hard-to-predict ways. This behavior does not always translate into good performance on the task.
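To make the surprise baseline concrete, here is a minimal sketch, assuming the surprise signal is the Euclidean error between a learned joint forward model's predicted next state and the observed next state. The names `surprise_reward` and `joint_model`, the use of the L2 distance, and the toy model in the usage lines are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length state vectors."""
    if len(a) != len(b):
        raise ValueError("state vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def surprise_reward(
    joint_model: Callable[[Vector, Vector], List[float]],
    state: Vector,
    joint_action: Vector,
    next_state: Vector,
) -> float:
    """Hypothetical surprise bonus: how wrong the joint forward model was."""
    predicted = joint_model(state, joint_action)
    return l2_distance(predicted, next_state)


# Usage with a hypothetical, naive joint model that predicts state + action.
def naive_joint_model(state: Vector, action: Vector) -> List[float]:
    return [s + a for s, a in zip(state, action)]


bonus = surprise_reward(naive_joint_model, [0.0, 0.0], [1.0, 1.0], [1.0, 2.0])
print(bonus)  # 1.0: the second coordinate moved one unit more than predicted
```

Because this bonus rewards any outcome the model fails to predict, it can favor actions whose effects are merely chaotic, which is consistent with the failure mode described above.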
Typical Learned Behavior in Absence of Extrinsic Rewards
We also trained our formulations of intrinsic motivation without extrinsic rewards. The agents learn to act synergistically, i.e., to produce effects that could not be predicted as a composition of single-agent behavior, but in ways that do not accomplish the task. This is expected, since the system is never told about the task (it receives no extrinsic reward).
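The idea of "effects that could not be predicted as a composition of single-agent behavior" can be sketched as follows, assuming each agent has its own single-agent forward model and that the composed prediction applies agent A's model and then agent B's model in sequence. The names `model_a`, `model_b`, and `compositional_reward`, the sequential composition order, the L2 distance, and the one-dimensional "bar height" toy are illustrative assumptions rather than the authors' exact formulation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector, Vector], List[float]]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length state vectors."""
    if len(a) != len(b):
        raise ValueError("state vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compositional_reward(
    model_a: Model,
    model_b: Model,
    state: Vector,
    action_a: Vector,
    action_b: Vector,
    next_state: Vector,
) -> float:
    """Hypothetical synergy bonus: error of chaining single-agent predictions."""
    # Predict agent A's effect alone, then agent B's effect on that result.
    composed = model_b(model_a(state, action_a), action_b)
    return l2_distance(composed, next_state)


# Toy setting (hypothetical): state = [bar_height]. A single arm cannot lift
# the bar on its own, so each single-agent model predicts no change.
def one_arm_model(state: Vector, action: Vector) -> List[float]:
    return list(state)


# Both arms lifted together and the bar rose by 1: the composition misses it.
print(compositional_reward(one_arm_model, one_arm_model, [0.0], [1.0], [1.0], [1.0]))  # 1.0
# The bar did not move: the composition predicts this, so there is no bonus.
print(compositional_reward(one_arm_model, one_arm_model, [0.0], [1.0], [1.0], [0.0]))  # 0.0
```

Trained on this bonus alone, agents are pushed toward any jointly produced effect, lifting or otherwise, which is why synergistic behavior emerges without necessarily solving the task.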