Intrinsic Motivation for Encouraging Synergistic Behavior (ICLR 2020)

Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

MIT Computer Science and Artificial Intelligence Laboratory, Facebook Artificial Intelligence Research

Project Video

Experimental Results

Typical Learned Behavior: Our Method

We are able to learn policies for sparse-reward synergistic tasks, such as bimanual manipulation, via our formulations of intrinsic motivation.

bar_good2.mp4
bar_good1.mp4
ball_good1.mp4
ball_good2.mp4
bottle_good2.mp4
bottle_good1.mp4
corkscrew_good1.mp4
corkscrew_good2.mp4
twoant1.mov
twoant2.mov
twohfo1.mov
twohfo2.mov

Typical Failure Modes for "Extrinsic Reward Only" Baseline

When the system is trained without intrinsic reward, it typically fails to solve the task at all, because it has not seen enough positive rewards during training for the policies to converge.

bar_extonly.mp4
ball_extonly.mp4
bottle_extonly.mp4
corkscrew_extonly.mp4

Typical Failure Modes for "Extrinsic Reward with Two-Arm Surprise" Baseline

When the system is trained with the typical formulation of intrinsic motivation, namely surprise with respect to a predictive model of the environment, we see emergent behavior that affects the world in difficult-to-predict ways, but this does not always translate to good performance on the task.

bar_surprise.mp4
ball_surprise.mp4
bottle_surprise.mp4
corkscrew_surprise.mp4
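
As a rough illustration of the surprise baseline described above, the sketch below scores a transition by how badly a joint predictive model mispredicts the next state, then adds that score to the extrinsic reward. It is a minimal sketch under stated assumptions, not the code used for these experiments: the function names, the squared-error measure, and the weight of 0.1 are all hypothetical.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]
# Hypothetical signature: (state, joint action of both arms) -> predicted next state.
JointModel = Callable[[State, Action], State]


def surprise_reward(predict_next: JointModel, state: State,
                    joint_action: Action, next_state: State) -> float:
    """Squared error between the predicted and the observed next state."""
    predicted = predict_next(state, joint_action)
    return sum((p - o) ** 2 for p, o in zip(predicted, next_state))


def total_reward(extrinsic: float, intrinsic: float, weight: float = 0.1) -> float:
    """Extrinsic reward plus a weighted intrinsic bonus (the weight is an assumption)."""
    return extrinsic + weight * intrinsic


def predict_no_change(state: State, joint_action: Action) -> State:
    """Toy stand-in for a learned model: always predicts that nothing moves."""
    return list(state)


# The object moved although the model predicted it would stay put,
# so the transition earns a positive surprise bonus.
bonus = surprise_reward(predict_no_change, [0.0, 0.0], [1.0, 1.0], [0.5, 1.0])
reward = total_reward(extrinsic=0.0, intrinsic=bonus)
```

Any transition the model mispredicts earns a bonus under this scheme, whether or not it brings the task closer to completion, which is consistent with the behavior in the videos above.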

Typical Learned Behavior in Absence of Extrinsic Rewards

We also tried training with our formulations of intrinsic motivation but without extrinsic rewards. The agents learn to act synergistically, i.e., to produce effects that could not be predicted as a composition of single-agent behavior, but in ways that do not perform well at the task. This is expected: without extrinsic rewards, the system is never told what the task is.

bar_noextrinsic.mp4
ball_noextrinsic.mp4
bottle_noextrinsic.mp4
corkscrew_noextrinsic.mp4
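
To make "effects that could not be predicted as a composition of single-agent behavior" more concrete, the sketch below chains two single-agent predictive models and rewards the mismatch between their composed prediction and the observed next state. This is one possible reading of that phrase, not the authors' implementation: the models, the toy bar-lifting dynamics, and every name here are hypothetical.

```python
from typing import Callable, List

State = List[float]
# Hypothetical signature: (state, one agent's action) -> predicted next state.
SingleAgentModel = Callable[[State, float], State]


def composed_prediction(f_first: SingleAgentModel, f_second: SingleAgentModel,
                        state: State, a_first: float, a_second: float) -> State:
    """Predict the joint effect by applying the single-agent models in sequence."""
    return f_second(f_first(state, a_first), a_second)


def synergy_reward(f_first: SingleAgentModel, f_second: SingleAgentModel,
                   state: State, a_first: float, a_second: float,
                   next_state: State) -> float:
    """Squared error between the composed prediction and the observed next state."""
    predicted = composed_prediction(f_first, f_second, state, a_first, a_second)
    return sum((p - o) ** 2 for p, o in zip(predicted, next_state))


def toy_env(state: State, a1: float, a2: float) -> State:
    """Toy dynamics (assumed): the bar rises by 1.0 only if both agents push."""
    lift = 1.0 if (a1 > 0.5 and a2 > 0.5) else 0.0
    return [state[0] + lift]


def single_agent_model(state: State, action: float) -> State:
    """Toy model (assumed): one agent acting alone never moves the bar."""
    return list(state)


start = [0.0]
both_push = synergy_reward(single_agent_model, single_agent_model,
                           start, 1.0, 1.0, toy_env(start, 1.0, 1.0))
one_pushes = synergy_reward(single_agent_model, single_agent_model,
                            start, 1.0, 0.0, toy_env(start, 1.0, 0.0))
```

In this toy setting the bonus is positive only when both agents push together, since that is the only outcome the composed single-agent models cannot explain. Nothing in the bonus refers to the task itself, which is consistent with the synergistic but task-agnostic behavior in the videos above.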