Intrinsic Motivation for Encouraging Synergistic Behavior (ICLR 2020)

Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

MIT Computer Science and Artificial Intelligence Laboratory, Facebook Artificial Intelligence Research

Project Video

Experimental Results

Typical Learned Behavior: Our Method

We are able to learn policies for sparse-reward synergistic tasks, such as bimanual manipulation, via our formulations of intrinsic motivation.

bar_good2.mp4

bar_good1.mp4

ball_good1.mp4

ball_good2.mp4

bottle_good2.mp4

bottle_good1.mp4

corkscrew_good1.mp4

corkscrew_good2.mp4

twoant1.mov

twoant2.mov

twohfo1.mov

twohfo2.mov

Typical Failure Modes for "Extrinsic Reward Only" Baseline

When the system is trained without intrinsic reward, it typically fails to solve the task at all during training, because it has not yet seen enough positive rewards for the policies to converge.

bar_extonly.mp4

ball_extonly.mp4

bottle_extonly.mp4

corkscrew_extonly.mp4

Typical Failure Modes for "Extrinsic Reward with Two-Arm Surprise" Baseline

When the system is trained with the typical formulation of intrinsic motivation, namely surprise with respect to a predictive model of the environment, we see emergent behavior that affects the world in difficult-to-predict ways but does not always translate to good performance on the task.
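As a rough illustration of what "surprise with respect to a predictive model" can mean, the sketch below scores a transition by the squared error between a forward model's prediction and the observed next state. It is a minimal sketch under stated assumptions, not the implementation used for this baseline: `surprise_reward` and `toy_model` are hypothetical names, and the hand-written `toy_model` stands in for a learned model.

```python
import numpy as np

def surprise_reward(predict_next_state, state, joint_action, next_state):
    """Squared prediction error of a forward model, used as intrinsic reward."""
    predicted = np.asarray(predict_next_state(state, joint_action))
    return float(np.sum((np.asarray(next_state) - predicted) ** 2))

# Hypothetical stand-in for a learned forward model (an assumption for
# illustration): it predicts that the state shifts by the joint action.
def toy_model(state, joint_action):
    return np.asarray(state) + np.asarray(joint_action)

state = np.array([0.0, 0.0])
action = np.array([1.0, 0.0])

# The transition matches the model's prediction, so there is no surprise.
r_expected = surprise_reward(toy_model, state, action, np.array([1.0, 0.0]))

# The transition deviates from the prediction, so the reward is positive.
r_unexpected = surprise_reward(toy_model, state, action, np.array([1.0, 2.0]))
```

Under this kind of reward, any hard-to-predict transition is rewarded whether or not it moves the task forward, which is consistent with the behavior seen in the videos for this baseline.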

bar_surprise.mp4

ball_surprise.mp4

bottle_surprise.mp4

corkscrew_surprise.mp4

Typical Learned Behavior in Absence of Extrinsic Rewards

We also tried training with our formulations of intrinsic motivation but without extrinsic rewards. The agents learn to act synergistically, i.e., to produce effects that could not be predicted as a composition of single-agent behavior, but in ways that do not perform well at the task. This is expected, since the system is never told about the task (it is not given extrinsic rewards).
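The idea of an effect that "could not be predicted as a composition of single-agent behavior" can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the single-agent models, the `toy_step` environment, and all function names are hypothetical, and the exact formulations are given in the paper.

```python
import numpy as np

def composed_prediction(model_a, model_b, state, action_a, action_b):
    """Predict the outcome by chaining the two single-agent models."""
    return model_b(model_a(state, action_a), action_b)

def synergy_reward(model_a, model_b, state, action_a, action_b, next_state):
    """Distance between the observed outcome and the composed prediction."""
    predicted = composed_prediction(model_a, model_b, state, action_a, action_b)
    return float(np.linalg.norm(np.asarray(next_state) - np.asarray(predicted)))

# Hypothetical single-agent models: one arm acting alone never moves the object.
def arm_alone(state, action):
    return np.asarray(state)

# Hypothetical toy environment: the object rises only if both arms lift (action 1).
def toy_step(state, action_a, action_b):
    lifted = action_a == 1 and action_b == 1
    return state + 1.0 if lifted else state

state = np.array([0.0])

# Both arms lift: the outcome differs from the composed single-agent prediction.
r_joint = synergy_reward(arm_alone, arm_alone, state, 1, 1, toy_step(state, 1, 1))

# Only one arm lifts: the composed prediction is correct, so the reward is zero.
r_solo = synergy_reward(arm_alone, arm_alone, state, 1, 0, toy_step(state, 1, 0))
```

Nothing in such a reward refers to the task goal, so maximizing it alone can yield joint interaction that is synergistic without solving the task.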

bar_noextrinsic.mp4

ball_noextrinsic.mp4

bottle_noextrinsic.mp4

corkscrew_noextrinsic.mp4