Information is Power: Intrinsic Control via Information Capture

Nicholas Rhinehart, Jenny Wang, Glen Berseth, JD Co-Reyes, Danijar Hafner, Chelsea Finn, Sergey Levine

NeurIPS 2021

Abstract

Humans and animals explore their environment and acquire useful skills even in the absence of clear goals, exhibiting intrinsic motivation. The study of intrinsic motivation in artificial agents is concerned with the following question: what is a good general-purpose objective for an agent? We study this question in dynamic partially-observed environments, and argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model. This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states. We instantiate this approach as a deep reinforcement learning agent equipped with a deep variational Bayes filter. We find that our agent learns to discover, represent, and exercise control of dynamic objects in a variety of partially-observed environments sensed with visual observations without extrinsic reward.
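To make the objective above concrete, here is a minimal sketch of how the entropy of a state visitation distribution could be computed from a sequence of beliefs. It is an illustration under simplifying assumptions, not the paper's implementation: the latent state is taken to be discrete with K values, each belief is a categorical distribution, and the visitation distribution is approximated as the time average of the beliefs. The function names `entropy` and `visitation_entropy` and the three toy trajectories are hypothetical.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a categorical distribution."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def visitation_entropy(beliefs):
    """Entropy of the time-averaged belief over latent states.

    beliefs: array of shape (T, K); row t is a categorical belief over
    K hypothetical discrete latent states at step t (an assumption made
    for illustration; the paper uses a learned latent state-space model).
    """
    beliefs = np.asarray(beliefs, dtype=float)
    visitation = beliefs.mean(axis=0)  # average belief across time steps
    return entropy(visitation)

# Sharp beliefs that stay on one latent state: informed and in control.
controlled = np.array([[0.97, 0.01, 0.01, 0.01]] * 5)
# Maximally uncertain beliefs at every step: no information gathered.
uncertain = np.full((5, 4), 0.25)
# Sharp beliefs that move through every state: informed but not in control.
wandering = np.eye(4)

for name, b in [("controlled", controlled),
                ("uncertain", uncertain),
                ("wandering", wandering)]:
    print(name, visitation_entropy(b))
```

In this toy setting the visitation entropy is low only for the first trajectory. Uncertain beliefs and unpredictable state changes both raise it, which mirrors the two effects the abstract attributes to the objective: gathering information and gaining control.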

Short Overview Video

short_overview.m4v

TwoRoom Large Example Videos

Video Format:

Top left: Overhead view of environment; Top right: Agent observation

Bottom left: Posterior reconstruction; Bottom right: Prior forecast

IC2 Policy

niche_creation_and_infogain.mp4

Random Policy

random.mp4

SMiRL Policy

smirl.mp4

Oracle Policy

oracle.mp4

DefendTheCenter Example Videos

Video Format:

Top left: High-resolution agent observation; Top right: Agent observation

Bottom left: Posterior reconstruction; Bottom right: Prior forecast

IC2 Policy

niche_expansion_800.mp4

Certainty Policy

niche_creation.mp4

Niche Creation Policy

certainty.mp4

Random Policy

random.mp4

Oracle Policy

oracle.mp4

Infogain Policy

infogain_reward.mp4

Niche Creation + Infogain Policy

niche_creation_plus_infogain_reward.mp4

OneRoomCapture3D Example Videos

Video Format:

Top left: High-resolution agent observation; Top right: Agent observation

Bottom left: Posterior reconstruction; Bottom right: Prior forecast

IC2 Policy

2_niche_expansion.m4v

Certainty Policy

4_certainty.m4v

Infogain Policy

5_infogain.m4v

Niche Creation Policy

3_niche_creation.m4v

Niche Creation + Infogain Policy

1_niche_creation_infogain.m4v

SMiRL Policy

6_smirl.m4v

Oracle Policy

8_oracle.m4v

Random Policy

9_random.m4v

TwoRoom Example Videos

Video Format:

Top left: Overhead view of environment; Top right: Agent observation

Bottom left: Posterior reconstruction; Bottom right: Prior forecast

IC2 Policy

niche_expansion.m4v

Certainty Policy

certainty.m4v

Infogain Policy

infogain.m4v

Niche Creation + Infogain Policy

niche_expansion.m4v

Niche Creation Policy

niche_creation.m4v

SMiRL Policy

smirl_tworoom.m4v

Oracle Policy

oracle.m4v

Random Policy

random.m4v