Information is Power:
Intrinsic Control via Information Capture
Nicholas Rhinehart, Jenny Wang, Glen Berseth, JD Co-Reyes, Danijar Hafner, Chelsea Finn, Sergey Levine
NeurIPS 2021
arXiv
Abstract
Humans and animals explore their environment and acquire useful skills even in the absence of clear goals, exhibiting intrinsic motivation. The study of intrinsic motivation in artificial agents is concerned with the following question: what is a good general-purpose objective for an agent? We study this question in dynamic partially-observed environments, and argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model. This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states. We instantiate this approach as a deep reinforcement learning agent equipped with a deep variational Bayes filter. We find that our agent learns to discover, represent, and exercise control of dynamic objects in a variety of partially-observed environments sensed with visual observations without extrinsic reward.
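To make the objective in the abstract concrete, here is a minimal toy sketch. It is not the paper's implementation: it assumes a small discrete latent state space in place of the deep variational Bayes filter, and the function name `visitation_entropy` and the three example belief sequences are hypothetical. The estimated state visitation is taken to be the time-average of the agent's per-step beliefs, and the quantity to minimize is the entropy of that average.

```python
import numpy as np

def visitation_entropy(beliefs):
    """Entropy (in nats) of the time-averaged belief over discrete latent states.

    beliefs: array-like of shape (T, K); each row is a probability
    distribution over K latent states (a toy stand-in for a filter's belief).
    """
    beliefs = np.asarray(beliefs, dtype=float)
    visitation = beliefs.mean(axis=0)        # estimated state visitation
    nonzero = visitation[visitation > 0]     # treat 0 * log(0) as 0
    return float(-(nonzero * np.log(nonzero)).sum())

K, T = 4, 12

# Uncertain agent: it never gathers information, so every belief is uniform.
uncertain = np.full((T, K), 1.0 / K)

# Informed but without control: beliefs are one-hot, but the world keeps
# cycling through all K states.
wandering = np.eye(K)[np.arange(T) % K]

# Informed and in control: beliefs are one-hot and the world stays in state 0.
controlled = np.eye(K)[np.zeros(T, dtype=int)]

h_uncertain = visitation_entropy(uncertain)
h_wandering = visitation_entropy(wandering)
h_controlled = visitation_entropy(controlled)
```

In this toy setting, only the sequence that is both certain and stable reaches low visitation entropy, which mirrors the abstract's point that the objective rewards both reducing uncertainty and reducing the unpredictability of future states.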
Short Overview Video

TwoRoom Large Example Videos
Video Format:
Top Left: Overhead view of environment; Top right: Agent observation
Bottom Left: Posterior reconstruction; Bottom right: Prior forecast
IC2 Policy

Random Policy

SMiRL Policy

Oracle Policy

DefendTheCenter Example Videos
Video Format:
Top Left: High-resolution agent observation; Top right: Agent observation
Bottom Left: Posterior reconstruction; Bottom right: Prior forecast
IC2 Policy

Certainty Policy

Niche Creation Policy

Random Policy

Oracle Policy

Infogain Policy

Niche Creation + Infogain Policy

OneRoomCapture3D Example Videos
Video Format:
Top Left: High-resolution agent observation; Top right: Agent observation
Bottom Left: Posterior reconstruction; Bottom right: Prior forecast
IC2 Policy

Certainty Policy

Infogain Policy

Niche Creation Policy

Niche Creation + Infogain Policy

SMiRL Policy

Oracle Policy

Random Policy

TwoRoom Example Videos
Video Format:
Top Left: Overhead view of environment; Top right: Agent observation
Bottom Left: Posterior reconstruction; Bottom right: Prior forecast
IC2 Policy

Certainty Policy

Infogain Policy

Niche Creation + Infogain Policy

Niche Creation Policy

SMiRL Policy

Oracle Policy

Random Policy
