Curious Replay for Model-based Adaptation


Isaac Kauvar*, Chris Doyle*, Linqi Zhou, Nick Haber

Stanford University

Abstract

Agents must be able to adapt quickly as an environment changes. We find that existing model-based reinforcement learning agents are unable to do this well, in part because of how they use past experiences to train their world model. Here, we present Curious Replay---a form of prioritized experience replay tailored to model-based agents through use of a curiosity-based priority signal. Agents using Curious Replay exhibit improved performance in an exploration paradigm inspired by animal behavior and on the Crafter benchmark. DreamerV3 with Curious Replay surpasses state-of-the-art performance on Crafter, achieving a mean score of 19.4 that substantially improves on the previous high score of 14.5 set by DreamerV3 with uniform replay, while also maintaining similar performance on the DeepMind Control Suite.

Code for Curious Replay is available at github.com/AutonomousAgentsLab/curiousreplay

Crafter

Curious Replay achieves a new state-of-the-art score on the Crafter benchmark, surpassing DreamerV3 and demonstrating substantially improved success at challenging skills.

Object Interaction Assay

Curious Replay was inspired by animal behavior. When presented with a novel object, many animals (especially mammals) will quickly begin to actively interact with it. In contrast, we found that intrinsically motivated model-based AI agents (in particular Plan2Explore [Sekar et al. 2020]) did not quickly interact with the object. Curious Replay was our solution to improve Plan2Explore, yielding a dramatic improvement in object interaction as well as an improved ability to model the object. Prioritized Experience Replay using reward-based temporal-difference (TD) prioritization [Schaul et al. 2015] did not yield equivalent gains.

Curious Replay: The Algorithm

Curious Replay prioritizes the sampling of past experiences for training the agent's world model, focusing on the experiences that are most interesting to the agent, whether because they are unfamiliar or because they are surprising. Inspired by the concept of curiosity, which is often used as an intrinsic reward to guide action selection, Curious Replay instead uses curiosity signals to guide the selection of which experiences the agent should learn from (i.e., train its world model on). The prioritization signal is an additive combination of novelty and surprise.
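
As a minimal sketch of this signal (the hyperparameter names and values below are illustrative assumptions, not the paper's exact settings), the priority of a stored experience can combine a count-based novelty term, which decays each time the experience is sampled for training, with a surprise term based on the world-model loss on that experience:

    # Illustrative hyperparameters (assumed values, not the paper's exact settings).
    C_NOVELTY = 1e4   # scale of the count-based novelty bonus
    BETA = 0.7        # per-sample decay of the novelty bonus (0 < BETA < 1)
    ALPHA = 0.7       # exponent applied to the world-model loss

    def curious_priority(times_sampled, model_loss):
        """Additive combination of novelty (count-based) and surprise (loss-based)."""
        novelty = C_NOVELTY * BETA ** times_sampled   # large for rarely trained-on experiences
        surprise = abs(model_loss) ** ALPHA           # large where the world model fits poorly
        return novelty + surprise

Experiences that have never been sampled receive the full novelty bonus, so newly collected data is quickly prioritized for training.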

Curious Replay is a simple, low-overhead modification to existing agents that use experience replay: it leverages a count of how many times each experience has been sampled and the model losses that are already computed for each training batch.
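
For instance, building on the curious_priority sketch above (the buffer interface here is hypothetical, not the actual codebase's API), the counts and priorities can be stored alongside the experiences and refreshed with the losses from each training batch:

    import numpy as np

    class CuriousReplayBuffer:
        """Hypothetical sketch of a replay buffer with curiosity-based priorities."""

        def __init__(self, initial_priority=1e4):
            self.experiences = []      # stored transitions or sequence chunks
            self.priorities = []       # one priority per stored experience
            self.sample_counts = []    # how many times each experience has been sampled
            self.initial_priority = initial_priority  # new experiences start with a high priority

        def add(self, experience):
            self.experiences.append(experience)
            self.priorities.append(self.initial_priority)
            self.sample_counts.append(0)

        def sample(self, batch_size, rng=np.random):
            # Sample experiences with probability proportional to their priority.
            probs = np.asarray(self.priorities, dtype=np.float64)
            probs /= probs.sum()
            idxs = rng.choice(len(self.experiences), size=batch_size, p=probs)
            for i in idxs:
                self.sample_counts[i] += 1
            return idxs, [self.experiences[i] for i in idxs]

        def update_priorities(self, idxs, model_losses):
            # Reuse the per-experience world-model losses from the training batch;
            # no extra forward passes are needed.
            for i, loss in zip(idxs, model_losses):
                self.priorities[i] = curious_priority(self.sample_counts[i], loss)

A training step would then call sample(), compute the world-model loss for each sampled experience, and pass those losses back to update_priorities().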

This prioritization is especially helpful in changing environments, which require adaptation. Curious Replay helps keep the world model up to date as the environment changes, a prerequisite for effective action selection.