Model-based curiosity combines active learning approaches to optimal sampling with the information-gain-based incentives for exploration presented in the curiosity literature. Existing model-based curiosity methods approximate prediction uncertainty with techniques that struggle to scale to many of the prediction-planning pipelines used in robotics tasks. We address these scalability issues with an adversarial curiosity method that minimizes a score given by a discriminator network. This discriminator is optimized jointly with a prediction model. It enables our active learning approach to sample sequences of observations and actions that result in the predictions the discriminator considers least realistic. We demonstrate improved downstream task performance in simulated environments using our adversarial curiosity approach, compared with other model-based and model-free exploration strategies. We further demonstrate that our adversarial curiosity method scales to a robotic manipulation prediction-planning pipeline, where it improves sample efficiency and prediction performance on a domain transfer problem.
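The selection rule described above can be written compactly. The notation is introduced here for illustration and is not taken from the paper. $f_\theta$ is the action-conditioned predictive model and $D_\phi$ is the discriminator, read as the probability that an observation sequence is real. $\mathcal{C}$ is a set of candidate action sequences with horizon $H$, and $\mathcal{D}$ is the dataset of real trajectories. The binary cross-entropy form of the discriminator loss is an assumption borrowed from standard GAN training. The abstract says only that the discriminator is optimized jointly with the prediction model.

```latex
\begin{align*}
\hat{o}_{t+1:t+H} &= f_\theta\left(o_t, a_{t:t+H-1}\right) \\
a^\star_{t:t+H-1} &= \operatorname*{arg\,min}_{a_{t:t+H-1} \in \mathcal{C}} \; D_\phi\left(\hat{o}_{t+1:t+H}\right) \\
\mathcal{L}_D(\phi) &= -\,\mathbb{E}_{o \sim \mathcal{D}}\left[\log D_\phi\left(o_{t+1:t+H}\right)\right]
  - \mathbb{E}_{(o_t,\, a_{t:t+H-1}) \sim \mathcal{D}}\left[\log\left(1 - D_\phi\left(\hat{o}_{t+1:t+H}\right)\right)\right]
\end{align*}
```

Minimizing $D_\phi$ over $\mathcal{C}$ selects the action sequence whose predicted outcome the discriminator rates least realistic. That is the sequence the method executes to collect new data.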
Our approach to model-based curiosity. A predictive model generates predictions for a number of candidate trajectories. These predictions are evaluated by a discriminator, and the trajectory corresponding to the least realistic prediction is executed. The predictive model and the discriminator are then updated with the newly collected data.
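The loop in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. The candidate proposals are uniform random action sequences, and every function name is hypothetical. `toy_predict` and `toy_realism` stand in for the learned predictive model and discriminator. The "discriminator" simply treats large-magnitude rollouts as less realistic, and `update` is a placeholder for retraining both networks.

```python
import random
from typing import Callable, List, Sequence, Tuple

Actions = List[float]      # one candidate action sequence (hypothetical scalar actions)
Prediction = List[float]   # predicted observations, one per step


def select_curious_actions(
    obs: float,
    candidates: Sequence[Actions],
    predict: Callable[[float, Actions], Prediction],
    realism: Callable[[Prediction], float],
) -> Tuple[Actions, float]:
    """Pick the candidate whose predicted rollout the discriminator rates least realistic."""
    scored = [(realism(predict(obs, actions)), actions) for actions in candidates]
    score, best = min(scored, key=lambda pair: pair[0])
    return best, score


def curiosity_step(obs, predict, realism, execute, update, dataset, rng,
                   num_candidates=16, horizon=3):
    """One iteration: propose, predict, score, execute the least realistic, then update."""
    candidates = [[rng.uniform(-1.0, 1.0) for _ in range(horizon)]
                  for _ in range(num_candidates)]
    actions, _ = select_curious_actions(obs, candidates, predict, realism)
    next_obs = execute(obs, actions)           # run the chosen sequence in the environment
    dataset.append((obs, actions, next_obs))   # store the newly collected real data
    update(dataset)                            # retrain predictor and discriminator (placeholder)
    return next_obs


# Toy stand-ins (hypothetical): a cumulative-sum predictor and a "discriminator"
# that assigns lower realism to rollouts with larger magnitudes.
def toy_predict(obs: float, actions: Actions) -> Prediction:
    out, x = [], obs
    for a in actions:
        x += a
        out.append(x)
    return out


def toy_realism(prediction: Prediction) -> float:
    return 1.0 / (1.0 + max(abs(p) for p in prediction))


if __name__ == "__main__":
    data: list = []
    rng = random.Random(0)
    obs = 0.0
    for _ in range(3):
        obs = curiosity_step(obs, toy_predict, toy_realism,
                             lambda o, a: o + sum(a), lambda d: None, data, rng)
    print(len(data), obs)
```

Scoring candidates by a single forward pass of the discriminator is what makes this kind of selection cheap to bolt onto an existing sampling-based planner. The number of candidates and the horizon above are arbitrary illustrative defaults.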
Active sampling to enable domain transfer. Our method trains an action-conditioned predictive model and a discriminator on a dataset from the initial domain. It then samples actions in the new domain that result in the most uncertain predictions, which lets it train a predictive model for the new domain with a small number of samples.
Improvement in L2 error of the prediction model trained with curious data over the prediction model trained with random data. The model trained with curious data outperforms the random-data model by more than the standard error at all but one of the sample sizes tested.
The online training process for a domain transfer problem, with a curiosity objective given by the loss of our discriminator network. The model and the discriminator are first trained on an existing dataset from domain A (1). They are then used to select and execute sequences of actions that maximize the curiosity objective in domain B, generating a new dataset (2). The domain B dataset is used to train the model (3). Finally, the model is used to select sequences of actions that maximize a task-based objective, allowing the robot to perform useful tasks in domain B (4).
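The four numbered stages can be read as a simple orchestration skeleton. This is a sketch under stated assumptions, not the authors' pipeline. `train`, `collect_curious`, and `plan_task` are hypothetical callables. `collect_curious` is assumed to run the curiosity loop in domain B, including any online model and discriminator updates, and return the new transitions. The caption does not say whether domain A data is mixed back in at stage (3). This sketch trains on the domain B data alone.

```python
from typing import Any, Callable, List, Tuple

Transition = Tuple[Any, Any, Any]  # (observation, actions, next observation)


def domain_transfer(
    data_a: List[Transition],
    train: Callable[[List[Transition]], None],
    collect_curious: Callable[[int], List[Transition]],
    plan_task: Callable[[], Any],
    budget_b: int,
) -> Tuple[List[Transition], Any]:
    """Run stages (1)-(4) with hypothetical stage callables."""
    # (1) Pretrain the predictive model and discriminator on domain A data.
    train(data_a)
    # (2) Select and execute curious action sequences in domain B, building a new dataset.
    data_b = collect_curious(budget_b)
    # (3) Train the model on the domain B dataset (assumed: B data only).
    train(data_b)
    # (4) Plan with a task-based objective using the adapted model.
    return data_b, plan_task()


if __name__ == "__main__":
    log: list = []
    data_b, result = domain_transfer(
        data_a=[("o", "a", "o2")],
        train=lambda d: log.append(("train", len(d))),
        collect_curious=lambda n: [("o", "a", "o2")] * n,
        plan_task=lambda: "task plan",
        budget_b=5,
    )
    print(log, len(data_b), result)
```

Passing the stages in as callables keeps the sketch agnostic to the particular predictive model, discriminator, and planner.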
Example interactions produced by the curious exploration policy.
Data will be released upon publication
* Image used with permission from Annie Xie, Frederik Ebert, Sergey Levine, and Chelsea Finn. Improvisation through Physical Understanding: Using Novel Objects as Tools with Visual Foresight. Robotics: Science and Systems, Apr. 2019. URL http://arxiv.org/abs/1904.05538.