We hypothesize that world models learned through richer exploration enable model-based RL agents to solve new downstream tasks more sample-efficiently and with higher success rates.
To test this, we run DreamerV3 [3] on the extrinsic tasks, starting from the world models learned during our initial 500K steps of exploration with SENSEI or Plan2Explore.
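For concreteness, below is a minimal sketch of this two-phase protocol (intrinsic exploration, then extrinsic fine-tuning). The `WorldModel` and `Agent` classes and the checkpoint path are hypothetical placeholders, not the actual DreamerV3 API, and the training loops are stubbed out.

```python
# Minimal sketch (not the actual DreamerV3 code) of the two-phase protocol:
# (1) learn a world model from intrinsic rewards only, (2) reuse it to train
# a task policy on the extrinsic reward. All names are placeholders.

class WorldModel:
    """Placeholder for a recurrent latent world model (e.g. an RSSM)."""
    def save(self, path): ...
    @classmethod
    def load(cls, path): return cls()

class Agent:
    """Placeholder actor-critic trained in imagination inside the world model."""
    def __init__(self, world_model, reward_fn):
        self.world_model, self.reward_fn = world_model, reward_fn
    def train(self, env, steps): ...

def exploration_phase(env, steps=500_000):
    # Phase 1: 500K environment steps driven by an intrinsic reward
    # (SENSEI or Plan2Explore); no task reward is used here.
    wm = WorldModel()
    Agent(wm, reward_fn="intrinsic").train(env, steps)
    wm.save("exploration.ckpt")  # hypothetical checkpoint path

def downstream_phase(env, extrinsic_reward, steps=1_000_000):
    # Phase 2: initialize from the exploration checkpoint and train a
    # DreamerV3-style agent on the extrinsic task reward.
    wm = WorldModel.load("exploration.ckpt")
    agent = Agent(wm, reward_fn=extrinsic_reward)
    agent.train(env, steps)
    return agent
```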
Interactions in Robodesk:
We plot the mean number of interactions with any object during 1M steps of exploration for SENSEI, the general variant of SENSEI with a VLM-generated environment description (SENSEI-GENERAL), Plan2Explore (P2X) [1], and Random Network Distillation (RND) [2]. Error bars show the standard deviation across 3 seeds.
We observe that SENSEI interacts more with the entities in the scene than Plan2Explore and RND.
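As an illustration of how such a plot can be produced, the sketch below aggregates per-seed interaction counts into a mean with standard-deviation error bars. The counts are random placeholders, not our actual results, and would be replaced by the logged interaction counts per method and seed.

```python
# Illustrative sketch: bar plot of mean object-interaction counts per method,
# with the standard deviation over seeds as error bars. Counts are placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
methods = ["SENSEI", "SENSEI-GENERAL", "P2X", "RND"]
# placeholder: 3 seeds per method (replace with logged interaction counts)
counts = {m: rng.integers(50, 150, size=3) for m in methods}

means = [np.mean(counts[m]) for m in methods]
stds = [np.std(counts[m]) for m in methods]

plt.bar(methods, means, yerr=stds, capsize=4)
plt.ylabel("Object interactions (1M exploration steps)")
plt.tight_layout()
plt.show()
```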
[1] Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. In International Conference on Machine Learning (ICML), 2020.
[2] Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by random network distillation. In International Conference on Learning Representations (ICLR), 2019.
[3] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104v1, 2023.