We hypothesize that world models learned through richer exploration would enable model-based RL agents to learn to solve new downstream tasks in a more sample-efficient manner and with higher success rates.
We run DreamerV3 [3] on the extrinsic tasks, using the world models learned during our initial 500K steps of exploration with SENSEI or Plan2Explore.
Interactions in Robodesk:
We plot the mean over the number of interactions with any object during 1M steps of exploration for SENSEI, the general variant of SENSEI with a VLM-generated environment description (SENSEI-GENERAL), Plan2Explore (P2X) [1], and Random Network Distillation (RND) [2]. Error bars show the standard deviation (3 seeds).
We observe that SENSEI interacts more with the entities in the scene compared to Plan2Explore and RND.
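The aggregation behind the bar plot can be sketched as follows. This is a minimal illustration with hypothetical interaction counts (the array values and object ordering are placeholders, not the paper's data): for each object, the bar height is the mean count across the 3 seeds and the error bar is the sample standard deviation.

```python
import numpy as np

# Hypothetical per-seed interaction counts (3 seeds x 3 objects);
# the numbers are illustrative placeholders, not the reported results.
counts = np.array([
    [120, 45, 80],   # seed 0
    [130, 50, 70],   # seed 1
    [110, 40, 90],   # seed 2
])

mean_per_object = counts.mean(axis=0)         # bar heights
std_per_object = counts.std(axis=0, ddof=1)   # error bars (sample std over seeds)
```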
We plot the mean episode score obtained during evaluation on the Robodesk tasks open_drawer, upright_block_off_table, and lift_ball, comparing world models learned through SENSEI vs. Plan2Explore (P2X) exploration.
Task-based exploration in Pokémon Red, comparing SENSEI-GENERAL to Plan2Explore and DreamerV3 over 750K steps.
We partition the overall game map into unique segments corresponding to different routes, towns, and buildings. We sequentially number the segments that must be traversed from the game start (0) to the first Gym (9) and plot the percentage of random seeds that reach each segment (left).
Below, we visualize temporal exploration trends by plotting the mean number of unique map segments visited and the highest level of the agent's Pokémon over episodes, smoothed with a moving average (window size 5). Shaded areas indicate the standard error (5 seeds).
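The smoothing and uncertainty bands described above can be sketched as below. This is a minimal illustration with synthetic per-seed curves (the data-generating code is a placeholder, not the actual game logs): the plotted line is the seed-mean smoothed with a window-5 moving average, and the shaded band is the standard error over the 5 seeds.

```python
import numpy as np

def moving_average(x, window=5):
    """Smooth a 1-D series with a simple moving average (window size 5 in the figure)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Synthetic stand-in for per-seed curves: unique map segments visited
# per episode (5 seeds x 50 episodes), monotone by construction.
rng = np.random.default_rng(0)
curves = np.cumsum(rng.random((5, 50)) < 0.2, axis=1)

mean_curve = curves.mean(axis=0)                                   # plotted line
sem_curve = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])  # shaded band (standard error)
smoothed = moving_average(mean_curve, window=5)
```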
[1] Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. In International Conference on Machine Learning (ICML), 2020.
[2] Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by random network distillation. In International Conference on Learning Representations (ICLR), 2019.
[3] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.