We hypothesize that world models learned through richer exploration would enable model-based RL agents to learn to solve new downstream tasks in a more sample-efficient manner and with higher success rates.
We run DreamerV3 [3] on the extrinsic tasks, using the world models learned during our initial 500K steps of exploration with SENSEI or Plan2Explore.
Interactions in Robodesk:
We plot the mean over the number of interactions with any object during 1M steps of exploration for SENSEI, the general variant of SENSEI with a VLM-generated environment description (SENSEI-GENERAL), Plan2Explore (P2X) [1], and Random Network Distillation (RND) [2]. Error bars show the standard deviation (3 seeds).
We observe that SENSEI interacts more with the entities in the scene compared to Plan2Explore and RND.
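The aggregation behind the bar plot can be sketched as follows. This is a minimal illustration with hypothetical interaction counts (the array values and object ordering are placeholders, not the paper's data): for each object, the bar height is the mean count across the 3 seeds and the error bar is the sample standard deviation.

```python
import numpy as np

# Hypothetical per-seed interaction counts (3 seeds x 3 objects);
# the numbers are illustrative placeholders, not the reported results.
counts = np.array([
    [120, 45, 80],   # seed 0
    [130, 50, 70],   # seed 1
    [110, 40, 90],   # seed 2
])

mean_per_object = counts.mean(axis=0)         # bar heights
std_per_object = counts.std(axis=0, ddof=1)   # error bars (sample std over seeds)
```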
We plot the mean episode score obtained during evaluation on the Robodesk tasks open_drawer, upright_block_off_table, and lift_ball, comparing world models learned through SENSEI vs. Plan2Explore (P2X) exploration.
Task-based exploration in Pokémon Red, comparing SENSEI-GENERAL to Plan2Explore and DreamerV3 over 750K steps.
We partition the overall game map into unique segments corresponding to different routes, towns, and buildings. We sequentially number the segments that must be traversed from the game start (0) to the first Gym (9) and plot the percentage of random seeds that reach each segment (left).
Below, we visualize temporal exploration trends by plotting the mean number of unique map segments visited and the highest level of the agent's Pokémon over episodes, smoothed with a moving average (window size 5). Shaded areas indicate the standard error (5 seeds).
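The smoothing and uncertainty bands described above can be sketched as below. This is a minimal illustration with synthetic per-seed curves (the data-generating code is a placeholder, not the actual game logs): the plotted line is the seed-mean smoothed with a window-5 moving average, and the shaded band is the standard error over the 5 seeds.

```python
import numpy as np

def moving_average(x, window=5):
    """Smooth a 1-D series with a simple moving average (window size 5 in the figure)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Synthetic stand-in for per-seed curves: unique map segments visited
# per episode (5 seeds x 50 episodes), monotone by construction.
rng = np.random.default_rng(0)
curves = np.cumsum(rng.random((5, 50)) < 0.2, axis=1)

mean_curve = curves.mean(axis=0)                                   # plotted line
sem_curve = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])  # shaded band (standard error)
smoothed = moving_average(mean_curve, window=5)
```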
[1] Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. In International Conference on Machine Learning (ICML), 2020.
[2] Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by random network distillation. In International Conference on Learning Representations (ICLR), 2019.
[3] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.