We hypothesize that world models learned through richer exploration enable model-based RL agents to solve new downstream tasks more sample-efficiently and with higher success rates.
To test this, we run DreamerV3 [3] on the extrinsic tasks, starting from the world models learned during our initial 500K steps of exploration with SENSEI or Plan2Explore.
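For concreteness, below is a minimal sketch of this two-phase protocol (intrinsic exploration, then extrinsic fine-tuning). The `WorldModel` and `Agent` classes and the checkpoint path are hypothetical placeholders, not the actual DreamerV3 API, and the training loops are stubbed out.

```python
# Minimal sketch (not the actual DreamerV3 code) of the two-phase protocol:
# (1) learn a world model from intrinsic rewards only, (2) reuse it to train
# a task policy on the extrinsic reward. All names are placeholders.

class WorldModel:
    """Placeholder for a recurrent latent world model (e.g. an RSSM)."""
    def save(self, path): ...
    @classmethod
    def load(cls, path): return cls()

class Agent:
    """Placeholder actor-critic trained in imagination inside the world model."""
    def __init__(self, world_model, reward_fn):
        self.world_model, self.reward_fn = world_model, reward_fn
    def train(self, env, steps): ...

def exploration_phase(env, steps=500_000):
    # Phase 1: 500K environment steps driven by an intrinsic reward
    # (SENSEI or Plan2Explore); no task reward is used here.
    wm = WorldModel()
    Agent(wm, reward_fn="intrinsic").train(env, steps)
    wm.save("exploration.ckpt")  # hypothetical checkpoint path

def downstream_phase(env, extrinsic_reward, steps=1_000_000):
    # Phase 2: initialize from the exploration checkpoint and train a
    # DreamerV3-style agent on the extrinsic task reward.
    wm = WorldModel.load("exploration.ckpt")
    agent = Agent(wm, reward_fn=extrinsic_reward)
    agent.train(env, steps)
    return agent
```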
Interactions in Robodesk:
We plot the mean number of interactions with any object during 1M steps of exploration for SENSEI, the general variant of SENSEI with a VLM-generated environment description (SENSEI-GENERAL), Plan2Explore (P2X) [1], and Random Network Distillation (RND) [2]. Error bars show the standard deviation across 3 seeds.
We observe that SENSEI interacts more with the entities in the scene than Plan2Explore and RND.
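As an illustration of how such a plot can be produced, the sketch below aggregates per-seed interaction counts into a mean with standard-deviation error bars. The counts are random placeholders, not our actual results, and would be replaced by the logged interaction counts per method and seed.

```python
# Illustrative sketch: bar plot of mean object-interaction counts per method,
# with the standard deviation over seeds as error bars. Counts are placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
methods = ["SENSEI", "SENSEI-GENERAL", "P2X", "RND"]
# placeholder: 3 seeds per method (replace with logged interaction counts)
counts = {m: rng.integers(50, 150, size=3) for m in methods}

means = [np.mean(counts[m]) for m in methods]
stds = [np.std(counts[m]) for m in methods]

plt.bar(methods, means, yerr=stds, capsize=4)
plt.ylabel("Object interactions (1M exploration steps)")
plt.tight_layout()
plt.show()
```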
[1] Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. In International Conference on Machine Learning (ICML), 2020.
[2] Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by random network distillation. In International Conference on Learning Representations (ICLR), 2019.
[3] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104v1, 2023.