Party size is the number of Pokemon collected in the party (at most 6).
To succeed in the game, the player needs a diverse team with high levels, which means achieving a high party level (higher counts on the right side of the first histogram).
Top-Right: SENSEI-Gen2 can differentiate the gym from an unimportant house (museum), unlike SENSEI-Gen1.
Bottom-Left: Semantic rewards increase while defeating a wild Pokemon.
Bottom-Right: SENSEI-Gen2 semantic rewards decline in battle as the poison status progresses but peak sharply after defeating a trainer.
Figure E3: Downstream Task Performance in Robodesk. We plot the mean episode score obtained during evaluation for the Robodesk tasks (top) open_drawer, (middle) upright_block_off_table, and (bottom) lift_ball, with world models learned from SENSEI vs. Plan2Explore (P2E) exploration. Shaded areas depict the standard error (10 seeds), and we smooth the score trajectories with a window size of 3.
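The aggregation described in this caption (mean across seeds, standard error as the shaded band, smoothing with window size 3) can be sketched as follows. This is a minimal illustration and not the authors' plotting code: the function names, the trailing form of the moving average, the use of the sample standard deviation, and the choice to smooth each seed's trajectory before aggregating are all assumptions, and the scores are made-up toy values.

```python
import math
import statistics


def smooth(values, window=3):
    """Trailing moving average (assumed form); early points use what is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def mean_and_standard_error(scores_per_seed):
    """Per-step mean and standard error across seeds.

    scores_per_seed: list of equal-length score lists, one list per seed.
    """
    n_seeds = len(scores_per_seed)
    means, errors = [], []
    for step_scores in zip(*scores_per_seed):
        means.append(statistics.fmean(step_scores))
        # Assumed convention: sample standard deviation (n - 1) over sqrt(n).
        errors.append(statistics.stdev(step_scores) / math.sqrt(n_seeds))
    return means, errors


# Toy evaluation scores for two hypothetical seeds (not data from the paper).
scores = [[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]]
smoothed = [smooth(seed_scores, window=3) for seed_scores in scores]
means, errors = mean_and_standard_error(smoothed)
```

Here `means` gives the plotted curve and `means[i] ± errors[i]` the shaded area at evaluation step `i`.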
Figure E4: Downstream Task Performance in Robodesk for Dreamer with a headstart. We also show results for learning a task policy from scratch with DreamerV3 for the upright_block_off_table task. Shaded areas depict the standard error (5 seeds), and we smooth the score trajectories with a window size of 3.
On the right is the SENSEI vs. Plan2Explore plot for reference, with the x-axis readjusted to also reflect the 1M exploration steps.
Figure E5: Interactions in MiniHack for a smaller VLM-Motif dataset. We plot the mean number of interactions with task-relevant objects and the environment reward (unknown to the agents) collected by Plan2Explore (P2X), the original SENSEI (VLM-Motif trained on 100K pairs), and an ablation of SENSEI where the VLM-Motif is learned from only 25K pairs. Error bars show the standard error (10 seeds).
MiniHack KeyRoom-S15
Top: Game map, Bottom: Egocentric view as agent's observation