We compare to a prior classifier-based RL method (VICE), a goal-reaching algorithm (DDL), exploration bonuses (RND and count-based bonuses), a heuristically shaped reward function, and a sparse reward.
Our algorithm quickly learns to solve these challenging exploration tasks, often reaching better asymptotic performance than most prior methods while requiring substantially fewer samples. This suggests that MURAL provides directed reward shaping and exploration that is substantially more effective than standard classifier-based methods (e.g., VICE).