We show randomly selected videos of the agent trained for 25 steps, and evaluated for 200 steps. This agent learns to pick up shields early in the episode when there are monsters present, but doesn't switch completely to collecting apples even once all the monsters are gone, and continues to collect a large number of extra shields.
We show randomly selected videos of the agent trained for 100 steps, and evaluated for 200 steps. This agent learns to pick up shields early in the episode when there are monsters present, but switches almost exclusively to collecting apples once the monsters are gone.