CoinRun Videos

Default agent learns to go to the end of the level rather than getting the coin

We show randomly selected videos of the agent trained with the coin only at the end of the level, and tested with the coin position randomly chosen to be anywhere in the level. This agent learns to go to the end of the level rather than picking up the coin. It visibly retains capabilities like dodging monsters and obstacles. The episode eventually times out, providing zero reward.

Agent that sees a little diversity in coin position correctly generalizes to all coin positions

We show randomly selected videos of the agent trained with the coin position randomly chosen to be in the right 20% of the level, and tested with the coin position randomly chosen to be anywhere in the level. This agent learns the intended objective of getting the coin regardless of where it is placed, and remains capable of dodging obstacles and monsters.
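
The three coin-placement settings described on this page (coin only at the end, coin in the right 20% of the level, coin anywhere) can be sketched as a small sampling function. This is only an illustration, not the actual environment code: the function name, the mode strings, the use of a continuous horizontal coordinate, and uniform sampling are all assumptions, since the descriptions above say only that the position is "randomly chosen".

```python
import random


def sample_coin_position(level_length, mode, rng):
    """Return a hypothetical horizontal coin position in [0, level_length].

    mode is one of (names are illustrative, not from the environment):
      "end"      - coin always at the end of the level (default training)
      "right_20" - coin somewhere in the rightmost 20% of the level
      "anywhere" - coin anywhere in the level (test setting)
    Uniform sampling is an assumption made for this sketch.
    """
    if mode == "end":
        return float(level_length)
    if mode == "right_20":
        return rng.uniform(0.8 * level_length, level_length)
    if mode == "anywhere":
        return rng.uniform(0.0, level_length)
    raise ValueError(f"unknown mode: {mode!r}")


# Example: draw one position per setting for a level of length 64.
rng = random.Random(0)
for mode in ("end", "right_20", "anywhere"):
    print(mode, sample_coin_position(64, mode, rng))
```

In the "end" setting, reaching the coin and reaching the end of the level always coincide during training, so the two objectives cannot be told apart. The "right_20" setting separates them on at least some training levels, which is the small amount of diversity referred to above.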