Agent57

Agent57 on all 57 games

Playlist showing Agent57 performing above the human benchmark on all 57 games (one video per game).

State-action Value Function Parameterization - Ice Hockey

NGU - Exploratory policy

NGU - Exploitative policy

Agent57 - Exploratory policy

Agent57 - Exploitative policy

State-action Value Function Parameterization - Surround

NGU - Exploratory policy

NGU - Exploitative policy

Agent57 - Exploratory policy

Agent57 - Exploitative policy

Adaptive Discount Factor - Jamesbond

R2D2 (Retrace)

R2D2 (bandit)

Backprop Through Time Window Size - Solaris

NGU (short backprop through time window)

Agent57 (long backprop through time window)