Agent57
Agent57 on all 57 games
Playlist showing Agent57 performing above the human benchmark on all 57 games (one video per game).
State-action Value Function Parameterization - Ice Hockey
NGU - Exploratory policy
NGU - Exploitative policy
Agent57 - Exploratory policy
Agent57 - Exploitative policy
State-action Value Function Parameterization - Surround
NGU - Exploratory policy
NGU - Exploitative policy
Agent57 - Exploratory policy
Agent57 - Exploitative policy
Adaptive Discount Factor - Jamesbond
R2D2 (Retrace)
R2D2 (bandit)
Backprop Through Time Window Size - Solaris
NGU (short backprop through time window)
Agent57 (long backprop through time window)