Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning

This website contains supplementary materials accompanying the ICML 2020 submission.

Single-player environments (Battle and Battle2)

The videos below demonstrate the performance of agents trained with Sample Factory in several 3D environments. The videos are not sped up; gameplay is shown at the standard framerate (35 FPS for VizDoom). The agents use the same set of controls available to a human player.

RL agent vs scripted in-game bots (8-player Deathmatch and Duel scenarios)

Unlike the other videos, this one is shown at the actual game resolution used during training. It was recorded this way because VizDoom does not support scripted bots in the game demo files used to record the other videos. In this experiment we set the frameskip to 2 instead of the traditional 4 frames.

Self-Play VizDoom Duel

Agents were trained using self-play and population-based training with 8 competing policies on a 36-core server with four GPUs. In this experiment we set the frameskip to 2 instead of the traditional 4 frames. The agents in the videos were trained on 2.5 billion environment transitions (about 18 years of VizDoom gameplay for each agent in the population).

Architecture

High-level architecture of Sample Factory

Timeline diagram of double-buffered sampling
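The idea behind double-buffered sampling can be sketched in a few lines of Python. This is a simplified, sequential illustration of the scheme, not the actual Sample Factory implementation: `DummyEnv` and `policy` are placeholders, and in the real asynchronous system the idle group of environments steps concurrently with policy inference rather than in turn.

```python
# Simplified sketch of double-buffered sampling (illustrative only).
# Environments are split into two groups; the sampler alternates between
# them, so that in the asynchronous version environment stepping for one
# group can overlap with policy inference for the other.

class DummyEnv:
    """Placeholder environment standing in for a VizDoom instance."""
    def reset(self):
        return 0.0
    def step(self, action):
        return 0.0, 1.0, False  # observation, reward, done

def policy(observations):
    """Placeholder policy: returns one action per observation."""
    return [0 for _ in observations]

def double_buffered_rollout(envs, num_iterations):
    half = len(envs) // 2
    groups = [envs[:half], envs[half:]]          # the two buffers
    obs = [[e.reset() for e in g] for g in groups]
    total_steps = 0
    for i in range(num_iterations):
        active = i % 2                           # alternate between buffers
        actions = policy(obs[active])
        # In the real system, the *other* group would be stepping its
        # environments concurrently at this point; here we step the
        # active group sequentially for clarity.
        results = [e.step(a) for e, a in zip(groups[active], actions)]
        obs[active] = [r[0] for r in results]
        total_steps += len(results)
    return total_steps

steps = double_buffered_rollout([DummyEnv() for _ in range(8)], num_iterations=10)
print(steps)  # 4 envs per group x 10 iterations = 40 transitions
```

The key design point the timeline diagram conveys is that with two buffers neither the CPU workers stepping the environments nor the GPU running inference ever has to sit idle waiting for the other.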

Citation

@inproceedings{petrenko2020sf,
  title={Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning},
  author={Petrenko, Aleksei and Huang, Zhehui and Kumar, Tushar and Sukhatme, Gaurav and Koltun, Vladlen},
  booktitle={ICML},
  year={2020}
}