SpeedyZero: Mastering Atari with Limited Data and Time
Yixuan Mei*, Jixuan Gao*, Weirui Ye, Shaohuai Liu, Yang Gao†, Yi Wu†
Accepted as a poster paper at ICLR 2023.
Overview
SpeedyZero is a fast and sample-efficient distributed RL training system built on top of EfficientZero, a sample-efficient model-based RL method. Through system and algorithm co-design, SpeedyZero achieves up to a 14.5 times speedup over EfficientZero, mastering Atari games in only 35 minutes of training.
Introduction to SpeedyZero
System Optimizations
Modular design that distributes the workload across multiple machines while reducing network communication.
Efficient on-node communication through SMOS, a customized shared-memory object store (see the sketch after this list).
Data transfer optimizations to reduce the latency of critical components.
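To illustrate the idea behind SMOS, below is a minimal Python sketch of on-node data exchange through a named shared-memory segment: a writer process publishes a batch once, and readers on the same machine attach to it without further copies. The function names, single-segment layout, and demo flow are illustrative assumptions, not the actual SMOS API.

```python
import numpy as np
from multiprocessing import shared_memory

def put_batch(name: str, batch: np.ndarray) -> shared_memory.SharedMemory:
    """Copy `batch` once into a named shared-memory segment."""
    shm = shared_memory.SharedMemory(create=True, size=batch.nbytes, name=name)
    view = np.ndarray(batch.shape, dtype=batch.dtype, buffer=shm.buf)
    view[:] = batch  # one copy in; readers on this node then see it copy-free
    return shm

def get_batch(name: str, shape, dtype):
    """Attach to an existing segment; returns (handle, zero-copy array view)."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)

if __name__ == "__main__":
    batch = np.random.rand(256, 4, 96, 96).astype(np.float32)
    writer = put_batch("smos_demo_batch", batch)                 # producer side
    reader, reader_view = get_batch("smos_demo_batch",           # consumer side
                                    batch.shape, batch.dtype)
    assert np.array_equal(batch, reader_view)
    del reader_view
    reader.close()
    writer.close()
    writer.unlink()  # free the segment once all readers are done
```

This sketch only conveys the zero-copy read path on a single node; it omits the bookkeeping a full object store needs.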
Algorithmic Optimizations
Priority Refresh: a prioritized experience replay method that stabilizes value training (sketched after this list).
Clipped LARS: an optimizer enabling stable large-batch training in SpeedyZero.
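As a rough illustration of Priority Refresh, the sketch below shows a prioritized replay buffer whose priorities are periodically recomputed for every stored transition with the latest model, rather than only for the transitions sampled in the most recent batch as in standard prioritized experience replay. The class name, the `value_error_fn` callback, and the refresh entry point are hypothetical; see the paper for the exact priority definition and refresh schedule.

```python
import numpy as np

class PriorityRefreshBuffer:
    """Sketch of a replay buffer with Priority-Refresh-style full updates."""

    def __init__(self, capacity: int, alpha: float = 1.0):
        self.capacity = capacity
        self.alpha = alpha          # priority exponent
        self.transitions = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition, priority: float = 1.0):
        # Standard circular insertion with an initial priority.
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
        else:
            self.transitions[self.pos] = transition
        self.priorities[self.pos] = priority ** self.alpha
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int):
        # Sample proportionally to the most recently refreshed priorities.
        n = len(self.transitions)
        probs = self.priorities[:n] / self.priorities[:n].sum()
        idx = np.random.choice(n, size=batch_size, p=probs)
        return idx, [self.transitions[i] for i in idx]

    def refresh_all(self, value_error_fn):
        # Key difference from standard PER: periodically recompute the
        # priority of *every* stored transition with the latest network
        # (e.g. from a dedicated refresher worker), not just the sampled ones.
        for i, t in enumerate(self.transitions):
            self.priorities[i] = value_error_fn(t) ** self.alpha
```

The intuition is that refreshing all priorities keeps them consistent with the rapidly changing model, avoiding the stale priorities that would otherwise accumulate during short, large-batch training runs.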
System Architecture Comparison
EfficientZero performs all computation on a single machine. In contrast, SpeedyZero partitions the workflow into data collection (Data Node), batch reanalysis (Reanalysis Node), and training (Trainer Node), and distributes the three stages across different machines.
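The sketch below mimics this three-stage decomposition with local processes and queues, purely for illustration: in the actual system each stage runs on its own machine and data moves through SMOS and network transfer rather than `multiprocessing` queues, and all names here are made up.

```python
import multiprocessing as mp

def data_node(traj_queue: mp.Queue):
    """Collects trajectories with the current policy and ships them out."""
    for step in range(3):
        traj_queue.put({"trajectory": f"rollout-{step}"})
    traj_queue.put(None)  # end-of-stream marker

def reanalysis_node(traj_queue: mp.Queue, batch_queue: mp.Queue):
    """Turns stored trajectories into fresh (reanalyzed) training targets."""
    while (traj := traj_queue.get()) is not None:
        batch_queue.put({"targets": traj["trajectory"] + "-reanalyzed"})
    batch_queue.put(None)

def trainer_node(batch_queue: mp.Queue):
    """Consumes reanalyzed batches and updates the model."""
    while (batch := batch_queue.get()) is not None:
        print("training on", batch["targets"])

if __name__ == "__main__":
    traj_q, batch_q = mp.Queue(), mp.Queue()
    stages = [mp.Process(target=data_node, args=(traj_q,)),
              mp.Process(target=reanalysis_node, args=(traj_q, batch_q)),
              mp.Process(target=trainer_node, args=(batch_q,))]
    for s in stages:
        s.start()
    for s in stages:
        s.join()
```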
Experiment Results on Atari 100k Benchmark
Tested on two different clusters (details can be found in Appendix A.1).
Achieves up to a 14.5 times speedup over EfficientZero while maintaining on-par sample efficiency.
Effect of Priority Refresh and Clipped LARS
Compared with uniform sampling and DPER, Priority Refresh yields steadily improving value predictions and much lower variance across trials.
Compared with SGD and LARS, Clipped LARS significantly stabilizes the training process (see the optimizer sketch below).
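A minimal sketch of a Clipped-LARS-style optimizer is given below, assuming the layer-wise trust ratio is computed as in LARS and then clipped to a maximum value (1.0 here); the exact clipping rule and hyperparameters used in SpeedyZero may differ, so treat this as an illustration rather than the paper's implementation.

```python
import torch

class ClippedLARS(torch.optim.Optimizer):
    """Sketch of a LARS-style optimizer with a clipped layer-wise trust ratio."""

    def __init__(self, params, lr=0.1, momentum=0.9,
                 weight_decay=1e-4, max_trust=1.0):
        defaults = dict(lr=lr, momentum=momentum,
                        weight_decay=weight_decay, max_trust=max_trust)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad.add(p, alpha=group["weight_decay"])  # grad + weight decay
                w_norm, g_norm = p.norm(), g.norm()
                # Layer-wise trust ratio as in LARS, clipped (assumed cap) so a
                # single layer cannot receive an excessively large update.
                if w_norm > 0 and g_norm > 0:
                    trust = min((w_norm / g_norm).item(), group["max_trust"])
                else:
                    trust = 1.0
                buf = self.state[p].setdefault("momentum_buffer",
                                               torch.zeros_like(p))
                buf.mul_(group["momentum"]).add_(g, alpha=trust * group["lr"])
                p.add_(buf, alpha=-1.0)
```

Usage mirrors any PyTorch optimizer, e.g. `opt = ClippedLARS(model.parameters(), lr=0.2)` followed by the usual `loss.backward(); opt.step(); opt.zero_grad()` loop.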
Useful Links
Paper: SpeedyZero: Mastering Atari with Limited Data and Time
SpeedyZero Code: We are currently cleaning up the code; a preliminary version is available in the OpenReview supplementary materials.