SpeedyZero: Mastering Atari with Limited Data and Time
Yixuan Mei*, Jixuan Gao*, Weirui Ye, Shaohuai Liu, Yang Gao†, Yi Wu†
Accepted as a poster paper at ICLR 2023.
Overview
SpeedyZero is a fast and sample-efficient distributed RL training system built on top of EfficientZero, a sample-efficient model-based RL method. Through system and algorithm co-design, SpeedyZero achieves up to a 14.5 times speedup over EfficientZero, mastering Atari games in only 35 minutes of training.
Introduction to SpeedyZero
System Optimizations
Modular design that distributes the workload across multiple machines while reducing network communication.
Efficient on-node communication through SMOS, a customized shared-memory object store (see the sketch after this list).
Data transfer optimizations to reduce the latency of critical components.
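To illustrate the idea behind SMOS, below is a minimal Python sketch of on-node data exchange through a named shared-memory segment: a writer process publishes a batch once, and readers on the same machine attach to it without further copies. The function names, single-segment layout, and demo flow are illustrative assumptions, not the actual SMOS API.

```python
import numpy as np
from multiprocessing import shared_memory

def put_batch(name: str, batch: np.ndarray) -> shared_memory.SharedMemory:
    """Copy `batch` once into a named shared-memory segment."""
    shm = shared_memory.SharedMemory(create=True, size=batch.nbytes, name=name)
    view = np.ndarray(batch.shape, dtype=batch.dtype, buffer=shm.buf)
    view[:] = batch  # one copy in; readers on this node then see it copy-free
    return shm

def get_batch(name: str, shape, dtype):
    """Attach to an existing segment; returns (handle, zero-copy array view)."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)

if __name__ == "__main__":
    batch = np.random.rand(256, 4, 96, 96).astype(np.float32)
    writer = put_batch("smos_demo_batch", batch)                 # producer side
    reader, reader_view = get_batch("smos_demo_batch",           # consumer side
                                    batch.shape, batch.dtype)
    assert np.array_equal(batch, reader_view)
    del reader_view
    reader.close()
    writer.close()
    writer.unlink()  # free the segment once all readers are done
```

This sketch only conveys the zero-copy read path on a single node; it omits the bookkeeping a full object store needs.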
Algorithmic Optimizations
Priority Refresh: a prioritized experience replay method that stabilizes value training (sketched after this list).
Clipped LARS: an optimizer enabling stable large-batch training in SpeedyZero.
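As a rough illustration of Priority Refresh, the sketch below shows a prioritized replay buffer whose priorities are periodically recomputed for every stored transition with the latest model, rather than only for the transitions sampled in the most recent batch as in standard prioritized experience replay. The class name, the `value_error_fn` callback, and the refresh entry point are hypothetical; see the paper for the exact priority definition and refresh schedule.

```python
import numpy as np

class PriorityRefreshBuffer:
    """Sketch of a replay buffer with Priority-Refresh-style full updates."""

    def __init__(self, capacity: int, alpha: float = 1.0):
        self.capacity = capacity
        self.alpha = alpha          # priority exponent
        self.transitions = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition, priority: float = 1.0):
        # Standard circular insertion with an initial priority.
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
        else:
            self.transitions[self.pos] = transition
        self.priorities[self.pos] = priority ** self.alpha
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int):
        # Sample proportionally to the most recently refreshed priorities.
        n = len(self.transitions)
        probs = self.priorities[:n] / self.priorities[:n].sum()
        idx = np.random.choice(n, size=batch_size, p=probs)
        return idx, [self.transitions[i] for i in idx]

    def refresh_all(self, value_error_fn):
        # Key difference from standard PER: periodically recompute the
        # priority of *every* stored transition with the latest network
        # (e.g. from a dedicated refresher worker), not just the sampled ones.
        for i, t in enumerate(self.transitions):
            self.priorities[i] = value_error_fn(t) ** self.alpha
```

The intuition is that refreshing all priorities keeps them consistent with the rapidly changing model, avoiding the stale priorities that would otherwise accumulate during short, large-batch training runs.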
System Architecture Comparison
EfficientZero performs all computation on a single machine. In contrast, SpeedyZero partitions the workflow into data collection (Data Node), batch reanalysis (Reanalysis Node), and training (Trainer Node), and distributes the three stages across different machines.
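The sketch below mimics this three-stage decomposition with local processes and queues, purely for illustration: in the actual system each stage runs on its own machine and data moves through SMOS and network transfer rather than `multiprocessing` queues, and all names here are made up.

```python
import multiprocessing as mp

def data_node(traj_queue: mp.Queue):
    """Collects trajectories with the current policy and ships them out."""
    for step in range(3):
        traj_queue.put({"trajectory": f"rollout-{step}"})
    traj_queue.put(None)  # end-of-stream marker

def reanalysis_node(traj_queue: mp.Queue, batch_queue: mp.Queue):
    """Turns stored trajectories into fresh (reanalyzed) training targets."""
    while (traj := traj_queue.get()) is not None:
        batch_queue.put({"targets": traj["trajectory"] + "-reanalyzed"})
    batch_queue.put(None)

def trainer_node(batch_queue: mp.Queue):
    """Consumes reanalyzed batches and updates the model."""
    while (batch := batch_queue.get()) is not None:
        print("training on", batch["targets"])

if __name__ == "__main__":
    traj_q, batch_q = mp.Queue(), mp.Queue()
    stages = [mp.Process(target=data_node, args=(traj_q,)),
              mp.Process(target=reanalysis_node, args=(traj_q, batch_q)),
              mp.Process(target=trainer_node, args=(batch_q,))]
    for s in stages:
        s.start()
    for s in stages:
        s.join()
```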
Experiment Results on Atari 100k Benchmark
Tested on two different clusters (details can be found in Appendix A.1).
Achieves up to a 14.5 times speedup over EfficientZero while maintaining on-par sample efficiency.
Effect of Priority Refresh and Clipped LARS
Compared with uniform sampling and DPER, Priority Refresh yields steadily improving value predictions and much lower variance across trials.
Compared with SGD and LARS, Clipped LARS significantly stabilizes the training process (see the optimizer sketch below).
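A minimal sketch of a Clipped-LARS-style optimizer is given below, assuming the layer-wise trust ratio is computed as in LARS and then clipped to a maximum value (1.0 here); the exact clipping rule and hyperparameters used in SpeedyZero may differ, so treat this as an illustration rather than the paper's implementation.

```python
import torch

class ClippedLARS(torch.optim.Optimizer):
    """Sketch of a LARS-style optimizer with a clipped layer-wise trust ratio."""

    def __init__(self, params, lr=0.1, momentum=0.9,
                 weight_decay=1e-4, max_trust=1.0):
        defaults = dict(lr=lr, momentum=momentum,
                        weight_decay=weight_decay, max_trust=max_trust)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad.add(p, alpha=group["weight_decay"])  # grad + weight decay
                w_norm, g_norm = p.norm(), g.norm()
                # Layer-wise trust ratio as in LARS, clipped (assumed cap) so a
                # single layer cannot receive an excessively large update.
                if w_norm > 0 and g_norm > 0:
                    trust = min((w_norm / g_norm).item(), group["max_trust"])
                else:
                    trust = 1.0
                buf = self.state[p].setdefault("momentum_buffer",
                                               torch.zeros_like(p))
                buf.mul_(group["momentum"]).add_(g, alpha=trust * group["lr"])
                p.add_(buf, alpha=-1.0)
```

Usage mirrors any PyTorch optimizer, e.g. `opt = ClippedLARS(model.parameters(), lr=0.2)` followed by the usual `loss.backward(); opt.step(); opt.zero_grad()` loop.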
Useful Links
Paper: SpeedyZero: Mastering Atari with Limited Data and Time
SpeedyZero Code: We are currently cleaning up the code; a preliminary version is available in the OpenReview supplementary materials.