SpeedyZero: Mastering Atari with Limited Data and Time

Yixuan Mei*, Jixuan Gao*, Weirui Ye, Shaohuai Liu, Yang Gao†, Yi Wu†

Accepted as a poster paper at ICLR 2023.

Overview

SpeedyZero is a fast, sample-efficient distributed reinforcement learning (RL) training system. It builds on EfficientZero, a sample-efficient model-based RL training method. Through system and algorithm co-design, SpeedyZero achieves a 14.5x speedup and masters Atari in only 35 minutes.

Introduction to SpeedyZero

System Optimizations

Algorithmic Optimizations

System Architecture Comparison

EfficientZero performs all computation on a single machine. In contrast, SpeedyZero partitions the workflow into three stages, data collection (Data Node), batch reanalysis (Reanalysis Node), and training (Trainer Node), and distributes them across different machines.
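
The three-stage split can be illustrated with a toy pipeline. The sketch below is not SpeedyZero's implementation: it runs the stages as threads in one process connected by bounded queues, where the real system places them on different machines. All function names, the data layout, and the placeholder "target" computation are assumptions made for illustration only.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (an illustrative convention)


def data_node(out_q, num_trajectories):
    """Data collection stage: emits toy trajectories (hypothetical format)."""
    for i in range(num_trajectories):
        out_q.put({"trajectory_id": i, "rewards": [i, i + 1]})
    out_q.put(SENTINEL)


def reanalysis_node(in_q, out_q):
    """Batch reanalysis stage: turns each trajectory into a training target."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # propagate shutdown downstream
            break
        # Placeholder target: the sum of rewards stands in for real reanalysis.
        out_q.put({"trajectory_id": item["trajectory_id"],
                   "target": sum(item["rewards"])})


def trainer_node(in_q, results):
    """Training stage: consumes targets; here it only records them."""
    while True:
        batch = in_q.get()
        if batch is SENTINEL:
            break
        results.append(batch["target"])


def run_pipeline(num_trajectories=4):
    # Bounded queues let a slow stage apply back-pressure to the one before it.
    raw_q = queue.Queue(maxsize=2)
    batch_q = queue.Queue(maxsize=2)
    results = []
    stages = [
        threading.Thread(target=data_node, args=(raw_q, num_trajectories)),
        threading.Thread(target=reanalysis_node, args=(raw_q, batch_q)),
        threading.Thread(target=trainer_node, args=(batch_q, results)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(4))
```

Because each stage only talks to its neighbors through a queue, the stages can run concurrently and be scaled independently, which is the property the partitioned architecture relies on.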

Experiment Results on Atari 100k Benchmark

Effect of Priority Refresh and Clipped LARS

Useful Links

Poster of SpeedyZero at ICLR 2023