NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications

πŸ‘‰ Paper | πŸ‘‰ Benchmark Code | πŸ‘‰ Analysis/Data Access

Authors: Robert T. Lange, Yujin Tang, Yingtao Tian

Published at NeurIPS 2023 Track on Datasets and Benchmarks

Introducing NEB - A Benchmark Tailored to Neuroevolution


NEB = 4 problem classes & 11 total sub-tasks

BBO: Black-box optimization benchmark (BBOB) and HPO-B

Control: Brax robotic and MinAtar visual control tasks

Vision: F-MNIST/CIFAR-10 classification & MNIST VAE

Sequence: Addition regression and Sequential MNIST

Core Benchmark API
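The API listing itself did not survive extraction here. As an illustration of the ask/tell pattern that evosax-style evolutionary optimizers in NEB follow, below is a minimal, self-contained sketch in plain Python. The class name `SimpleGaussianES` and all hyperparameter values are our own illustrative choices, not the actual NEB/evosax API:

```python
import random

class SimpleGaussianES:
    """Illustrative ask/tell evolutionary optimizer (maximization)."""

    def __init__(self, num_dims, popsize, sigma=0.1, lrate=0.05, seed=0):
        self.num_dims = num_dims
        self.popsize = popsize
        self.sigma = sigma        # perturbation scale
        self.lrate = lrate        # mean update step size
        self.rng = random.Random(seed)
        self.mean = [0.0] * num_dims

    def ask(self):
        # Sample a population of candidates around the current mean.
        self.noise = [
            [self.rng.gauss(0.0, 1.0) for _ in range(self.num_dims)]
            for _ in range(self.popsize)
        ]
        return [
            [m + self.sigma * e for m, e in zip(self.mean, eps)]
            for eps in self.noise
        ]

    def tell(self, fitness):
        # Move the mean along a baseline-subtracted stochastic
        # gradient estimate (higher fitness = better).
        baseline = sum(fitness) / self.popsize
        for d in range(self.num_dims):
            grad = sum(
                (f - baseline) * eps[d]
                for f, eps in zip(fitness, self.noise)
            ) / (self.popsize * self.sigma)
            self.mean[d] += self.lrate * grad

# Usage: maximize -||x - 1||^2 on a toy 3D problem.
es = SimpleGaussianES(num_dims=3, popsize=64)
for _ in range(200):
    candidates = es.ask()
    fitness = [-sum((xi - 1.0) ** 2 for xi in x) for x in candidates]
    es.tell(fitness)
```

In this interface, `ask` produces candidate solutions to evaluate (in NEB, on a benchmark task) and `tell` updates the search distribution from their fitness scores; the benchmark can treat any optimizer exposing these two calls interchangeably.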


Tuned Learning Curves & Sweeps for 10 Baseline EOs

Performance of tuned EOs on 9 neuroevolution tasks

50 random search tuning trials for 10 EO methods

πŸ‘‰ We provide all learning curves and tuning experiment results via a Google Cloud Storage bucket. Please check out neuroevobench-analysis for instructions on how to download and visualize the data. Downloading is as easy as executing the following in the command line:

```shell
gsutil -m -q cp -r gs://neuroevobench/ .
```

Resource allocation: population size vs. multi-evaluation

πŸ‘‰ Oftentimes, EO practitioners have to trade off the population size (number of candidate solutions) against the number of stochastic fitness evaluations per candidate. Here we show that even for noisy tasks it is beneficial to allocate resources to a larger population, as long as your hardware memory permits.
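The trade-off can be made concrete with a fixed per-generation evaluation budget. The sketch below (toy objective and all parameter values are our own, not from the benchmark) contrasts a small population with averaged stochastic evaluations against a large population with a single evaluation each:

```python
import random

def noisy_fitness(x, rng, noise_std=0.5):
    """Toy stochastic objective: maximize -x^2 plus evaluation noise."""
    return -x * x + rng.gauss(0.0, noise_std)

def evaluate_population(candidates, num_evals, rng):
    """Average `num_evals` stochastic rollouts per candidate."""
    return [
        sum(noisy_fitness(x, rng) for _ in range(num_evals)) / num_evals
        for x in candidates
    ]

rng = random.Random(42)
budget = 64  # total rollouts per generation is fixed

# Config A: small population, 4 rollouts averaged per candidate.
pop_a = [rng.uniform(-1.0, 1.0) for _ in range(16)]
fit_a = evaluate_population(pop_a, num_evals=4, rng=rng)

# Config B: large population, a single rollout per candidate.
pop_b = [rng.uniform(-1.0, 1.0) for _ in range(64)]
fit_b = evaluate_population(pop_b, num_evals=1, rng=rng)

# Both configurations consume the same evaluation budget.
assert len(pop_a) * 4 == len(pop_b) * 1 == budget
```

Both configurations cost the same number of rollouts; the result above says that, given such a budget, spending it on more candidates (Config B) tends to beat spending it on denoising fewer candidates (Config A).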

OpenAI-ES Investigations: Optimizer, Decay, Fitness

πŸ‘‰ How crucial are the choices of gradient descent optimizer, weight regularization, and fitness shaping transformation for the downstream performance of the most popular EO, OpenAI-ES? We find that Adan is a strong addition when using pseudo-finite-difference gradients. Furthermore, a small exponential decay applied to the mean can improve performance. Finally, for vision and sequence tasks it is beneficial to apply a fitness shaping transformation rather than using the raw fitness scores.
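A standard fitness shaping choice for OpenAI-ES is the centered-rank transform (Salimans et al., 2017), which replaces raw scores with their ranks mapped into [-0.5, 0.5], making the update invariant to the scale of (and outliers in) the raw fitness. A minimal sketch, ignoring ties for simplicity:

```python
def centered_rank(fitness):
    """Centered-rank fitness shaping: map raw scores to [-0.5, 0.5].
    The best candidate receives 0.5, the worst -0.5, regardless of
    the magnitude of the raw scores. Ties are not handled here."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])
    shaped = [0.0] * n
    for rank, idx in enumerate(order):
        shaped[idx] = rank / (n - 1) - 0.5
    return shaped

# The outlier 100.0 gets 0.5, the worst score gets -0.5.
print(centered_rank([10.0, -3.0, 5.0, 100.0]))
```

Because only the ordering of candidates enters the update, a single extreme fitness value cannot dominate the gradient estimate, which is one plausible reason raw scores underperform on vision and sequence tasks.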

Scaling result: population increase always helps

πŸ‘‰ We investigate the scaling of OpenAI-ES with respect to population and model size. Increasing the population size always improves performance, while larger models can be harder to optimize, likely due to the curse of dimensionality. Nonetheless, certain tasks require and benefit from additional model capacity.
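One mechanism behind the population-size effect is variance reduction: the ES gradient estimate averages over population members, so larger populations yield more accurate updates. A self-contained sketch on a toy quadratic (objective, dimensions, and all values are illustrative, not from the benchmark):

```python
import random

def es_grad_estimate(mean, sigma, popsize, rng):
    """Antithetic ES gradient estimate of f(x) = -sum(x_i^2).
    Assumes an even popsize (pairs of mirrored perturbations)."""
    def f(x):
        return -sum(v * v for v in x)

    dims = len(mean)
    pairs = popsize // 2
    grad = [0.0] * dims
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        plus = f([m + sigma * e for m, e in zip(mean, eps)])
        minus = f([m - sigma * e for m, e in zip(mean, eps)])
        scale = (plus - minus) / (2.0 * sigma)
        for d in range(dims):
            grad[d] += scale * eps[d] / pairs
    return grad

def mean_error(popsize, trials=20, seed=0):
    """Average absolute error of the estimate vs. the true gradient."""
    rng = random.Random(seed)
    mean = [1.0, -2.0, 0.5]
    true_grad = [-2.0 * m for m in mean]  # gradient of -||x||^2
    total = 0.0
    for _ in range(trials):
        g = es_grad_estimate(mean, sigma=0.1, popsize=popsize, rng=rng)
        total += sum(abs(gi - ti) for gi, ti in zip(g, true_grad))
    return total / trials

# Larger populations give a lower-variance gradient estimate.
assert mean_error(popsize=256) < mean_error(popsize=8)
```

Since estimator error shrinks roughly with the square root of the population size while model size grows the number of dimensions to estimate over, this is consistent with the observation that population scaling reliably helps, whereas model scaling can hurt.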