NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications

πŸ‘‰ Paper | πŸ‘‰ Benchmark Code | πŸ‘‰ Analysis/Data Access

Authors: Robert T. Lange, Yujin Tang, Yingtao Tian

Published at NeurIPS 2023 Track on Datasets and Benchmarks

Introducing NEB - A Benchmark Tailored to Neuroevolution


NEB = 4 problem classes & 11 total sub-tasks

BBO: Black-box optimization benchmark (BBOB) and HPO-B

Control: Brax robotic and MinAtar visual control tasks

Vision: F-MNIST/CIFAR-10 classification & MNIST VAE

Sequence: Addition regression and Sequential MNIST

Core Benchmark API
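The API listing itself did not survive extraction here. As an illustration of the ask/tell pattern that evosax-style evolutionary optimizers in NEB follow, below is a minimal, self-contained sketch in plain Python. The class name `SimpleGaussianES` and all hyperparameter values are our own illustrative choices, not the actual NEB/evosax API:

```python
import random

class SimpleGaussianES:
    """Illustrative ask/tell evolutionary optimizer (maximization)."""

    def __init__(self, num_dims, popsize, sigma=0.1, lrate=0.05, seed=0):
        self.num_dims = num_dims
        self.popsize = popsize
        self.sigma = sigma        # perturbation scale
        self.lrate = lrate        # mean update step size
        self.rng = random.Random(seed)
        self.mean = [0.0] * num_dims

    def ask(self):
        # Sample a population of candidates around the current mean.
        self.noise = [
            [self.rng.gauss(0.0, 1.0) for _ in range(self.num_dims)]
            for _ in range(self.popsize)
        ]
        return [
            [m + self.sigma * e for m, e in zip(self.mean, eps)]
            for eps in self.noise
        ]

    def tell(self, fitness):
        # Move the mean along a baseline-subtracted stochastic
        # gradient estimate (higher fitness = better).
        baseline = sum(fitness) / self.popsize
        for d in range(self.num_dims):
            grad = sum(
                (f - baseline) * eps[d]
                for f, eps in zip(fitness, self.noise)
            ) / (self.popsize * self.sigma)
            self.mean[d] += self.lrate * grad

# Usage: maximize -||x - 1||^2 on a toy 3D problem.
es = SimpleGaussianES(num_dims=3, popsize=64)
for _ in range(200):
    candidates = es.ask()
    fitness = [-sum((xi - 1.0) ** 2 for xi in x) for x in candidates]
    es.tell(fitness)
```

In this interface, `ask` produces candidate solutions to evaluate (in NEB, on a benchmark task) and `tell` updates the search distribution from their fitness scores; the benchmark can treat any optimizer exposing these two calls interchangeably.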


Tuned Learning Curves & Sweeps for 10 Baseline EOs

Performance of tuned EOs on 9 neuroevolution tasks

50 random search tuning trials for 10 EO methods

πŸ‘‰ We provide all learning curves and tuning experiment results via a Google Cloud Storage bucket. Please check out neuroevobench-analysis for instructions on how to download and visualize the data. Downloading is as easy as executing the following in the command line:

```shell
gsutil -m -q cp -r gs://neuroevobench/ .
```

Resource allocation: population size vs. multi-evaluation

πŸ‘‰ Oftentimes, EO practitioners have to trade off the population size (number of candidate solutions) against the number of stochastic fitness evaluations per candidate. Here we show that even for noisy tasks it is beneficial to allocate resources to a larger population, as long as your hardware memory permits.
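The trade-off can be made concrete with a fixed per-generation evaluation budget. The sketch below (toy objective and all parameter values are our own, not from the benchmark) contrasts a small population with averaged stochastic evaluations against a large population with a single evaluation each:

```python
import random

def noisy_fitness(x, rng, noise_std=0.5):
    """Toy stochastic objective: maximize -x^2 plus evaluation noise."""
    return -x * x + rng.gauss(0.0, noise_std)

def evaluate_population(candidates, num_evals, rng):
    """Average `num_evals` stochastic rollouts per candidate."""
    return [
        sum(noisy_fitness(x, rng) for _ in range(num_evals)) / num_evals
        for x in candidates
    ]

rng = random.Random(42)
budget = 64  # total rollouts per generation is fixed

# Config A: small population, 4 rollouts averaged per candidate.
pop_a = [rng.uniform(-1.0, 1.0) for _ in range(16)]
fit_a = evaluate_population(pop_a, num_evals=4, rng=rng)

# Config B: large population, a single rollout per candidate.
pop_b = [rng.uniform(-1.0, 1.0) for _ in range(64)]
fit_b = evaluate_population(pop_b, num_evals=1, rng=rng)

# Both configurations consume the same evaluation budget.
assert len(pop_a) * 4 == len(pop_b) * 1 == budget
```

Both configurations cost the same number of rollouts; the result above says that, given such a budget, spending it on more candidates (Config B) tends to beat spending it on denoising fewer candidates (Config A).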

OpenAI-ES Investigations: Optimizer, Decay, Fitness

πŸ‘‰ How crucial are the choices of gradient descent optimizer, weight regularization, and fitness shaping transformation for the downstream performance of the most popular EO, OpenAI-ES? We find that Adan is a strong addition when using pseudo-finite-difference gradients. Furthermore, a small exponential decay applied to the mean can improve performance. Finally, for vision and sequence tasks it is beneficial to apply a fitness shaping transformation rather than using the raw fitness scores.
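A standard fitness shaping choice for OpenAI-ES is the centered-rank transform (Salimans et al., 2017), which replaces raw scores with their ranks mapped into [-0.5, 0.5], making the update invariant to the scale of (and outliers in) the raw fitness. A minimal sketch, ignoring ties for simplicity:

```python
def centered_rank(fitness):
    """Centered-rank fitness shaping: map raw scores to [-0.5, 0.5].
    The best candidate receives 0.5, the worst -0.5, regardless of
    the magnitude of the raw scores. Ties are not handled here."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])
    shaped = [0.0] * n
    for rank, idx in enumerate(order):
        shaped[idx] = rank / (n - 1) - 0.5
    return shaped

# The outlier 100.0 gets 0.5, the worst score gets -0.5.
print(centered_rank([10.0, -3.0, 5.0, 100.0]))
```

Because only the ordering of candidates enters the update, a single extreme fitness value cannot dominate the gradient estimate, which is one plausible reason raw scores underperform on vision and sequence tasks.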

Scaling result: population increase always helps

πŸ‘‰ We investigate the scaling of OpenAI-ES with respect to population and model size. Increasing the population size always improves performance, while larger models can be harder to optimize, likely due to the curse of dimensionality. Nonetheless, certain tasks require and benefit from additional model capacity.
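One mechanism behind the population-size effect is variance reduction: the ES gradient estimate averages over population members, so larger populations yield more accurate updates. A self-contained sketch on a toy quadratic (objective, dimensions, and all values are illustrative, not from the benchmark):

```python
import random

def es_grad_estimate(mean, sigma, popsize, rng):
    """Antithetic ES gradient estimate of f(x) = -sum(x_i^2).
    Assumes an even popsize (pairs of mirrored perturbations)."""
    def f(x):
        return -sum(v * v for v in x)

    dims = len(mean)
    pairs = popsize // 2
    grad = [0.0] * dims
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        plus = f([m + sigma * e for m, e in zip(mean, eps)])
        minus = f([m - sigma * e for m, e in zip(mean, eps)])
        scale = (plus - minus) / (2.0 * sigma)
        for d in range(dims):
            grad[d] += scale * eps[d] / pairs
    return grad

def mean_error(popsize, trials=20, seed=0):
    """Average absolute error of the estimate vs. the true gradient."""
    rng = random.Random(seed)
    mean = [1.0, -2.0, 0.5]
    true_grad = [-2.0 * m for m in mean]  # gradient of -||x||^2
    total = 0.0
    for _ in range(trials):
        g = es_grad_estimate(mean, sigma=0.1, popsize=popsize, rng=rng)
        total += sum(abs(gi - ti) for gi, ti in zip(g, true_grad))
    return total / trials

# Larger populations give a lower-variance gradient estimate.
assert mean_error(popsize=256) < mean_error(popsize=8)
```

Since estimator error shrinks roughly with the square root of the population size while model size grows the number of dimensions to estimate over, this is consistent with the observation that population scaling reliably helps, whereas model scaling can hurt.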