Benchmarking@GECCO-2021 Workshop

Good Benchmarking Practices for Evolutionary Computation

Sunday, July 11, 2021, 11:00-12:50 CEST (Kraków/Berlin/Amsterdam/Paris time)
-- online event --
Free registration for SIGEVO members

The Benchmarking@GECCO Workshop Series

A platform to come together and to discuss recent progress and challenges in the area of benchmarking optimization heuristics.

This workshop continues the series we started in 2020 (BENCHMARK@GECCO-2020 with >75 participants and Benchmarking@PPSN2020 with >90 participants). The core theme is benchmarking evolutionary computation methods and related sampling-based optimization heuristics, but the focus changes each year.

Our topics for the 2021 edition are:
- reproducibility in evolutionary computation
- relevance of benchmarking for industry
- benchmarking in robotics

Schedule for the 2021 Workshop

  • Welcome & Opening by the Workshop Organizers

  • Invited Talk 1:
    Relevance of Benchmarking for Industry or “What does it take for an algorithm to be applied in industrial contexts?”
    Speaker: Bernhard Sendhoff

  • Invited Talk 2:
    Benchmarking in Robotics
    Speakers: Emma Hart and Gusz Eiben

  • Panel discussion:
    Reproducibility in Evolutionary Computation: How are we doing and how can we do better?
    Organizers: Jürgen Branke, Manuel López-Ibáñez, Luís Paquete
    Related publication: https://arxiv.org/abs/2102.03380
    We will have a discussion open to all participants in the workshop regarding obstacles to reproducibility in Evolutionary Computation and how we can overcome them.

Organizers

Hosting Event

The Genetic and Evolutionary Computation Conference (GECCO 2021), which will be held as an online event!

Related Event

A similar benchmarking best practices workshop will be held at CEC 2021, which takes place from June 28 - July 1, 2021, in Kraków, Poland:
Benchmarking@CEC-2021.

General Aspects of Benchmarking Evolutionary Computation Methods


Benchmarking plays a vital role in understanding the performance and search behavior of sampling-based optimization techniques such as evolutionary algorithms. Even though benchmarking is a highly researched topic within the evolutionary computation community, a number of open questions and challenges remain to be explored:

  1. the most commonly used benchmarks are too small and cover only part of the problem space,

  2. benchmarks lack the complexity of real-world problems, making it difficult to transfer knowledge gained on them into practice,

  3. we need to develop proper statistical analysis techniques that can be applied depending on the nature of the data,

  4. we need to develop user-friendly, openly accessible benchmarking software. This enables a culture of sharing resources, ensures reproducibility, and helps to avoid common pitfalls in benchmarking optimization techniques. As such, we need to establish new standards for benchmarking in evolutionary computation research so that we can objectively compare novel algorithms and fully demonstrate where they excel and where they can be improved (a minimal illustrative sketch follows this list).

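To make points 3 and 4 more concrete, the sketch below shows what a minimal benchmarking harness with a statistical comparison could look like. It is purely illustrative and not an existing tool: the optimizers, test functions, and parameter values are simplified assumptions of our own. Two toy algorithms are run repeatedly on two toy problems, and their final objective values are compared with a nonparametric Mann-Whitney U test (using numpy and scipy).

    import numpy as np
    from scipy.stats import mannwhitneyu

    # Toy benchmark functions (a sphere and a shifted sphere).
    BENCHMARKS = {
        "sphere": lambda x: float(np.sum(x ** 2)),
        "shifted_sphere": lambda x: float(np.sum((x - 1.0) ** 2)),
    }

    def random_search(f, dim, budget, rng):
        # Uniform random search in [-5, 5]^dim; returns the best value found.
        best = np.inf
        for _ in range(budget):
            best = min(best, f(rng.uniform(-5.0, 5.0, dim)))
        return best

    def one_plus_one_es(f, dim, budget, rng, sigma=0.5):
        # A (1+1) evolution strategy with a fixed step size.
        x = rng.uniform(-5.0, 5.0, dim)
        fx = f(x)
        for _ in range(budget - 1):
            y = x + sigma * rng.standard_normal(dim)
            fy = f(y)
            if fy <= fx:
                x, fx = y, fy
        return fx

    def run(algorithm, runs=15, dim=10, budget=2000, seed=42):
        # Independent runs with per-run seeds derived from one base seed.
        return {
            name: [algorithm(f, dim, budget, np.random.default_rng(seed + r))
                   for r in range(runs)]
            for name, f in BENCHMARKS.items()
        }

    if __name__ == "__main__":
        res_rs = run(random_search)
        res_es = run(one_plus_one_es)
        for name in BENCHMARKS:
            # Two-sided Mann-Whitney U test on the final objective values.
            stat, p = mannwhitneyu(res_rs[name], res_es[name],
                                   alternative="two-sided")
            print(f"{name}: median RS = {np.median(res_rs[name]):.3g}, "
                  f"median (1+1)-ES = {np.median(res_es[name]):.3g}, p = {p:.3g}")

In practice one would rely on an established benchmarking platform, a larger and more representative problem set, and a more careful statistical protocol; the sketch only illustrates the overall structure of repeated independent runs, recorded results, and a statistical comparison.
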
The topics of interest for this session of the workshop include, but are not limited to:

  • Performance measures for comparing algorithm behavior;

  • Novel statistical approaches for analyzing empirical data;

  • Selection of meaningful benchmark problems;

  • Landscape analysis;

  • Data mining approaches for understanding algorithm behavior;

  • Transfer learning from benchmark experiences to real-world problems;

  • Benchmarking tools for executing experiments and analysis of experimental results.

Understanding Reproducibility in Evolutionary Computation


Experimental studies are prevalent in Evolutionary Computation (EC), and concerns about the reproducibility and replicability of such studies have increased in recent years, following similar discussions in other scientific fields. In this workshop, we want to raise awareness of the reproducibility issue, shed light on the obstacles encountered when trying to reproduce results, and discuss best practices for making results reproducible as well as for reporting reproducibility results.

We invite researchers to revisit their own empirical work, published at least 10 years ago (before 2010) in a journal or a conference with proceedings. We invite submissions of papers that repeat the empirical study, either by re-using, updating, or reimplementing the necessary code and datasets, irrespective of whether this code was published in some form at the time.


We expect the paper to include:

o A documentation of the process needed to re-run the experiments. For example, you may have to retrieve the benchmark problems from the web, downgrade your system or some libraries, modify your original code because some library is no longer available, reinstall a specific compiler or interpreter, etc.

o A discussion of whether you consider your paper reproducible and why. If you ran your code with fixed random seeds and you have recorded them, you may be able to reproduce identical results. If you have not recorded the random seeds, you may need to use statistical tests to check whether the conclusions still hold. You may even want to try some different problem instances or parameter settings to check whether the results still hold under slightly different experimental settings (see the sketch after this list).

o Sufficient details to allow an independent reproduction of your experiment by a third party, including all artifacts used in the attempt to reproduce the results (code, benchmark problems, scripts to generate plots or perform statistical analysis). Artifacts should be made publicly and permanently available via Zenodo (https://zenodo.org) or another data repository, or submitted together with the paper to be archived in the ACM Digital Library.

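To make the point about random seeds more concrete, here is a minimal sketch; it is a hypothetical example, and the experiment, file name, and parameter values are assumptions rather than workshop requirements. The exact seeds are recorded next to the results, so that a later run with the same seeds on the same software stack should reproduce identical numbers; without recorded seeds, one falls back to the statistical checks described above.

    import json
    import numpy as np

    def run_experiment(seed, dim=10, budget=1000):
        # A toy stochastic experiment: random search on the sphere function.
        rng = np.random.default_rng(seed)
        best = np.inf
        for _ in range(budget):
            x = rng.uniform(-5.0, 5.0, dim)
            best = min(best, float(np.sum(x ** 2)))
        return best

    if __name__ == "__main__":
        seeds = list(range(1, 31))                  # record the exact seeds used
        results = [run_experiment(s) for s in seeds]
        # Store the seeds together with the results; re-running with the same
        # seeds and software versions should reproduce identical numbers.
        with open("results.json", "w") as fh:
            json.dump({"seeds": seeds, "results": results}, fh, indent=2)
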
In the end, there are various possible outcomes, and all of them are acceptable for a paper: you are unable to run or compile the code; you are able to run the code, but it does not give the expected results (or any result at all); the program crashes regularly before producing results; you do not remember the parameter settings used; etc. All of these are valid conclusions: we care more about the description of the process and the challenges of reproducing results than about whether you were actually able to reproduce them.