The research, development, and use of performance benchmarks in high-performance computing (HPC) has been an active area for over 20 years, as evidenced by the development of the LINPACK Benchmark and the emergence of the TOP500 list in the early 1990s. These were followed by HPCC, HPCG, HPGMG, Graph500, and a plethora of other projects that offer metrics for performance evaluation. In addition, initiatives such as SPEC HPC, Blue Waters SPP, and NERSC SSP, along with other procurement methodologies, assist in the selection process for new supercomputing and HPC acquisitions. The Evolvable Methods for Benchmarking Realism and Community Engagement (EMBRACE) workshop will be a new, open, and community-sustained technical engagement venue whose goal is to promote and advance the science of benchmarking HPC systems.

Over time, the community has broadened, the platforms have changed dramatically, and the applications have evolved. To what extent have today’s benchmarks evolved with them? Despite long-standing interest in this question, it remains, in our view, largely unresolved.

One problem is that debates surrounding this question are all too often not grounded in rigorous scientific and technical terms. Thus, EMBRACE’s long-term goal is to raise the overall level of scientific and technical debate surrounding benchmarks, and to do so continually, as the broader community’s applications, needs, and understanding of the role of benchmarks change. This vision includes the development of new benchmarks, justified by rigorous analysis against applications.

An open competition is the heart of the envisioned EMBRACE Workshop, a new annual event in which participants submit papers and code. An independent panel of rotating members, with broad representation from across the community, would help define target thematic areas each year and judge the submissions.

Examples of competitive categories might include qualitative ones, such as "Best characterization of a full application by one or more minimal benchmarks", or quantitative ones, such as "Most accurate measure of platform/parameter change impact on an application".