📐 AI Evaluation Beyond Metrics

Workshop at IJCAI-ECAI 2022 (Vienna, Austria)

July 24th (Schubert 1 Room)

Invited Speakers & Panels

Facebook AI Research

University of St Andrews

+ panel on "Cognitive Evaluation with the Animal AI Environment", with

  • Murray Shanahan (Imperial, DeepMind)

  • Tomer D. Ullman (Harvard)

  • Amanda Seed (St Andrews)

+ panel on "Evaluating pre-trained, generative and prompted systems", with

  • Matthias Samwald (Medical Univ. Vienna)

  • Lama Ahmad (OpenAI)

  • Jo Plested (University of New South Wales)

+ special session on "OECD’s Artificial Intelligence and the Future of Skills (AIFS)"

with Stuart Elliott (OECD), Virginia Dignum (Umeå), Tony Cohn (Leeds) and Songül Tolan (European Commission)

Call for Papers

The 1st international workshop on AI Evaluation Beyond Metrics (EBeM) will be held in Vienna, Austria (July 23-25, 2022).

Cutting-edge AI and ML systems can solve a variety of problems that were not solvable a few years ago, such as machine translation and medical image analysis. As these systems begin to be deployed in important and consequential contexts, robust evaluation of their capabilities and limitations is critical. However, traditional approaches to evaluation lack the robustness needed to analyse the capabilities of complex AI systems: many systems solve a task or excel at a particular benchmark, but then fail at other tasks or instances that putatively represent the same capability.

Therefore, the goal of this workshop is to challenge the widespread but limited approach of evaluating the performance of intelligent systems with aggregated metrics over a benchmark or distribution of tasks. We will also discuss alternative approaches that draw on ideas and recent progress in cognitive and developmental psychology, psychometrics, software testing, and other areas.

Topics (not exhaustive)

  • Evaluation methods founded on cognitive, developmental or comparative psychology

  • Measurement of skills, capabilities, or cognitive abilities

  • Evaluation methods based on software testing or other engineering practices

  • Meta-analysis or comparisons of evaluation instruments

  • The role of evaluation in AI development, policy making, and modeling of social impact

  • Measurements of generality or common sense

  • Capture and use of evaluation data

  • Analysis of the task space and its relation to corresponding capabilities

  • The role of causality in evaluation

  • Topics complementary to evaluation such as documentation or auditing

  • Alternative evaluation methods with added benefits

  • Discussion and progress in hard-to-evaluate scenarios


Organizers

José Hernández-Orallo

Universitat Politècnica de València


Fernando Martínez-Plumed

European Commission


John Burden


Ryan Burnell


Wout Schellaert

Universitat Politècnica de València

Program Committee

  • Atia Cortés - Barcelona Supercomputing Center

  • Alex Taylor - University of Auckland

  • Alex Wang - New York University

  • Celeste Kidd - University of California Berkeley

  • Craig S. Greenberg - NIST

  • David Fernández-Llorca - European Commission, JRC

  • Deborah Raji - Mozilla

  • Ellen Voorhees - NIST

  • Ernest Davis - New York University

  • Guillaume Avrin - Lab. Nat. de Métrologie et d'Essais

  • Isabelle Hupont-Torres - European Commission, JRC

  • Jan Feyereisl - GoodAI

  • Joel Leibo - DeepMind

  • Kevin Smith - MIT

  • Koustuv Sinha - McGill University

  • Ljerka Ostojic - University of Rijeka

  • Melanie Mitchell - Santa Fe Institute

  • Moira Dillon - New York University

  • Naman Shukla - Deepair Solutions

  • Panos Ipeirotis - New York University

  • Peter Flach - University of Bristol

  • Raul Santos-Rodriguez - University of Bristol

  • Ricardo Prudencio - Informatics Center, UFPE

  • Ricardo Vinuesa - KTH Royal Institute of Technology

  • Richard Mallah - Future of Life Institute

  • Rotem Dror - University of Pennsylvania

  • Sean Holden - University of Cambridge

  • Sebastian Gehrmann - Google Research

  • Songül Tolan - European Commission, JRC

  • Tadahiro Taniguchi - Ritsumeikan University

  • Vicky Charisi - European Commission, JRC

Venue & Registration

All registration is handled by IJCAI through its registration platform at https://registration.ijcai.org.

The EBeM venue is Messe Wien Exhibition and Congress Center:

Messe Wien

Hall B, entrance Congress Center

Messeplatz 1

A-1020 Vienna

Metro stop U2 “Messe Prater”