Evaluating Sports Analytics Models: Challenges, Approaches, and Lessons Learned (paper, presentation)
Jesse Davis, Lotte Bransen, Laurens Devos, Wannes Meert, Pieter Robberechts, Jan Van Haaren and Maaike Van Roy
Evaluating Object Permanence in Embodied Agents using the Animal-AI Environment (paper, presentation)
Konstantinos Voudouris, Niall Donnelly, Danaja Rutar, Ryan Burnell, John Burden, Lucy Cheke and José Hernández-Orallo
A Framework for Categorising AI Evaluation Instruments (paper, presentation)
Anthony G Cohn, José Hernández-Orallo, Julius Sechang Mboli, Yael Moros-Daval, Zhiliang Xiang and Lexin Zhou
Reject Before You Run: Small Assessors Anticipate Big Language Models (paper, presentation)
Lexin Zhou, Fernando Martínez-Plumed, José Hernández-Orallo, Cèsar Ferri and Wout Schellaert
The Relevance of Non-Human Errors in Machine Learning (paper, presentation)
Ricardo Baeza-Yates and Marina Estévez-Almenzar
Robustness Testing of Machine Learning Families using Instance-Level IRT-Difficulty (paper, presentation)
Raül Fabra-Boluda, Cèsar Ferri, Fernando Martínez-Plumed and Maria Jose Ramirez-Quintana
Item Response Theory to Evaluate Speech Synthesis: Beyond Synthetic Speech Difficulty (paper, presentation)
Chaina Oliveira and Ricardo Prudêncio
Evaluating Understanding on Conceptual Abstraction Benchmarks (paper, presentations)
Victor Vikram Odouard and Melanie Mitchell
On Young Children’s Exploration, Aha! Moments and Explanations in Model Building for Self-Regulated Problem-Solving (paper, presentation)
Vicky Charisi, Natalia Díaz Rodríguez, Barbara Mawhin and Luis Merino
FERM: A FEature-space Representation Measure for Improved Model Evaluation (paper, presentation)
Guyver Fu, Wenbo Ge and Jo Plested
*Accepted papers not included in the workshop proceedings at the authors' choice:
*Behavioral experiments for understanding catastrophic forgetting (paper, presentation)
Samuel Bell and Neil Lawrence
*Red Teaming Language Models with Language Models (paper, presentation)
Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese and Geoffrey Irving