This workshop will take place in Room 112 of the Pennsylvania Convention Center, Philadelphia, on 3rd March 2025.
09:00 - 09:10
Description: The workshop opens with a welcome from the organisers, providing context and objectives for the day.
Speakers: Workshop organiser(s)
09:10 - 09:50
Scope: How better benchmarks and a developing science of AI data, supported by AI data standards and tools, will enable AI products and services that deliver value.
Description: This talk will explore how advancements in benchmarks and the development of a science of AI data—underpinned by robust standards and tools—can foster the creation of AI products and services that are both trustworthy and valuable. It will cover key topics, including the progress of AI as a transformative technology, the evolution of benchmarking practices, and the MLCommons approach exemplified by MLPerf, AILuminate, and MedPerf. It will also address the importance of AI data science and tooling, highlighting innovations like the MLCommons Croissant data format, and outline actionable steps for the AI community to build a secure and reliable AI ecosystem.
Speakers: Dr. Peter Mattson, Google Research & MLCommons
09:50 - 10:30
Description: This session features two 20-minute presentations focusing on AI safety. Lora Aroyo will discuss methodologies for evaluating AI systems, emphasising the importance of diverse perspectives in data to ensure trustworthiness and reliability. Gopal Ramchurn will explore the safety and security of autonomous systems, addressing challenges in human-machine teaming and responsible AI deployment.
Speakers:
Dr. Lora Aroyo, Google Research
Prof. Gopal Ramchurn, University of Southampton
10:30 - 11:00
Coffee break
Showcase: submitted work and posters
11:00 - 12:00
Description: A panel discussion featuring experts on the unique safety challenges posed by multimodal AI systems, such as bias, interpretability, and transparency.
Moderator: Prof. Hana Chockler - King's College London
Speakers:
Dr. Lora Aroyo - Research Scientist at Google DeepMind
Ankit Jain - Eng Manager of GenAI Safety at Meta
Natan Vidra - Founder/CEO of Anote
Ken Fricklas - CEO of Turaco Strategy
Marko Grobelnik - Co-Lead of the Artificial Intelligence Lab at the Jožef Stefan Institute
12:00 - 12:30
Description: Authors of 5 of the accepted papers will present brief lightning talks summarising their research contributions. Each presenter will have approximately 5 minutes to provide a concise overview of their work related to creating and improving datasets and benchmarks for AI safety. Topics range from theoretical approaches to practical implementations and evaluations of AI systems.
Papers:
DarkBench: Benchmarking Dark Patterns in Large Language Models
Authors: Esben Kran, Hieu Minh Nguyen, Akash Kundu, Sami Jawhar, Jinsuk Park, Mateusz Maria Jurewicz
HumanAgencyBench: Do Language Models Support Human Agency?
Authors: Benjamin Sturgeon, Leo Hyams, Daniel Samuelson, Ethan Vorster, Jacob Haimes, Jacy Reese Anthis
Changing Answer Order Can Decrease MMLU Accuracy
Authors: Vipul Gupta, David Pantoja, Candace Ross, Adina Williams, Megan Ung
Evaluating Precise Geolocation Inference Capabilities of Vision Language Models
Authors: Neel Jay, Hieu Minh Nguyen, Hoang Trung Dung, Jacob Haimes
Authors: Andrey Anurin, Jonathan Ng, Jason Hoelscher-Obermaier, Esben Kran
12:30 - 14:00
Description: Attendees can enjoy lunch while engaging with poster presentations of contributed work. This session provides an excellent opportunity for networking and in-depth discussions with researchers about their projects. Posters from the accepted papers will be displayed, and authors will be available for questions.
14:00 - 15:00
Description: This panel addresses safety concerns related to agentic AI systems, focusing on autonomy, trustworthiness, drift, and transparency. The discussion aims to identify emerging risks and point toward potential innovative solutions.
Moderator: Prof. Elena Simperl - King's College London
Speakers:
Dr. Angelo Dalli - Chief Scientist and Co-Founder, UMNAI
Rajat Ghosh - Staff Data Scientist, Nutanix
Prof. Gopal Ramchurn - Professor of Artificial Intelligence, University of Southampton
Srija Chakraborty - Scientist, Universities Space Research Association (USRA)
Dr. Sean McGregor - Founding Director, Digital Safety Research Institute at the UL Research Institutes
15:00 - 15:30
Description: This talk will focus on the future of AI safety, highlighting under-discussed risks and themes. It will provide a forward-looking perspective on the field, emphasising areas that require increased awareness and attention.
Speakers: Prof. Virginia Dignum - Umeå University
15:30 - 16:00
Description: A second coffee break, featuring poster presentations of contributed work. Participants can network and engage with researchers during this session.
16:00 - 16:30
Description: Authors of 5 of the accepted papers will present brief lightning talks summarising their research contributions. Each presenter will have approximately 5 minutes to provide a concise overview of their work related to creating and improving datasets and benchmarks for AI safety. Topics range from theoretical approaches to practical implementations and evaluations of AI systems.
Papers:
Preference Poisoning Attacks on Reward Model Learning
Authors: Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
Subversion Strategy Eval: Evaluating AI’s Stateless Strategic Capabilities Against Control Protocols
Authors: Alex Troy Mallen, Charlie Griffin, Alessandro Abate, Buck Shlegeris
Data-Centric Safety and Ethical Measures for Data and AI Governance
Author: Srija Chakraborty
ImagiNet: A Multi-Content Benchmark for Synthetic Image Detection
Authors: Delyan Boychev, Radostin Cholakov
Federated Unlearning via Subparameter Space Partitioning and Selective Freezing
Authors: Krishna Yadav, Varala Nandu Swapnik, Kwok Tai Chui, Brij Bhooshan Gupta
16:30 - 17:00
Description: The workshop will conclude with a summary of key insights and discussions from the day's sessions. Attendees will be invited to contribute their perspectives and discuss potential collaborative projects. Closing remarks will outline the next steps and encourage ongoing engagement within the community.
Speakers: Workshop organiser(s)