Programme

This workshop will take place in Room 112 of the Pennsylvania Convention Center, Philadelphia, on 3rd March 2025.

Organisers' Welcome

09:00 - 09:10

Description: The workshop opens with a welcome from the organisers, providing context and objectives for the day.

Speakers: Workshop organiser(s)

Invited Talk by Peter Mattson: Building an Ecosystem for Reliable, Low-Risk AI

09:10 - 09:50

Scope: How better benchmarks and a developing science of AI data, supported by AI data standards and tools, will enable AI products and services that deliver value.

Description: This talk will explore how advancements in benchmarks and the development of a science of AI data—underpinned by robust standards and tools—can foster the creation of AI products and services that are both trustworthy and valuable. It will cover key topics, including the progress of AI as a transformative technology, the evolution of benchmarking practices, and the MLCommons approach exemplified by MLPerf, AILuminate, and MedPerf. It will also address the importance of AI data science and tooling, highlighting innovations like the MLCommons Croissant data format, and outline actionable steps for the AI community to build a secure and reliable AI ecosystem.

Speakers: Dr. Peter Mattson, Google Research & MLCommons

Deep Dive into AI Safety: Lora Aroyo & Gopal Ramchurn

09:50 - 10:30

Description: This session features two 20-minute presentations focusing on AI safety. Lora Aroyo will discuss methodologies for evaluating AI systems, emphasising the importance of diverse perspectives in data to ensure trustworthiness and reliability. Gopal Ramchurn will explore the safety and security of autonomous systems, addressing challenges in human-machine teaming and responsible AI deployment.

Speakers:

Dr. Lora Aroyo, Google Research
Prof. Gopal Ramchurn, University of Southampton

Coffee break

10:30 - 11:00

Showcase: submitted work and posters

Panel 1: Safety Challenges in Multimodal AI

11:00 - 12:00

Description: A panel discussion featuring experts on the unique safety challenges posed by multimodal AI systems, such as bias, interpretability, and transparency.

Moderator: Prof. Hana Chockler - King's College London

Speakers:

Dr. Lora Aroyo - Research Scientist, Google DeepMind
Ankit Jain - Engineering Manager of GenAI Safety, Meta
Natan Vidra - Founder/CEO, Anote
Ken Fricklas - CEO, Turaco Strategy
Marko Grobelnik - Co-Lead, Artificial Intelligence Lab, Jožef Stefan Institute

Lightning Talks – "LLMs and VLMs" (part I)

12:00 - 12:30

Description: Authors of five of the accepted papers will present brief lightning talks summarising their research contributions. Each presenter will have approximately 5 minutes to provide a concise overview of their work related to creating and improving datasets and benchmarks for AI safety. Topics range from theoretical approaches to practical implementations and evaluations of AI systems.

Papers:

DarkBench: Benchmarking Dark Patterns in Large Language Models
- Authors: Esben Kran, Hieu Minh Nguyen, Akash Kundu, Sami Jawhar, Jinsuk Park, Mateusz Maria Jurewicz

HumanAgencyBench: Do Language Models Support Human Agency?
- Authors: Benjamin Sturgeon, Leo Hyams, Daniel Samuelson, Ethan Vorster, Jacob Haimes, Jacy Reese Anthis

Changing Answer Order Can Decrease MMLU Accuracy
- Authors: Vipul Gupta, David Pantoja, Candace Ross, Adina Williams, Megan Ung

Evaluating Precise Geolocation Inference Capabilities of Vision Language Models
- Authors: Neel Jay, Hieu Minh Nguyen, Hoang Trung Dung, Jacob Haimes

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
- Authors: Andrey Anurin, Jonathan Ng, Jason Hoelscher-Obermaier, Esben Kran

Lunch Break

12:30 - 14:00

Description: Attendees can enjoy lunch while engaging with poster presentations of contributed work. This session provides an excellent opportunity for networking and in-depth discussions with researchers about their projects. Posters from the accepted papers will be displayed, and authors will be available for questions.

Panel 2: Agentic AI: Future Challenges and Opportunities

14:00 - 15:00

Description: This panel addresses safety concerns related to agentic AI systems, focusing on autonomy, trustworthiness, drift, and transparency. The discussion aims to identify emerging risks and point toward potential innovative solutions.

Moderator: Prof. Elena Simperl - King's College London

Speakers:

Dr. Angelo Dalli - Chief Scientist and Co-Founder, UMNAI
Rajat Ghosh - Staff Data Scientist, Nutanix
Prof. Gopal Ramchurn - Professor of Artificial Intelligence, University of Southampton
Srija Chakraborty - Scientist, Universities Space Research Association (USRA)
Dr. Sean McGregor - Founding Director, Digital Safety Research Institute at the UL Research Institutes

Invited Talk by Professor Virginia Dignum: Is Safety the Future We Need for AI?

15:00 - 15:30

Description: This talk will focus on the future of AI safety, highlighting under-discussed risks and themes. It will provide a forward-looking perspective on the field, emphasising areas that require increased awareness and attention.

Speakers: Prof. Virginia Dignum - Umeå University

Coffee Break

15:30 - 16:00

Description: A second coffee break, featuring poster presentations of contributed work. Participants can network and engage with researchers during this session.

Lightning Talks – "Datasets and Benchmarks" (part II)

16:00 - 16:30

Papers:

Preference Poisoning Attacks on Reward Model Learning
- Authors: Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik

Subversion Strategy Eval: Evaluating AI’s Stateless Strategic Capabilities Against Control Protocols
- Authors: Alex Troy Mallen, Charlie Griffin, Alessandro Abate, Buck Shlegeris

Data-Centric Safety and Ethical Measures for Data and AI Governance
- Author: Srija Chakraborty

ImagiNet: A Multi-Content Benchmark for Synthetic Image Detection
- Authors: Delyan Boychev, Radostin Cholakov

Federated Unlearning via Subparameter Space Partitioning and Selective Freezing
- Authors: Krishna Yadav, Varala Nandu Swapnik, Kwok Tai Chui, Brij Bhooshan Gupta

Concluding Remarks

16:30 - 17:00

Description: The workshop will conclude with a summary of key insights and discussions from the day's sessions. Attendees will be invited to contribute their perspectives and discuss potential collaborative projects. Closing remarks will outline the next steps and encourage ongoing engagement within the community.

Speakers: Workshop organiser(s)
