The ART of Safety:

Workshop on Adversarial testing and Red-Teaming for generative AI

Virtual on Nov 1st, 2023

Introducing the ART of Safety workshop, virtually co-located with AACL 2023!

A workshop on the promise and pitfalls of adversarial testing and red-teaming for safety issues in generative AI

The Data-centric AI initiative has been promoting the importance of systematically engineering the data used to build and evaluate AI systems. In this context, human input is crucial in creating such data to uncover the failures of these systems. Through human-centric methods for model testing, we can harness human creativity in uncovering long-tail issues and unknown unknowns for generative AI. 

Various red-teaming efforts [1, 2, 3] have surged in the context of generative AI as a process to find risks in these models. There are a few problems with the current red-teaming paradigm: first, definitions of safety are not shared across organizations, resulting in different, non-aligned perspectives on normative concepts such as safe or unsafe; second, most of these efforts are conducted behind industry walls, resulting in a lack of transparency about procedures and participants; and third, the resulting datasets are not systematically shared as open-source community resources, which prevents reliable comparison of the safety of different systems. It is imperative to understand the error patterns in generative AI models and the downstream harms they might inflict on end users. But “safety” is not a universal concept: there are many cultural and contextual aspects to interpreting whether a model is safe, in which domains, and what blind spots remain. This is why “safety” is both an “ART” and a science.

The ART of Safety (ARTS) workshop aligns with red-teaming efforts at top venues this year, including the AI Village LLM Hackathon at DEFCON and the CRAFT hands-on session focused on text-to-image risks at FAccT2023, and extends them with its unique focus on the diversity of community perspectives on encoding, evaluating, and establishing safety for generative AI. Towards this end, the workshop has two main goals: