The ART of Safety:
Workshop on Adversarial Testing and Red-Teaming for Generative AI
Virtual on Nov 1st, 2023
Introducing the ART of Safety workshop, virtually co-located with AACL 2023!
A workshop on the promise and pitfalls of adversarial testing and red-teaming for safety issues in generative AI
The Data-centric AI initiative has been promoting the importance of systematically engineering the data used to build and evaluate AI systems. In this context, human input is crucial for creating data that uncovers the failures of these systems. Through human-centric methods for model testing, we can harness human creativity to uncover long-tail issues and unknown unknowns in generative AI.
Red-teaming efforts [1, 2, 3] have surged in the context of generative AI as a way to find risks in these models. The current red-teaming paradigm has a few problems: first, definitions of safety are not shared across organizations, resulting in divergent, non-aligned perspectives on normative concepts such as safe or unsafe; second, most of these efforts are conducted behind industry walls, resulting in a lack of transparency about procedures and participants; and third, the resulting datasets are not systematically shared as open-source community resources, which prevents reliable comparison of the safety of different systems.
It is imperative to understand the error patterns in generative AI models and the downstream harms they might inflict on end users. But "safety" is not a universal concept: there are many cultural and contextual aspects to interpreting whether a model is safe, in which domains, and which blindspots remain. This is why "safety" is both an "ART" and a science.
The ART of Safety (ARTS) workshop aligns with red-teaming efforts at top venues this year, including the AI Village LLM Hackathon at DEFCON and the CRAFT hands-on session on text-to-image risks at FAccT 2023, and extends them with its unique focus on the diversity of community perspectives on encoding, evaluating, and establishing safety for generative AI. Towards this end, the workshop has two main goals:
Collect and compare red-teaming results for Text-2-Image (T2I) models. We welcome both empirical and position papers discussing experiences and results from using the Adversarial Nibbler challenge.
Provide an overview of current approaches, methods, and techniques for red-teaming and adversarial testing. We welcome both empirical and position papers discussing workshop topics such as data quality, safety evaluations, and safety ground truth.
This workshop builds on the success of the first Data-Centric AI Workshop at NeurIPS 2021, as well as a series of workshops focusing on the role of data in AI, e.g. Data-centric Machine Learning Research (DMLR) @ ICML 2023; Machine Learning for Data – Automated Creation, Privacy, Bias @ ICML 2021; Data Excellence Workshop (DEW) @ HCOMP 2020; Economics of Privacy and Data Labor @ ICML 2020; Rigorous Evaluation of AI Systems (Meta-Eval) @ AAAI 2020 and (REAIS) @ HCOMP 2019 and 2020; HAMLETS; and DADC @ NAACL, among others.