Topics
The ART of Safety workshop addresses the aforementioned weaknesses by focusing on under-explored areas such as long-tail issues, unknown unknowns, and problems related to accurate representations of sensitive characteristics.
We take a particular interest in papers that discuss novel approaches to capturing diverse perspectives on safety, as well as safety evaluations of generative AI models conducted with end users, experts, or crowd workers that include end users. We welcome all perspectives that fit the scope and breadth of the AACL community.
Relevant topics include (but are not limited to):
Best practices in adversarial testing for generative models;
Effectiveness of adversarial approaches across language and vision-language models;
FATE-related concerns on current red-teaming and adversarial testing processes;
AI blindspots and unknown-unknowns in adversarial testing;
Generalizability of red teaming and adversarial testing approaches;
Assessing quality of data collected through adversarial approaches;
Adversarial testing for improving AI alignment;
Impact assessment of adversarial approaches on safety evaluation.
Submissions on any of the above topics may address language models, vision-language models, or other generative AI systems.