With the increasing adoption of machine learning models in high-stakes applications, rigorous audits of model behavior have assumed paramount importance. However, traditional auditing methods fall short of being truly experimental, as they rely on wild-caught observational data that has been manually labeled. Enter generative techniques, which have recently shown impressive capabilities in automatically generating and labeling high-quality synthetic data at scale. Critically, many such methods allow for the isolation and manipulation of specific attributes of interest, paving the way toward robust experimental analysis.
This workshop is dedicated to exploring techniques for auditing the behavior of machine learning models – including (but not limited to) performance, bias, and failure modes – through the controlled synthesis (via generation or simulation) of data. Of special interest are algorithms for generating data (images, text, audio, etc.) and benchmarking methodologies that provide reliable insights into model behavior by minimizing the impact of potential confounders. We also welcome work on using synthetic or quasi-synthetic data for model debugging, broadly construed, with the goal of providing a venue for interdisciplinary exchange of ideas on this emerging topic.
For more updates, follow us on X @emacscvpr25!
Associate Professor, Princeton University
Assistant Professor, Stanford University
Associate Professor, UIUC
Staff Research Scientist, Google Responsible AI
Salesforce AI Research
Georgia Tech
Caltech
Rice University
NVIDIA Research
Rice University
Stanford University
Rice University