Auditing AI models for Verified Deployment under Semantic Specifications

Auditing trained deep learning (DL) models prior to deployment is vital for preventing unintended consequences. One of the biggest challenges in auditing is obtaining human-interpretable specifications that are directly useful to the end-user. We address this challenge through a sequence of semantically-aligned unit tests, where each unit test verifies whether a predefined specification (e.g., accuracy above 95%) is satisfied with respect to controlled and semantically aligned variations in the input space (e.g., in face recognition, the angle relative to the camera). We perform these unit tests by directly verifying the semantically aligned variations in an interpretable latent space of a generative model. Our framework, AuditAI, bridges the gap between interpretable formal verification and scalability. With evaluations on four datasets, covering images of towers, chest X-rays, human faces, and ImageNet classes, we show how AuditAI obtains controlled variations for verification and certified training while addressing the limitations of verifying with pixel-space perturbations alone.

Technical insight

We consider a typical machine learning production pipeline with three parties: the end-user of the deployed model, the verifier, and the designer of the model. The verifier plays the critical role of checking whether the model from the designer satisfies the needs of the end-user. For example, unit test 1 could verify whether a given face classification model maintains over 95% accuracy when the face angle is within d degrees, while unit test 2 could check under what lighting conditions the model has over 86% accuracy. Once verification is done, the end-user can use the verified specifications to decide whether to use the trained DL model during deployment.
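To make the idea concrete, a semantically-aligned unit test can be thought of as a specification paired with a controlled set of input variations. The sketch below is purely illustrative; the class and field names are our own stand-ins, not AuditAI's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

@dataclass
class UnitTest:
    """Hypothetical unit-test record: a named semantic variation plus
    an accuracy specification (e.g., 'accuracy over 95%')."""
    name: str
    min_accuracy: float  # e.g., 0.95
    # Produces (input, label) pairs spanning the controlled semantic
    # variation, e.g., face images within d degrees of the camera.
    make_variations: Callable[[], Iterable[Tuple[object, int]]]

    def run(self, model: Callable[[object], int]) -> bool:
        # The unit test passes if the model meets the accuracy
        # specification over all generated variations.
        pairs = list(self.make_variations())
        correct = sum(model(x) == y for x, y in pairs)
        return correct / len(pairs) >= self.min_accuracy
```

A verifier could run a sequence of such tests and hand the end-user the list of specifications that passed.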

We propose to audit deep learning models through a sequence of semantically-aligned unit tests, where each unit test verifies whether a predefined specification (e.g., accuracy over 95%) is satisfied with respect to controlled and semantically meaningful variations in the input space (e.g., the angle relative to the camera for a face image). Being semantically aligned is critical for these unit tests to be useful to the end-user in planning the model's deployment. We address the gap between scalability and interpretability by verifying the semantically meaningful variations directly in a semantically-aligned latent space of a generative model.
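The sketch below shows the empirical analogue of this idea under stated assumptions: sweep a semantically-aligned direction in a generator's latent space within a radius epsilon, decode each perturbed latent, and check the classifier's accuracy. Here `generator`, `classifier`, and `direction` are hypothetical stand-ins for a trained decoder, the model under audit, and a semantic latent axis; AuditAI itself verifies such variations with certified bounds rather than finite sampling.

```python
import numpy as np

def check_latent_spec(generator, classifier, z0, direction,
                      epsilon, target, n_steps=11, min_acc=0.95):
    """Empirically check a specification over a latent-space interval:
    decode z0 + a * direction for a in [-epsilon, epsilon] and measure
    how often the classifier predicts the target label."""
    alphas = np.linspace(-epsilon, epsilon, n_steps)
    preds = [classifier(generator(z0 + a * direction)) for a in alphas]
    acc = np.mean([p == target for p in preds])
    return acc >= min_acc
```

Because the variation lives in the latent space, the radius epsilon corresponds to an interpretable semantic change (e.g., face angle) rather than an arbitrary pixel perturbation.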


We show qualitative results for generated outputs corresponding to controlled variations in the latent space: ImageNet images of the class "hen", chest X-ray images with pneumonia, and human faces with varying degrees of smile.

Key theoretical result: The verifier can verify whether the trained model from the designer satisfies the specifications by generating a proof of verification.
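The proof is based on bound propagation. The toy sketch below illustrates the style of such a proof with interval bound propagation (IBP) through affine and ReLU layers: if the target logit's lower bound exceeds every other logit's upper bound over the whole input interval, the specification is certified. This is a minimal sketch of the general technique, not AuditAI's exact procedure.

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    """Propagate an elementwise interval [lo, hi] through x -> W x + b."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c = W @ center + b
    r = np.abs(W) @ radius  # worst-case growth of the interval radius
    return c - r, c + r

def ibp_relu(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to endpoints."""
    return np.maximum(lo, 0), np.maximum(hi, 0)

def certified(lo, hi, layers, target):
    """Certify that `target` is predicted for every input in [lo, hi],
    for a network given as a list of (W, b) affine layers with ReLUs
    between them."""
    for W, b in layers[:-1]:
        lo, hi = ibp_relu(*ibp_affine(lo, hi, W, b))
    lo, hi = ibp_affine(lo, hi, *layers[-1])
    # Certified iff the target logit's lower bound beats every other
    # logit's upper bound.
    return all(lo[target] > hi[j] for j in range(len(lo)) if j != target)
```

When the input interval is a semantic range in the latent space, a successful certification constitutes a proof that the corresponding unit test is satisfied.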

Summary of results: We show that AuditAI is applicable to training, verification, and deployment across diverse datasets: ImageNet, chest X-rays, LSUN, and Flickr-Faces-HQ (FFHQ). We theoretically show that AuditAI can verify whether a unit test is satisfied by generating a proof for verification based on bound propagation. For ImageNet, AuditAI can train verifiably robust models that tolerate 20% larger latent variations than pixel-based counterparts at the same overall verified accuracy of 88%. This translates to a 25% increase in pixel-space variations on average, where variations are measured as L2 distances in the latent and pixel spaces, respectively. The corresponding increases in certifiable pixel-space variations for chest X-rays, LSUN, and FFHQ are 22%, 19%, and 24%.

Discussion and Limitations

In this paper, we developed a framework for auditing deep learning (DL) models. There are growing concerns about innate biases in DL models deployed in a wide range of settings, and multiple news articles (Feb 11, 2021; July 10, 2019) have called for auditing DL models prior to deployment. Our framework formalizes this audit problem, which we believe is a step towards increasing the safety and ethical use of DL models during deployment.

One limitation of AuditAI is that its interpretability is bounded by that of the built-in generative model. While exciting progress has been made on generative models, we believe it is important to incorporate domain expertise to mitigate potential dataset biases and human error in both training and deployment. Currently, AuditAI does not directly integrate human domain experts into the auditing pipeline; it uses domain expertise indirectly, through the curation of the dataset used to train the generative model. Although we have demonstrated AuditAI primarily for auditing computer vision classification models, we hope it paves the way for more sophisticated, domain-dependent AI-auditing tools and frameworks in language modeling and decision-making applications.

Qualitative results for sample images with different ranges of epsilon: a brightness test ($\epsilon = 1.5$) and an expression test ($\epsilon = 0.5$, $\epsilon = 1.0$, $\epsilon = 2.0$).