Program

Schedule

Thursday:

Friday:

Clément Benard: Interpretability of causal treatment effects via variable importance for causal forests

Causal random forests provide efficient estimates of heterogeneous treatment effects. However, forest algorithms are also well known for their black-box nature and therefore do not characterize how input variables are involved in treatment effect heterogeneity, which is a strong practical limitation. In this article, we develop a new variable importance algorithm for causal forests to quantify the impact of each input on the heterogeneity of treatment effects. The proposed approach is inspired by the drop-and-relearn principle, widely used for regression problems. Importantly, we show how to handle the case where the forest is retrained without a confounding variable. If the confounder is not involved in the treatment effect heterogeneity, the local centering step enforces consistency of the importance measure. Otherwise, when a confounder also impacts heterogeneity, we introduce a corrective term in the retrained causal forest to recover consistency. Additionally, experiments on simulated, semi-synthetic, and real data show the good performance of our importance measure, which outperforms competitors on several test cases. Experiments also show that our approach can be efficiently extended to groups of variables, providing key insights in practice.
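For intuition, here is a minimal sketch of the drop-and-relearn idea applied to treatment effect heterogeneity, using econml's CausalForestDML on simulated data. It is an illustration only: it omits the local centering discussion and the corrective term described in the abstract, and econml is an assumed stand-in for the authors' implementation.

```python
# Drop-and-relearn sketch for causal forests (illustrative, naive version).
import numpy as np
from econml.dml import CausalForestDML

rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
T = rng.binomial(1, 0.5, size=n)
tau = X[:, 0] + 0.5 * X[:, 1]                      # true heterogeneous effect
Y = X.sum(axis=1) + tau * T + rng.normal(size=n)

def fit_cate(features):
    est = CausalForestDML(discrete_treatment=True, random_state=0)
    est.fit(Y, T, X=features)
    return est.effect(features)                    # estimated CATE per sample

tau_full = fit_cate(X)
importance = {}
for j in range(p):
    X_drop = np.delete(X, j, axis=1)               # retrain without variable j
    importance[j] = np.mean((tau_full - fit_cate(X_drop)) ** 2)

print(importance)                                  # variables 0 and 1 should dominate
```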

Gabriele Ciravegna: XAI Is Dead, Long Live C-XAI: A Paradigm Shift in Explainable Artificial Intelligence

Recent literature highlights the inherent challenges and limitations of standard XAI methods, including gradient-based and model-agnostic techniques. Among other reasons, this is because all standard XAI methods provide explanations at the feature level, which may not be meaningful, particularly for non-expert users. Merely understanding where the network is looking may not be sufficient to explain what the network is seeing in a given input. To really achieve this goal, there is a growing consensus that XAI techniques should provide explanations in terms of higher-level attributes, commonly referred to as concepts. Concept-based explanations offer a compelling alternative by providing a more holistic view of the model’s inner workings. By explaining the model’s predictions in terms of human-understandable concepts or abstractions, concept-based explanations resemble human reasoning and explanations better than standard post-hoc explanation methods. They not only enhance the transparency and interpretability of the model but also empower users to gain deeper insights into the underlying reasoning, helping them to detect model biases and improve the classification model.
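To make the notion of a concept-based explanation concrete, the following minimal PyTorch sketch shows a toy concept-bottleneck classifier, where the explanation is the vector of predicted concept scores. Layer sizes and concept names are hypothetical; this is not the speaker's model.

```python
# Toy concept-bottleneck classifier: features -> concepts -> class.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, n_features=512, n_concepts=8, n_classes=10):
        super().__init__()
        self.to_concepts = nn.Linear(n_features, n_concepts)  # predict concept scores
        self.to_classes = nn.Linear(n_concepts, n_classes)    # classify from concepts only

    def forward(self, feats):
        concepts = torch.sigmoid(self.to_concepts(feats))     # human-readable activations
        return self.to_classes(concepts), concepts

model = ConceptBottleneck()
feats = torch.randn(1, 512)                  # backbone features for one image
logits, concepts = model(feats)
# The explanation is the concept activation vector itself (e.g. "striped": 0.91,
# "four-legged": 0.87, ...) together with the linear weights that link each
# concept to the predicted class.
```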

Matthieu Cord: Object-Aware Counterfactual Explanations

Deep vision models are now extensively applied in safety-critical applications, such as autonomous driving, and the need for explainability of these models has become a significant concern. Among explanation methods, counterfactual explanations aim to find minimal and interpretable changes to the input image that would change the output of the model to be explained. These explanations assist users in identifying the primary factors that influence the model's decisions. For simple images, such as low-resolution face portraits, synthesizing visual counterfactual explanations has recently been proposed as a way to uncover the decision mechanisms of a trained classification model. More challenging is the problem of producing counterfactual explanations for high-quality images and complex scenes.

In this presentation, I will introduce an object-centric framework for the generation of visual counterfactual explanations. Our approach, inspired by recent developments in generative modeling, involves embedding the input image into a structured latent space that streamlines manipulations at the object level. This allows end-users to have control over which aspects, such as spatial object displacements and style adjustments, are explored during the generation of counterfactual explanations for complex scenes. I will present results on counterfactual explanation benchmarks for driving scenes classification and discuss a user study that measures the usefulness of counterfactual explanations in understanding a decision model.

Tristan Gomez: Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification

Deep neural networks, while powerful for image classification, often operate as "black boxes," complicating the understanding of their decision-making processes. Various explanation methods, particularly those generating saliency maps, aim to address this challenge. However, the inconsistency issues of faithfulness metrics hinder reliable benchmarking of explanation methods. This paper employs an approach inspired by psychometrics, utilizing Krippendorff's alpha to quantify the benchmark reliability of post-hoc methods in image classification. The study proposes model training modifications, including feeding perturbed samples and employing focal loss, to enhance robustness and calibration. Empirical evaluations demonstrate significant improvements in benchmark reliability across metrics, datasets, and post-hoc methods. This pioneering work establishes a foundation for more reliable evaluation practices in the realm of post-hoc explanation methods, emphasizing the importance of model robustness in the assessment process.
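As a rough illustration of the reliability measure, the sketch below computes Krippendorff's alpha over the rankings that several faithfulness metrics assign to a set of post-hoc methods. It assumes the krippendorff PyPI package, and the metric names and scores are invented for the example.

```python
# Benchmark reliability as inter-metric agreement, via Krippendorff's alpha.
import numpy as np
import krippendorff

# Rows = faithfulness metrics ("raters"), columns = post-hoc methods ("items").
# Each entry is the rank the metric assigns to that explanation method.
rankings = np.array([
    [1, 2, 3, 4],   # e.g. a deletion-style metric (hypothetical scores)
    [1, 3, 2, 4],   # e.g. an insertion-style metric
    [2, 1, 3, 4],   # e.g. a localization-style metric
], dtype=float)

alpha = krippendorff.alpha(reliability_data=rankings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.3f}")   # close to 1.0 means the metrics agree
```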

Gianluigi Lopardo: Faithful and Robust Local Interpretability for Textual Predictions 

Interpretability is crucial as machine learning models find applications in critical and sensitive domains. Local and model-agnostic methods have gained popularity for explaining predictions without knowledge of the underlying model. However, the explanation generation process can be complex and lack solid mathematical foundations, and their performance is not guaranteed even on simple models. In this talk, we present FRED (Faithful and Robust Explainer for textual Documents), a novel method for interpreting predictions over text. FRED identifies key words in a document that significantly impact the prediction when removed. We establish the reliability of FRED through formal definitions and theoretical analyses on interpretable classifiers. Additionally, our empirical evaluation against state-of-the-art methods demonstrates the effectiveness of FRED in providing insights into text models.
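The removal principle underlying FRED can be illustrated with a much cruder word-deletion loop over a generic scikit-learn text classifier; this is not the FRED algorithm itself, only a sketch of the signal it builds on.

```python
# Word-removal sketch: drop each word and measure the change in the
# classifier's positive-class probability.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_docs = ["great movie, loved it", "terrible plot, boring film",
              "loved the acting", "boring and terrible"]
train_labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_docs, train_labels)

doc = "loved the movie but the plot was boring"
base = clf.predict_proba([doc])[0, 1]
words = doc.split()
for i, w in enumerate(words):
    reduced = " ".join(words[:i] + words[i + 1:])    # document without word i
    drop = base - clf.predict_proba([reduced])[0, 1]
    print(f"{w:>8}: {drop:+.3f}")                    # large |drop| = key word
```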

Tobias Leemann: On the Identifiability of Post-Hoc Conceptual Explanations

Feature attribution methods for computer vision models reason at the pixel level, which has been shown to be hardly interpretable to humans. As a result, interest in understanding and factorizing learned embedding spaces through higher-level, conceptual explanations is steadily growing. These methods provide post-hoc explanations for decisions by complex models in terms of interpretable concepts like object shape or color. However, the definitions of the concepts are often vague and fuzzy. When no human concept annotations are available, concepts additionally need to be discovered in the data without supervision. To provide a formal grounding to the problem, we argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee the reliability of the explanations. In this talk, we will study the conditions under which concepts can be uniquely identified. For stochastically dependent concepts (e.g., “car” and “street”) we propose two novel approaches that exploit functional compositionality properties of image-generating processes. Our results highlight the strict conditions under which reliable concept discovery without human labels can be guaranteed, posing a fundamental challenge for the domain.

Francesca Naretto: Exploring the landscape of Explainable AI and Privacy issues

The growing adoption of black-box models in Artificial Intelligence (AI) has increased the demand for explanation methods to understand the inner workings of these opaque models. For this reason, Explainable AI has become a practical necessity to uncover potential biases and address ethical concerns. Thanks to the widespread interest in this topic, today's literature offers a plethora of methods, each with unique characteristics. In this talk, we will survey the most prominent explanation techniques across various data domains, including tabular data, images, and text. In particular, we will compare different explanations through visualizations as well as quantitative metrics, shedding light on their relative strengths and weaknesses. Having established the landscape of Explainable AI, our exploration will extend to the critical challenges we face now, with a particular focus on privacy concerns.

Martin Pawelczyk: On the Privacy Risks of Algorithmic Recourse

As predictive models are increasingly being employed to make consequential decisions, there is a growing emphasis on developing techniques that can provide algorithmic recourse to affected individuals. While such recourses can be immensely beneficial to affected individuals, potential adversaries could also exploit these recourses to compromise privacy. In this work, we make the first attempt at investigating whether and how an adversary can leverage recourses to infer private information about the underlying model's training data. To this end, we propose a series of novel membership inference attacks which leverage algorithmic recourse. More specifically, we extend the prior literature on membership inference attacks to the recourse setting by leveraging the distances between data instances and their corresponding counterfactuals output by state-of-the-art recourse methods. Extensive experimentation with real-world and synthetic datasets demonstrates significant privacy leakage through recourses. Our work establishes unintended privacy leakage as an important risk in the widespread adoption of recourse methods.
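A toy sketch of the attack signal described above: compute the distance between an instance and a counterfactual for it, then use that distance as a membership score. The counterfactual search here is a naive random-candidate placeholder, not a state-of-the-art recourse method, and the data and model are synthetic.

```python
# Membership-inference sketch based on counterfactual distances.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] > 0).astype(int)
X_out = rng.normal(size=(200, 5))                  # non-member points
model = LogisticRegression().fit(X_train, y_train)

def counterfactual_distance(x):
    # Nearest random perturbation whose predicted label flips (placeholder
    # for a proper recourse/counterfactual method).
    cands = x + rng.normal(scale=0.5, size=(500, x.size))
    flipped = cands[model.predict(cands) != model.predict(x[None])[0]]
    return np.linalg.norm(flipped - x, axis=1).min() if len(flipped) else np.inf

d_member = np.array([counterfactual_distance(x) for x in X_train[:50]])
d_non_member = np.array([counterfactual_distance(x) for x in X_out[:50]])

# The attacker thresholds the distance to guess membership; a systematic gap
# between the two groups indicates privacy leakage through recourses.
print("mean distance, members:    ", d_member.mean())
print("mean distance, non-members:", d_non_member.mean())
```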

Ronan Sicre: Visual interpretability, saliency maps and more

We will first review previous work on visual interpretability methods, and particularly saliency-based methods. We will then present Opti-CAM, a CAM-based method that optimizes a masking objective per instance. The resulting saliency map highlights the areas of an image that are important for the decision of a trained image classification network. Finally, we will review some work on interpretable image classification using part- or prototype-based architectures.
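As a rough sketch of the per-instance optimization idea behind Opti-CAM, the snippet below optimizes channel weights over the last convolutional feature maps of a torchvision ResNet so that the masked image maximizes the classifier's score for its predicted class. The objective, normalization, and preprocessing are simplified relative to the actual method.

```python
# Per-instance saliency via optimized channel weights (simplified Opti-CAM-style sketch).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
backbone = torch.nn.Sequential(*list(model.children())[:-2])   # conv feature maps

image = torch.rand(1, 3, 224, 224)          # stand-in for a preprocessed image
with torch.no_grad():
    feats = backbone(image)                 # shape (1, 512, 7, 7)
    target = model(image).argmax(1).item()  # class whose evidence we explain

w = torch.zeros(feats.shape[1], requires_grad=True)   # one weight per channel
opt = torch.optim.Adam([w], lr=0.1)
for _ in range(100):
    cam = torch.sigmoid((feats * w.view(1, -1, 1, 1)).sum(1, keepdim=True))
    mask = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                         align_corners=False)
    loss = -model(image * mask)[0, target]  # maximize the class score on the masked input
    opt.zero_grad()
    loss.backward()
    opt.step()

saliency = mask.detach()[0, 0]              # final per-instance saliency map
```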