We currently have enough talk proposals to fill the day, but you can still participate by filling in the form here.
9h30-10h: Margot Boyer (CNAM), "Fast SDP certification of adversarial robustness: towards large multiclass datasets"
10h-10h30: Coffee break
10h30-11h: Gianni Franchi (ENSTA), "Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation"
11h-12h: Invited talk by Fabio Cuzzolin, "Towards an epistemic generative AI?" (remote)
12h-13h30: Lunch
13h30-14h: Loïc Adam (ISAE-ENSMA, Poitiers), "Handling inconsistency caused by an inadequate preferential model choice in uncertain preferences elicitation"
14h-14h30: Sinha Aman (Lorraine University), "Your Model is Overconfident, and Other Lies We Tell Ourselves"
14h30-15h: Arthur Roblin (ASNR), "Neural Network Uncertainty Quantification for Reliable Airborne Radioactivity Monitoring"
15h-15h30: Thomas George (Orange), "Statistical diagnostic measures for deep learning"
15h30-16h: Coffee break
16h-16h30: Rémi Kazmierczak (ENSTA), "Benchmarking XAI explanations with human-aligned evaluations"
16h30-17h: Ibrahim Abdelrahman Sayed (Université Gustave Eiffel), "Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification"
17h-17h30: Arthur Hoarau (CentraleSupelec, LORIA), "Robust Explanations Through Uncertainty Decomposition"
Keynote talk by Fabio Cuzzolin, Professor at Oxford Brookes
Title: Towards an epistemic generative AI?
Abstract: Epistemic AI is an approach that proposes using second-order uncertainty measures to quantify epistemic uncertainty in artificial intelligence. Mathematical frameworks that generalise the concept of random variable, such as random sets, enable a more flexible and expressive approach to uncertainty modeling. We discuss ways in which the random set and credal set formalisms can model classification uncertainty over both the target and parameter spaces of a machine learning model (e.g., a neural network), outperforming Bayesian, ensemble and evidential baselines. We show how the principle can be extended to generative AI, in particular to large language models less prone to hallucination, as well as to diffusion processes and generative adversarial networks. Exciting applications to large concept models and visual language models, as well as to neural operators, scientific machine learning and neurosymbolic reasoning, are discussed.
The following talks have been proposed so far:
Margot Boyer (CNAM), "Fast SDP certification of adversarial robustness: towards large multiclass datasets": We present a new quadratic model for the certification problem in adversarial robustness which simultaneously accounts for all possible target classes. Building on this model, we propose a novel semidefinite programming (SDP) relaxation for incomplete verification. A key advantage of our approach is that it certifies robustness in a single optimization, avoiding the need to solve a separate problem per class. We improve our model with specific cuts and propose a new neuron-pruning technique for scalability.
(confirmed) Ibrahim Abdelrahman Sayed (Université Gustave Eiffel), "Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification": A neural ODE is a machine learning model that is commonly described as a continuous depth generalization of a ResNet with a single residual block. By establishing a formal bound on the approximation error between the two models, we show that one model can serve as a verification proxy for the other, ensuring safety properties hold across both without redundant verification.
(confirmed) Safa Ben Ayed (CESI LINEACT), "Vers une maintenance prédictive fiable et robuste malgré des données imparfaites": This talk explores how to make industrial predictive-maintenance decisions reliable in the presence of imperfect data (noise, uncertainty, inconsistencies). It proposes an integrated approach combining machine learning, confidence quantification and belief function theory to anticipate failures and optimize interventions.
(confirmed) Sinha Aman (Lorraine University), "Your Model is Overconfident, and Other Lies We Tell Ourselves": When annotators label language data, they don’t always agree -- and that’s not just noise, it’s a fundamental part of how language works. Some examples are simply harder to classify because they’re inherently ambiguous, yet this challenge is often overlooked when evaluating NLP models. In this presentation, we dive into what makes certain data points tricky by exploring different ways to measure intrinsic difficulty -- looking at annotator disagreements, model confidence, and training dynamics. By disentangling these dimensions of uncertainty, we aim to enhance our understanding of data complexity and its implications for assessing and improving NLP models.
(confirmed) Loïc Adam (ISAE-ENSMA, Poitiers), "Handling inconsistency caused by an inadequate preferential model choice in uncertain preferences elicitation": In preference elicitation, the preferences of a user are collected in order to recommend one alternative. The user’s preferences are assumed to follow a specific preference model, modelled by a criteria aggregation function, which is chosen in advance by an expert. In this talk, we show what happens when an inadequate function family is picked, and how possibility theory and the exploration/exploitation dilemma help detect and resolve such an issue.
(confirmed) Arthur Roblin (ASNR), "Neural Network Uncertainty Quantification for Reliable Airborne Radioactivity Monitoring": The limitations of classical algorithms motivate the use of deep learning for airborne radioactivity monitoring. However, the lack of transparency and reliability of such models remains a major concern in critical applications. In this work, we propose to quantify the predictive uncertainty of neural networks, enabling more trustworthy decisions by explicitly modeling the confidence associated with each prediction.
(confirmed) Arthur Hoarau (CentraleSupelec, LORIA), "Robust Explanations Through Uncertainty Decomposition": Recent advances in machine learning have highlighted the need for transparency in model predictions, all the more so as interpretability decreases with the use of increasingly complex architectures. We propose to exploit prediction uncertainty as a complementary approach to classical explainability methods. More specifically, we distinguish aleatoric from epistemic uncertainty to guide the selection of appropriate explanations, which allows us to propose a framework for choosing explanations based on uncertainty quantification. Epistemic uncertainty serves as a rejection criterion for unreliable explanations and, in itself, reflects insufficient training (a new form of explanation). Aleatoric uncertainty guides the choice between feature-importance explanations and counterfactual explanations.
(confirmed) Thomas George (Orange), "Statistical diagnostic measures for deep learning": Diagnostic measures are popular tools in statistics that estimate the influence of each individual example on a given quantity, such as model fit, parameter values, or predictions on other examples. These tools provide a principled methodology for identifying important training examples and outliers, or estimating uncertainty at test time. Among these tools, infinitesimal influence functions have already been successfully applied to deep networks (Koh & Liang, 2017); a minimal illustrative sketch of this influence formula is given after the talk list below. In this ongoing work, we aim to promote the use of other common diagnostic measures in deep learning by leveraging recent results in deep network linearization and efficient approximate Fisher information matrices.
Abdulrahman Hassan (Northern Technical University, Iraq), "MTA-FIDNet: Multi-Task Attention Fake Image Detection Network": MTA-FIDNet is a hybrid deep learning framework for robust fake image detection. It fuses spatial (RGB) and frequency (FFT/DCT) features through cross-attention on top of a CNN+Transformer backbone, and is trained in a multi-task manner to (1) predict real vs. fake, (2) classify the manipulation type, and (3) optionally segment manipulated regions.
(confirmed) Gianni Franchi (ENSTA), "Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation": Uncertainty quantification in text-to-image (T2I) generative models is crucial for understanding model behavior and improving output reliability. In this work, we are the first to quantify and evaluate the uncertainty of T2I models with respect to the prompt. Alongside adapting existing approaches designed to measure uncertainty in the image space, we also introduce Prompt-based UNCertainty Estimation for T2I models (PUNC), a novel method leveraging Large Vision-Language Models (LVLMs) to better address uncertainties arising from the semantics of the prompt and generated images. PUNC utilizes an LVLM to caption a generated image, and then compares the caption with the original prompt in the more semantically meaningful text space. PUNC also enables the disentanglement of both aleatoric and epistemic uncertainties via precision and recall, which image-space approaches are unable to do. Extensive experiments demonstrate that PUNC outperforms state-of-the-art uncertainty estimation techniques across various settings. Uncertainty quantification in text-to-image generation models can be used in various applications, including bias detection, copyright protection, and OOD detection. We also introduce a comprehensive dataset of text prompts and generation pairs to foster further research in uncertainty quantification for generative models. Our findings illustrate that PUNC not only achieves competitive performance but also enables novel applications in evaluating and improving the trustworthiness of text-to-image models.
(confirmed) Rémi Kazmierczak (ENSTA), "Benchmarking XAI explanations with human-aligned evaluations": We introduce PASTA (Perceptual Assessment System for explanaTion of Artificial Intelligence), a novel human-centric framework for evaluating eXplainable AI (XAI) techniques in computer vision. Our first contribution is the creation of the PASTA-dataset, the first large-scale benchmark that spans a diverse set of models and both saliency-based and concept-based explanation methods. This dataset enables robust, comparative analysis of XAI techniques based on human judgment. Our second contribution is an automated, data-driven benchmark that predicts human preferences using the PASTA-dataset. The resulting score, called the PASTA-score, offers scalable, reliable, and consistent evaluation aligned with human perception. Additionally, our benchmark allows for comparisons between explanations across different modalities, an aspect previously unaddressed. We then propose to apply our scoring method to probe the interpretability of existing models and to build more human-interpretable XAI methods.
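For readers unfamiliar with the influence functions mentioned in Thomas George's abstract, here is a minimal, self-contained sketch (not the speaker's code) of the infinitesimal influence function of Koh & Liang (2017), I(z, z_test) = -∇_w L(z_test, ŵ)ᵀ H_ŵ⁻¹ ∇_w L(z, ŵ), illustrated on a toy L2-regularized logistic regression. All data, parameter values and names below are purely illustrative assumptions.

```python
# Sketch of Koh & Liang (2017) influence functions on a toy logistic regression.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1e-2

# Synthetic binary classification data (illustrative only).
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def grad_loss(w, x, y_):
    """Gradient of the per-example logistic loss with respect to w."""
    return (sigmoid(x @ w) - y_) * x

# Fit w by plain gradient descent on the regularized empirical risk.
w = np.zeros(d)
for _ in range(2000):
    g = X.T @ (sigmoid(X @ w) - y) / n + lam * w
    w -= 0.5 * g

# Hessian of the regularized empirical risk at the fitted parameters.
p = sigmoid(X @ w)
H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)

# Influence of each training example on the loss of one held-out point:
#   I(z, z_test) = -grad_loss(z_test)^T  H^{-1}  grad_loss(z)
x_test, y_test = rng.normal(size=d), 1.0
g_test = grad_loss(w, x_test, y_test)
influences = np.array([-g_test @ np.linalg.solve(H, grad_loss(w, X[i], y[i]))
                       for i in range(n)])
print("most influential training indices:", np.argsort(-np.abs(influences))[:5])
```

Here a large-magnitude value flags a training example whose upweighting would noticeably change the held-out loss; the talk discusses how this and other diagnostic measures can be made practical for deep networks.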