This event is jointly organised by the RT Uncertainty Quantification, the CAUSALITAI project, and the AI clusters MIAI, SCAI, and DATAIA.
Registration is free but mandatory before March 12, 2026: here
Irène Balelli, Centre Inria d'Université Côte d'Azur, EPIONE project team
Title: Voting, ensembling, and population-level causal discovery: what and how can an expert audience contribute?
Abstract: Discovering reliable cause-and-effect relationships from real-world data is an extremely complex and still open challenge. Existing Causal Discovery (CD) algorithms, even when proven theoretically identifiable, rely on strict assumptions that are rarely met in complex real-world scenarios, such as the functional form of the causal relationships, the data distribution family, and causal sufficiency. As a result, the reliability of these algorithms can drop significantly, compromising the interpretability of the results and the trustworthiness of downstream decision-making. What if, instead of relying on a single CD expert and its partial understanding of the true underlying causal mechanism, we consulted an audience of experts? In this talk, I will introduce and discuss three main strategies for achieving expert consensus on causal discovery: voting, ensembling, and population-based analysis. I will highlight the additional information that each strategy can provide compared to a traditional single-expert approach, and outline how such strategies can be implemented effectively, starting with the communication bottleneck between experts. I will present results from controlled simulation studies and a real-case application to lung cancer genetic disruptions, demonstrating the effectiveness of an expert audience in reinforcing the strengths of each expert while mitigating their uncertainties.
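To make the voting strategy concrete, here is a minimal sketch (an illustration under assumed conventions, not the speaker's implementation) in which each "expert" is a causal-discovery run that outputs a directed adjacency matrix, and the audience keeps an edge only when a majority of experts vote for it:

```python
import numpy as np

# Hypothetical "audience of experts": each expert is the adjacency matrix
# (entry [i, j] = 1 means a directed edge i -> j) returned by a different
# causal-discovery run on the same data.
expert_graphs = [
    np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]]),
    np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]]),
    np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]]),
]

votes = np.mean(expert_graphs, axis=0)    # per-edge vote share in [0, 1]
consensus = (votes > 0.5).astype(int)     # simple majority vote on each edge
# consensus keeps 0 -> 1 (3/3 votes) and 1 -> 2 (2/3), drops 0 -> 2 (1/3)
```

The vote shares themselves also carry edge-level uncertainty information that a single expert cannot provide.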
Victor Elvira, University of Edinburgh
Title: Graph-structured state-space models beyond linear dynamics
Abstract: Modelling and inference in multivariate time series are central problems in statistics, signal processing, and machine learning. A recurring challenge is to understand and represent directed relationships between components of a dynamical system, either at the level of observed signals or latent states. Graphical modelling combined with sparsity constraints provides a natural language to encode such structure, limit parameter growth, and improve interpretability. In this talk, we adopt the perspective that state-space models can be interpreted as graph-structured dynamical systems, where edges encode dependencies in the latent evolution. We first briefly revisit the linear-Gaussian setting, where transition operators and noise covariances induce sparse directed graphs. We then focus on nonlinear extensions, showing how this graph-based view can be lifted beyond linear dynamics using differentiable probabilistic models. In particular, we introduce GraphGrad, a framework that enables learning graph-structured latent dynamics in nonlinear state-space models via gradient-based inference. This approach preserves interpretability while extending graph-based system identification to settings with nonlinear dynamics and complex observation models.
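As a minimal illustration of the graph-structured view in the linear-Gaussian case (a sketch of the general idea, not code from the talk), a sparse transition matrix doubles as the weighted adjacency matrix of a directed graph over latent components:

```python
import numpy as np

# Linear-Gaussian state-space view: in x_t = A @ x_{t-1} + noise, a sparse
# transition matrix A acts as the weighted adjacency matrix of a directed
# graph over latent components, with A[i, j] != 0 meaning "j drives i".
A = np.array([[0.9, 0.0, 0.0],
              [0.5, 0.8, 0.0],
              [0.0, 0.4, 0.7]])

# Off-diagonal nonzeros are the directed edges (j -> i) of the latent graph.
edges = [(j, i) for i in range(3) for j in range(3) if i != j and A[i, j] != 0]
# edges == [(0, 1), (1, 2)]: component 0 drives 1, and 1 drives 2
```

Sparsity constraints on A then limit parameter growth and make the recovered graph directly interpretable.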
Aurore Lomet, CEA Saclay, LIAD
Title: Scaling Causal AI for Seismic FEM Simulation
Abstract: This presentation addresses the scalability of a causal discovery approach for multivariate time series in the context of seismic data assimilation. The proposed approach relies on kernel-based conditional independence tests, in particular the Hilbert–Schmidt Independence Criterion (HSIC). While dependence measures such as HSIC are commonly used in uncertainty quantification to quantify input–output influence, they are used here for independence testing to infer directed temporal graphs from sensor data while limiting distributional assumptions.
Since kernel-based methods require the construction of Gram matrices that limit scalability for long time series, a Random Fourier Features approximation is employed to reduce computational cost. The inference procedure is also ported to GPU to support high-performance execution.
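As a rough sketch of this scaling idea (a generic illustration of HSIC with Random Fourier Features, not the speaker's code), low-rank feature maps replace the n-by-n Gram matrices, reducing the cost from O(n^2) to O(n D^2) for D random features:

```python
import numpy as np

def rff_features(x, n_features=100, lengthscale=1.0, seed=0):
    """Random Fourier features approximating an RBF kernel (Rahimi-Recht)."""
    rng = np.random.default_rng(seed)
    x = np.atleast_2d(x).reshape(len(x), -1)
    W = rng.normal(scale=1.0 / lengthscale, size=(x.shape[1], n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)

def hsic_rff(x, y, n_features=100):
    """Biased HSIC estimate via low-rank feature maps: O(n D^2), not O(n^2)."""
    Zx = rff_features(x, n_features, seed=0)
    Zy = rff_features(y, n_features, seed=1)
    Zx -= Zx.mean(axis=0)   # centring replaces the H K H Gram-matrix products
    Zy -= Zy.mean(axis=0)
    C = Zx.T @ Zy           # D x D cross-covariance in feature space
    return np.sum(C ** 2) / len(x) ** 2
```

A dependent pair yields a markedly larger statistic than an independent one, which is what the conditional-independence tests build on.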
Qingyuan Zhao, University of Cambridge
Title: A counterfactual perspective of heritability, explainability, and ANOVA
Abstract: Existing tools for explaining complex models and systems are associational rather than causal and do not provide mechanistic understanding. Motivated by the concept of genetic heritability in twin studies, this talk will introduce a new notion called counterfactual explainability for causal attribution. This can be viewed as an extension of global sensitivity analysis (functional ANOVA and Sobol’s indices), which assumes independent explanatory variables, to dependent explanatory variables whose causal relationship can be described by a directed acyclic graph. The new notion will be illustrated using several artificial and real-world examples. This talk is based on joint works with Zijun Gao, Haochen Lei, and Hongyuan Cao.
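For readers less familiar with the classical setting being extended, here is a minimal pick-freeze Monte Carlo estimator of first-order Sobol' indices for independent inputs (a generic sketch of global sensitivity analysis, not material from the talk):

```python
import numpy as np

def first_order_sobol(f, d, n=100_000, seed=0):
    """Pick-freeze Monte Carlo estimate of first-order Sobol' indices for a
    model f with d independent Uniform(0, 1) inputs -- the classical
    functional-ANOVA setting that counterfactual explainability generalises
    to causally dependent inputs."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n, d))
    B = rng.uniform(size=(n, d))
    fA, fB = f(A), f(B)
    var = np.var(fA)
    indices = []
    for i in range(d):
        ABi = B.copy()
        ABi[:, i] = A[:, i]   # "freeze" coordinate i, resample the rest
        indices.append(np.mean(fA * (f(ABi) - fB)) / var)
    return np.array(indices)
```

For the additive model f(x) = x0 + 2*x1 the indices are 1/5 and 4/5, matching the variance shares of each input.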
Agathe Fernandes Machado, UQAM, Montreal (Quebec, Canada)
Title: Causal Mediation Analysis via Sequential Transport to Assess Counterfactual Fairness
Abstract: Algorithmic fairness refers to a set of principles and techniques aimed at ensuring that the decisions produced by an algorithm are fair and non-discriminatory toward all users, regardless of personal characteristics such as gender, ethnicity, or other so-called sensitive attributes. Its assessment can be conducted at the individual level by focusing on a specific individual from a minority group and asking counterfactual questions such as: “What would this woman’s salary be if she were a man?” To evaluate the unfairness of a machine learning model, we adopt the notion of Counterfactual Fairness proposed by Kusner et al. (2017). We introduce a distributional framework for causal mediation analysis based on optimal transport (OT) and its sequential extension along a mediator Directed Acyclic Graph (DAG), in which the sensitive attribute corresponds to the treatment variable. Rather than relying on cross-world structural counterfactuals, we construct mediator counterfactuals in a mutatis mutandis sense: mediators are modified only as necessary to align an individual with the distribution under the alternative treatment, while respecting the causal dependencies among mediators. Sequential transport (ST) builds these counterfactuals by applying univariate or conditional OT maps following a topological order of the mediator DAG, and naturally extends to categorical mediators through adapted transport techniques on the probability simplex. Finally, we discuss how uncertainty in the causal graph propagates to the transport maps and may lead to unethical fairwashing.
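The univariate building block of sequential transport can be sketched as follows (a generic monotone-rearrangement map between empirical samples, with illustrative function names, not the authors' code):

```python
import numpy as np

def univariate_ot_map(x, source_sample, target_sample):
    """Monotone (optimal) transport map between two one-dimensional samples:
    push x through the source empirical CDF, then through the target quantile
    function. Sequential transport applies such maps (or their conditional
    versions) mediator by mediator, following a topological order of the DAG."""
    source_sample = np.sort(source_sample)
    # empirical CDF rank of x among the source sample
    u = np.searchsorted(source_sample, x, side="right") / len(source_sample)
    u = np.clip(u, 1e-6, 1 - 1e-6)
    return np.quantile(target_sample, u)
```

For example, mapping the value 0 from a N(0, 1) sample to a N(5, 1) sample lands near 5: the map shifts each individual only as needed to align it with the distribution under the alternative treatment.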
Quentin Clairon, Université de Bordeaux
Title: Variational autoencoder for inference of nonlinear mixed effect models based on ordinary differential equations
Abstract: We propose a variational autoencoder approach for identifiability-aware parameter estimation in nonlinear mixed-effects models based on ordinary differential equations, using longitudinal data from multiple subjects. In moderate dimensions, likelihood-based inference via the stochastic approximation EM algorithm (SAEM) is widely used, but it relies on Markov chain Monte Carlo (MCMC) to approximate subject-specific posteriors. As model complexity increases, or when observations per subject are sparse and irregular, performance often deteriorates due to a complex, multimodal likelihood surface as well as MCMC convergence difficulties. We instead estimate parameters by maximizing the evidence lower bound (ELBO), a regularized surrogate for the marginal likelihood. A shared encoder amortizes inference of subject-specific random effects, avoiding per-subject optimization and the use of MCMC. Beyond pointwise estimation, we quantify parameter uncertainty using an observed-information-based variance estimator and verify that practical identifiability of the model parameters is not compromised by nuisance parameters introduced in the encoder. We evaluate the method on three simulation case studies, representing estimation scenarios of increasing complexity, and on a real-world antibody kinetics dataset, comparing against SAEM baselines.
Preben Ness, Simula Research Laboratory (Oslo, Norway)
Title: Using causal ideas in deep neural networks: a practical survey and taxonomy
Abstract: Traditional neural networks are extremely good at learning statistical patterns in data, but often fail when test data is distributed differently from the model's training data. This problem of generalisation is a huge obstacle when using neural network models in safety-critical domains such as healthcare and self-driving cars.
The ideal mathematical solution is to learn the underlying invariant structural causal model that generated the observed data, what one might refer to as the data-generating process from a representation learning perspective. But this is provably impossible from observations alone! Nonetheless, over the past few years, a multitude of approaches has been proposed for so-called "causal neural networks", which incorporate ideas from causality to improve generalisation capabilities.
In this talk, I will present the findings of a recent survey paper of mine, describing and categorising the different approaches that have been tried, and the benefits and challenges each brings. The talk is aimed both at those already familiar with causality and at those who work with deep neural networks more generally.
Margaux Zaffran, Institut de Mathématiques d'Orsay, Inria Celeste project team
Title: Momentum Smooths the Path to Gradient Equilibrium
Abstract: Online gradient descent has recently been shown to satisfy gradient equilibrium for a broad class of loss functions, including the quantile loss and the squared loss. This means that the average of the gradients of the losses along the sequence of estimates converges to zero, a property that allows for quantile calibration and debiasing of predictions, among other useful properties of statistical flavor. A shortcoming of online gradient descent when optimized for gradient equilibrium is that the sequence of estimates is jagged, leading to volatile paths. In this work, we propose a generalized momentum method, in the form of a weighting of past gradients, as a broader algorithmic class with guarantees to smoothly postprocess (e.g., calibrate or debias) predictions from black-box algorithms, yielding estimates that are more meaningful in practice. We prove that it achieves gradient equilibrium at the same convergence rates and under similar sets of assumptions as plain online gradient descent, all the while producing smoother paths that preserve the original signal amplitude. Of particular importance are the consequences for sequential decision-making, where more stable paths translate to less variability in statistical applications. These theoretical insights are corroborated by real-data experiments, showcasing the benefits of adding momentum.
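A toy version of the idea (an illustrative sketch with assumed parameter choices, not the paper's algorithm) is online subgradient descent on the pinball loss, where momentum enters as an exponentially weighted average of past gradients; the running average gradient tends to zero (gradient equilibrium, i.e., quantile calibration) while the momentum path is visibly smoother:

```python
import numpy as np

def online_quantile(y, tau=0.9, lr=0.5, beta=0.0):
    """Online (sub)gradient descent on the pinball loss, with optional
    momentum (beta > 0) as an exponentially weighted average of past
    gradients. Returns the path of estimates and the average gradient."""
    theta, m = 0.0, 0.0
    thetas, grads = [], []
    for yt in y:
        g = float(yt < theta) - tau      # subgradient of the pinball loss
        m = beta * m + (1 - beta) * g    # momentum: weighted past gradients
        theta -= lr * m
        thetas.append(theta)
        grads.append(g)
    return np.array(thetas), np.mean(grads)
```

On i.i.d. Gaussian data, both the plain (beta=0) and momentum (beta=0.9) runs drive the average gradient near zero, so roughly a tau fraction of observations falls below the estimates; the momentum path, however, has much smaller total variation.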
Wednesday 18 March 2026 (program to be confirmed)
9:00–10:00: Talk by I. Balelli
10:00–10:30: Coffee Break
10:30–11:00: Talk by M. Zaffran
11:00–11:30: Talk by A. Fernandes Machado
11:30–12:30: Talk by A. Lomet
12:30–13:30: Lunch Break
13:30–14:30: Talk by Q. Zhao
14:30–15:00: Talk by P. Ness
15:00–15:30: Talk by Z. Li
15:30–16:00: Coffee Break
16:00–17:00: Talk by V. Elvira
Organizers: Marianne Clausel, Emilie Chouzenoux, Emilie Devijver, and Clémentine Prieur