Paris 2024

We are organizing a one-day meeting on 15 January 2024 on the following topics:

stochastic optimal control, reinforcement learning and model uncertainty 

Registration (mandatory)


Schedule:


Location of the workshop: 

Amphi Turing, Bâtiment Sophie Germain, Université Paris Cité, Pl. Aurélie Nemours, 75013 Paris


Titles and abstracts:

Samuel Cohen

Title: Convergence of neural net approximators for PDEs

Abstract: In this talk, we will consider the use of neural networks as approximators for various partial differential equations. We will consider variations on the ‘deep Galerkin method’ of Sirignano and Spiliopoulos (closely related to PINN methods), and show that in the wide-network limit, gradient descent methods are guaranteed to give approximations of the true solutions of many PDEs.

Based on joint work with Deqing Jiang and Justin Sirignano.
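As a rough, hedged illustration of the flavour of such schemes (not the speaker's code; the PDE, network size and sampling below are illustrative assumptions), a deep-Galerkin/PINN-style fit minimises the mean squared PDE residual over randomly sampled interior points, together with an initial-condition penalty:

```python
# Illustrative DGM/PINN-style sketch (not the speaker's code): train a small
# network u_theta(t, x) so that the heat-equation residual u_t - u_xx vanishes
# on random interior points, with a Gaussian initial condition at t = 0.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def residual(t, x):
    """PDE residual u_t - u_xx at the sampled points."""
    tx = torch.cat([t, x], dim=1).requires_grad_(True)
    u = net(tx)
    grads = torch.autograd.grad(u.sum(), tx, create_graph=True)[0]
    u_t, u_x = grads[:, :1], grads[:, 1:]
    u_xx = torch.autograd.grad(u_x.sum(), tx, create_graph=True)[0][:, 1:]
    return u_t - u_xx

for step in range(2000):
    t = torch.rand(256, 1)                       # interior times in (0, 1)
    x = 4 * torch.rand(256, 1) - 2               # interior points in (-2, 2)
    x0 = 4 * torch.rand(256, 1) - 2              # initial-condition points
    loss_pde = residual(t, x).pow(2).mean()
    u0 = net(torch.cat([torch.zeros_like(x0), x0], dim=1))
    loss_ic = (u0 - torch.exp(-x0.pow(2))).pow(2).mean()
    loss = loss_pde + loss_ic
    opt.zero_grad()
    loss.backward()
    opt.step()
```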


Lorenzo Croissant

Title: Reinforcement Learning in near-continuous time for continuous state-action spaces

Abstract: We consider the reinforcement learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and in discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for systems in which interactions occur at high frequency, if not in continuous time, or whose state spaces are large, if not inherently continuous. Perhaps the only exception is the linear-quadratic framework, for which results exist both in discrete and in continuous time. However, its ability to handle continuous states comes at the price of a rigid dynamics and reward structure.

This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency 1/a, which captures arbitrary time scales from discrete (a = 1) to continuous time (a tends to 0). In addition, we consider a generic reward function and model the state dynamics as a jump process with an arbitrary transition kernel on R^d.

We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning by extending the eluder dimension framework, and we propose an approximate planning method based on a diffusive-limit (a tends to 0) approximation of the jump process. Overall, our algorithm enjoys a regret of order O(T^(1/2)), or O(a^(1/2) T + T^(1/2)) with the approximate planning. As the frequency of interactions blows up, the approximation error a^(1/2) T vanishes, showing that O(T^(1/2)) is attainable in near-continuous time.
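For intuition only (a hedged sketch; the horizon and the values of a below are illustrative assumptions, not from the paper), the Poisson clock of frequency 1/a can be simulated directly, showing how the number of interactions grows as a tends to 0:

```python
# Hedged sketch: sample interaction times from a Poisson clock of frequency 1/a,
# illustrating how a interpolates between discrete (a = 1) and near-continuous
# (a -> 0) interaction regimes over a horizon T.
import numpy as np

def interaction_times(a, T, rng):
    """Event times of a Poisson process with rate 1/a on [0, T]."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(scale=a)   # inter-arrival time ~ Exp(rate = 1/a)
        if t > T:
            return np.array(times)
        times.append(t)

rng = np.random.default_rng(0)
T = 10.0
for a in (1.0, 0.1, 0.01):
    n = len(interaction_times(a, T, rng))
    print(f"a = {a:4.2f}: {n} interactions on [0, {T}] (expected about T/a = {T/a:.0f})")
```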


Alexander Merkel

Title: Optimal adaptive control with separable drift uncertainty

Abstract: We consider a problem of stochastic optimal control with separable drift uncertainty, in strong formulation, on a finite horizon. The drift coefficient of the state $Y^{u}$ is multiplicatively influenced by an unknown random variable $\lambda$, while admissible controls $u$ are required to be adapted to the observation filtration. Choosing a control therefore influences the state and the acquisition of information simultaneously, and comes with a learning effect.

The problem, initially non-Markovian, is embedded into a higher-dimensional Markovian, full-information control problem with control-dependent filtration and noise. To this problem we apply the stochastic Perron method to characterize the value function as the unique viscosity solution of the HJB equation, explicitly construct $\varepsilon$-optimal controls and show that the values of the strong and weak formulations agree. Numerical illustrations show a significant difference between the adaptive control and the certainty-equivalence control.

Joint work with Christoph Belak and Samuel Cohen. The paper is available on arXiv.
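As a hedged illustration of what separable, multiplicative drift uncertainty of this kind may look like (the notation below is an assumption made for exposition, not taken from the paper):

```latex
% Illustrative sketch only; the specific form of b, \sigma and the filtration
% are assumptions, not taken from the paper.
\[
  dY^{u}_t \;=\; \lambda\, b(t, Y^{u}_t, u_t)\, dt \;+\; \sigma(t, Y^{u}_t, u_t)\, dW_t ,
\]
% where \lambda is an unobservable random variable and the control u must be
% adapted to the observation filtration generated by Y^{u}. The controller
% learns \lambda only through its conditional law
\[
  \pi_t \;=\; \mathcal{L}\bigl(\lambda \,\big|\, \mathcal{F}^{Y^{u}}_t\bigr),
\]
% so each choice of u simultaneously moves the state and the information \pi_t,
% which is the learning effect mentioned in the abstract.
```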


Huyên Pham

Title: Actor-Critic learning for mean-field control in continuous time

Abstract: We study policy gradient and actor-critic algorithms for solving mean-field control problems within a continuous-time, model-free (i.e. reinforcement learning) setting. The approach is based on a gradient-based representation of the value function, employing parametrized randomized policies. The learning for both the actor (policy) and the critic (value function) is facilitated by a class of moment neural network functions on the Wasserstein space of probability measures, and the key feature is to sample trajectories of distributions directly. A central challenge addressed in this study is the computational treatment of an operator specific to the mean-field framework. To illustrate the effectiveness of our methods, we provide a comprehensive set of numerical results, encompassing diverse examples including multi-dimensional settings and nonlinear-quadratic mean-field control problems with controlled volatility.
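A minimal, hedged sketch of the moment-neural-network idea, assuming (purely for illustration; this is not the authors' implementation) that a function of a probability measure is represented through the first few empirical moments of a particle sample:

```python
# Hedged sketch (not the authors' implementation): a "moment neural network"
# represents a function of a probability measure mu through its first K
# empirical moments, here used as a critic V(t, mu) evaluated on particles.
import torch

K = 4  # number of moments used to summarise the measure (illustrative choice)

critic = torch.nn.Sequential(
    torch.nn.Linear(1 + K, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def moments(particles, K):
    """Empirical moments E[X^k], k = 1..K, of a batch of particles (N, 1)."""
    return torch.cat([particles.pow(k).mean(dim=0) for k in range(1, K + 1)])

def critic_value(t, particles):
    """Approximate V(t, mu), mu being the empirical law of the particles."""
    feats = torch.cat([torch.tensor([t]), moments(particles, K)])
    return critic(feats)

# usage: evaluate the critic on an empirical distribution of 1024 particles,
# i.e. one snapshot along a sampled trajectory of distributions
mu_sample = torch.randn(1024, 1)
print(critic_value(0.5, mu_sample).item())
```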


Claudia Strauch

Title: Learning to reflect: On data-driven approaches to stochastic optimal control

Abstract: Reinforcement learning (RL) and stochastic control share the common goal of finding optimal strategies in uncertain environments. While RL algorithms are actively used in a wide range of domains, establishing theoretical guarantees for them remains a major challenge. In contrast, stochastic control provides theoretical solutions to optimal control problems in many scenarios, but their applicability suffers from the standard assumption that the dynamics of the underlying stochastic process are known. To overcome this limitation, we propose purely data-driven strategies for stochastic control, which we study for ergodic impulse and singular control problems in the context of continuous diffusion processes. In particular, we describe the specific statistical challenges arising in the stochastic control set-up. The exploration vs. exploitation dilemma, well known from RL, plays an essential role, and we present some concentration results that allow us to deal with it. Finally, we show how these insights translate into regret convergence rates of polynomial order for the control problems considered.
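As a hedged sketch of one typical ingredient of such data-driven, plug-in strategies (illustrative only, not the speaker's method): estimate a feature of the unknown dynamics, here the invariant density of a simulated one-dimensional diffusion, from a single trajectory, and base the control rule on that estimate rather than on known dynamics:

```python
# Hedged, illustrative sketch: observe a single ergodic diffusion trajectory
# (an Ornstein-Uhlenbeck process simulated by an Euler scheme), estimate its
# invariant density with a Gaussian kernel estimator, and use the estimate
# as the data-driven input to a plug-in control rule.
import numpy as np

rng = np.random.default_rng(1)
dt, n = 1e-3, 100_000
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):                        # Euler scheme for dX = -X dt + dW
    x[i] = x[i - 1] - x[i - 1] * dt + np.sqrt(dt) * rng.standard_normal()

def kde(points, grid, h):
    """Gaussian kernel density estimate evaluated on the grid."""
    z = (grid[:, None] - points[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

grid = np.linspace(-3, 3, 121)
pi_hat = kde(x[::20], grid, h=0.15)          # subsample to reduce dependence
# A plug-in rule (e.g. a reflection threshold) would be computed from pi_hat
# instead of from the unknown true invariant density.
print(grid[np.argmax(pi_hat)])               # mode of the estimated density
```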


Registered participants

Alexandre Popier (Le Mans Université)

Céline Labart (Université Savoie Mont Blanc / LAMA)

Idris Kharroubi (Sorbonne University/LPSM)

Marie-Amélie Morlais (Le Mans Université - Laboratoire Manceau de Mathématiques)

Antonio Ocello (CMAP, Ecole Polytechnique)

Marta Gentiloni Silveri (Ecole Polytechnique/ CMAP)

Said Hamadene (Le Mans University/LMM)

Cyril Benezet (ENSIIE, LaMME, UEVE UPS)

Lorenzo Croissant (ENSAE)

Orso Forghieri (Ecole Polytechnique)

Marylou Gabrié (École Polytechnique)

Jean Pachebat (CMAP, Ecole Polytechnique)

Ibrahim Merad (Université Paris Cité LPSM)

Simon Coste (LPSM)

Alain Oliviero-Durmus (Ecole polytechnique)

Shiva Darshan (ENPC CERMICS)

Stéphane Crépey (LPSM)

Adrien Richou (IMB, Université de Bordeaux)

Jean-François Chassagneux (LPSM)

Sylvain Delattre (Paris Cité / LPSM)

Manal Jakani (ENSAE Paris/CREST)

Zakaria Bensaid (Le Mans University / LMM-IRA)

Guillaume Chennetier (CMAP École polytechnique)

Wissal Sabbagh (Le Mans University)