Lukasz Szpruch

December 14th

Title: Stochastic Control, Gradient Flows and Reinforcement learning

Speaker: Lukasz Szpruch (University of Edinburgh)

Date/Time: Tuesday, 12/14, 7pm CET (10am PST, 1pm EST)

Abstract: The aim of this talk is to present policy gradient and episodic learning algorithms, often used by the reinforcement learning community, through the lens of continuous-time stochastic control theory.

In the first part of the talk, I will give an overview of stochastic control problems regularised by the relative entropy, where the action space is the space of measures. This setting includes relaxed control problems and problems of finding Markovian controls in which the control function is replaced by an idealised infinitely wide neural network. By exploiting the Pontryagin optimality principle, we identify a suitable metric space on which we construct a gradient flow for the measure-valued control process, along which the cost functional is guaranteed to decrease. It is shown that, under appropriate conditions, this gradient flow has an invariant measure which is the optimal control for the regularised stochastic control problem.

In the second part of the talk, I'll present a linear-convex stochastic control problem with Markovian controls and unknown coefficients. I'll then define a learning algorithm, inspired by filtering theory, that achieves the optimal regret.

Bio: Dr. Szpruch is a Reader (Associate Professor) at the School of Mathematics, the University of Edinburgh, and the Programme Director for Finance and Economics at the Alan Turing Institute, the National Institute for Data Science and AI. At Turing, Dr. Szpruch provides academic leadership for partnerships with the Office for National Statistics, Accenture and HSBC. Dr. Szpruch is the Principal Investigator of the research programme FAIR on responsible adoption of AI in the financial services industry. He is also a Co-Investigator of the UK Centre for Greening Finance & Investment (CGFI).

Before joining Edinburgh, Dr. Szpruch was a Nomura Junior Research Fellow at the Mathematical Institute, University of Oxford, and a member of the Oxford-Man Institute for Quantitative Finance.

Meeting Recording:

Access Passcode: w**^m0rz