Mark Rowland

Research Scientist | Google DeepMind

I'm a research scientist at Google DeepMind, working on algorithms for machine learning. I typically work in a sub-field known as reinforcement learning, in which agents are given positive and negative feedback on their performance, and aim to improve over time. I enjoy working on a wide range of problems, from mathematical convergence theory through to large-scale applications of deep reinforcement learning.

Selected Recent Publications and Preprints

An Analysis of Quantile Temporal-Difference Learning
Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
Establishes convergence of quantile temporal-difference learning with probability 1, through stochastic approximation of differential inclusions and connections to quantile dynamic programming.
JMLR 2024 | arXiv version

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney
How many samples do you need to accurately estimate the return distribution of a given policy? This paper shows that, roughly speaking, no more are required than for estimating just the value function of the policy.
arXiv preprint

Distributional Bellman Operators over Mean Embeddings
Li Kevin Wenliang, Grégoire Déletang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland
Proposes a variety of algorithms for dynamic programming and TD learning in distributional reinforcement learning based on mean embeddings of return distributions.
ICML 2024 | arXiv version

A Distributional Analogue to the Successor Representation
Harley Wiltzer*, Jesse Farebrother*, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland
Introduces reinforcement learning algorithms for performing zero-shot distributional evaluation and risk-sensitive policy selection.
ICML 2024 (Spotlight) | arXiv version

Nash Learning from Human Feedback
Rémi Munos*, Michal Valko*, Daniele Calandriello*, Mohammad Gheshlaghi Azar*, Mark Rowland*, Daniel Guo*, Yunhao Tang*, Matthieu Geist*, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot*
Proposes Nash learning from human feedback, an approach to fine-tuning large language models based on approximating Nash equilibria in regularised preference games.
ICML 2024 (Spotlight) | arXiv version

A General Theoretical Paradigm to Understand Learning from Human Preferences
Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
Introduces IPO, a method for fine-tuning policies directly from preference data.
AISTATS 2024 | arXiv version

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney
In tabular policy evaluation in stochastic environments, estimating value functions via quantiles can outperform the classical approach of TD learning due to a bias-variance trade-off.
ICML 2023 | arXiv version

Distributional Reinforcement Learning
Marc G. Bellemare, Will Dabney, Mark Rowland
Textbook on distributional reinforcement learning.
Book website | MIT Press webpage (including open access copy)

See my Google Scholar page for full details of publications and preprints.

firstnamelastname [at] google [dot] com