Keywords: reinforcement learning, fairness, privacy, machine learning, statistics, decision theory, optimisation, control, game theory, bandit problems, sample complexity, Monte Carlo estimation, ensemble methods, neuroscience, security
My research interests lie mostly in the area of decision theory and machine learning. In general, I am interested in the possibility of finding approximate solutions to computationally intractable problems. More specifically, I work on methods for making approximately optimal sequential decisions, especially in reinforcement learning. The difficulty in making such decisions arises from various forms of uncertainty, all of which can be expressed as uncertainty in our current model. This may be due to a lack of sufficient observations (implying either partial observability or simply an insufficient number of experiments) or of prior knowledge. Such problems appear in three ways in reinforcement learning: firstly, in the exploration-exploitation tradeoff, which arises even in the simplest of problems; secondly, in the POMDP framework, where observations about some variables are simply unavailable and there is limited (if any) prior knowledge about the underlying model structure; and finally, in problems with continuous state and action spaces.
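As a toy illustration (not part of the research described above), the exploration-exploitation tradeoff can be sketched with Thompson sampling on a Bernoulli bandit: the agent maintains a Beta posterior per arm and acts greedily with respect to a single posterior sample, so uncertain arms still get tried. All names and parameters here are illustrative.

```python
import random

def thompson_bandit(true_means, horizon, rng=random.Random(0)):
    """Thompson sampling on a Bernoulli bandit with Beta(1,1) priors."""
    k = len(true_means)
    alphas = [1] * k  # posterior successes + 1
    betas = [1] * k   # posterior failures + 1
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample per arm from its posterior; play the best sample.
        samples = [rng.betavariate(alphas[i], betas[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        # Update the chosen arm's posterior.
        if reward:
            alphas[arm] += 1
        else:
            betas[arm] += 1
    return total_reward
```

Because the sampled means concentrate as evidence accumulates, exploration fades automatically and the agent ends up mostly playing the best arm.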
Much of the relevant basic theoretical work had already been established by the 1960s in the form of optimal statistical decision theory: Wald's sequential analysis, Bellman's dynamic programming framework, and the work of DeGroot and others within the Bayesian framework for general utility maximisation problems. In the general case the methods developed are impractical to use. However, it would be interesting to examine whether results from this early theory can be applied to obtain good approximate solutions to the difficult problems in reinforcement learning. Recently, optimal statistical decision theory has received renewed interest from the reinforcement learning community.
Since 2012, I have also been working in the field of differential privacy. My main work in that field concerns uncertainty quantification, the interaction between Bayesian inference and privacy, and the effect of privacy constraints on learning algorithms.
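As a minimal sketch of the privacy setting (illustrative only, not a description of the work above), the classical Laplace mechanism releases a noisy query answer that satisfies epsilon-differential privacy for a query of bounded L1 sensitivity:

```python
import random

def laplace_mechanism(value, sensitivity, epsilon, rng=random.Random(0)):
    """Release value plus Laplace(sensitivity/epsilon) noise: this satisfies
    epsilon-differential privacy for a query with the given L1 sensitivity."""
    scale = sensitivity / epsilon
    # A Laplace(scale) variate is the difference of two Exponential(1/scale)
    # variates.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise
```

The released value is unbiased, and the added noise is exactly what makes uncertainty quantification for the downstream analyst non-trivial.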
Since 2016, I have been working on fairness and human-AI interaction more generally. In particular, I am interested in scenarios where the use of AI can lead to undesirable societal outcomes, and the design of mechanisms to reduce or eliminate its negative impact. My group has also looked at the problem of collaborating with a human when their preferences are unknown, or when their beliefs are not aligned with those of the AI.
Interesting applications include: adversarial games with difficult position evaluation (Go, for example); car racing, which is very challenging in terms of controlling the car, with additional challenges created by the need to track opponents and to make decisions under uncertainty in real time, and which is also interesting in terms of modelling; biological neural systems (biological modelling, mechanisms for approximate inference in biological systems, population-based methods); and active learning, where collecting data is very expensive.
My PhD thesis was about developing ensemble methods for sequence learning. In particular:
boosting and bagging of hidden Markov model mixtures for speech recognition;
bagging for exploration in reinforcement learning, and some extensions to posterior sampling.
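The idea of bagging for exploration can be sketched as follows (a toy bandit version under my own assumptions, not the thesis algorithm): maintain a bootstrapped ensemble of value estimates and, at each step, act greedily with respect to one randomly chosen ensemble member, so that disagreement between members drives exploration much as posterior sampling does.

```python
import random

def ensemble_bandit(true_means, horizon, n_models=10, rng=random.Random(0)):
    """Bootstrapped-ensemble exploration on a Bernoulli bandit."""
    k = len(true_means)
    # Each member keeps its own (count, reward sum) per arm, seeded with one
    # optimistic pseudo-observation so unexplored arms look promising.
    stats = [[(1.0, 1.0) for _ in range(k)] for _ in range(n_models)]
    total = 0.0
    for _ in range(horizon):
        # Pick one ensemble member at random and act greedily on its estimates.
        m = rng.randrange(n_models)
        arm = max(range(k), key=lambda i: stats[m][i][1] / stats[m][i][0])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        total += reward
        # Bootstrap: each member sees the new observation with probability 1/2,
        # so the members' estimates stay diverse.
        for member in stats:
            if rng.random() < 0.5:
                n, s = member[arm]
                member[arm] = (n + 1.0, s + reward)
    return total
```

The random masking plays the role of bootstrap resampling in an online setting; as all members converge on the best arm, exploration fades.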