Keywords: reinforcement learning, machine learning, statistics, decision theory, optimisation, control, game theory, bandit problems, sample complexity, Monte Carlo estimation, ensemble methods, neuroscience, security
My research interests lie mostly in the area of machine learning. In general, I am interested in the possibility of finding approximate solutions to computationally intractable problems; in particular, in methods for making approximately optimal sequential decisions. The difficulty in making such decisions arises from various forms of uncertainty, all of which can be expressed as uncertainty in our current model. This uncertainty may be due to a lack of sufficient observations (implying either partial observability or simply an insufficient number of experiments) or a lack of prior knowledge. Such problems appear in three guises in reinforcement learning: firstly, as the exploration-exploitation tradeoff, which arises even in the simplest of problems; secondly, in the POMDP framework, where observations of some variables are simply unavailable and where there is limited (if any) prior knowledge about the underlying model structure; and finally, in problems with continuous state and action spaces.
Most of the relevant basic theoretical work was already completed in the 1960s in the form of optimal statistical decision theory (i.e. Wald's sequential analysis, Bellman's dynamic programming framework, and the work of DeGroot and others within the Bayesian framework for general utility maximisation problems). In the general case the methods developed are impractical to use; however, it would be interesting to examine whether results from this early theory can be applied to obtain good approximate solutions to the difficult problems in reinforcement learning. So far there appears to have been little interest in optimal statistical decision theory within the reinforcement learning community.
A potentially interesting future research subject would be performing approximate inference in Bayesian models of reinforcement learning using a population of estimates, as is frequently done in Markov chain Monte Carlo methods for supervised learning tasks. Some preliminary work that I have undertaken on using such methods to optimally trade off exploration and exploitation in bandit problems indicates their potential usefulness.
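The preliminary bandit work mentioned above is not specified in detail here; as a minimal sketch of the general idea, the following Thompson-sampling-style Bernoulli bandit draws one sample per arm from its Beta posterior (a population of size one per arm, which generalises naturally to larger populations of estimates) and lets those samples drive the exploration-exploitation tradeoff. The function names and payoff probabilities are illustrative assumptions, not taken from the work itself.

```python
import random

def thompson_step(successes, failures):
    """Pick an arm by sampling one estimate per arm from its Beta(s+1, f+1) posterior."""
    samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return samples.index(max(samples))

def run_bandit(p_arms, steps=2000, seed=0):
    """Simulate a Bernoulli bandit with (hypothetical) payoff probabilities p_arms."""
    random.seed(seed)
    successes = [0] * len(p_arms)
    failures = [0] * len(p_arms)
    for _ in range(steps):
        arm = thompson_step(successes, failures)
        if random.random() < p_arms[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because the posterior samples concentrate around the true payoff rates as evidence accumulates, the rule explores early and exploits the apparently better arm later, without any explicit exploration schedule.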
So far, reinforcement learning has largely been applied to continuous dynamical systems in an ad hoc manner. It would be interesting to apply proper Bayesian methods, with suitable approximations, to this problem.
An intriguing question is how such approximate inference mechanisms could be implemented in real nervous systems, particularly with respect to uncertainty. Some recent work, such as that of Dayan, Doya and Pouget, among others, proposes population-based models of knowledge and uncertainty for biological systems. Given the opportunity, I would be interested in examining the relation of such models to real systems.
Another domain of interest is the interface between planning and uncertainty. Decisions in adversarial games are made according to some evaluation of a future position in a game tree. However, the uncertainty of these evaluations is frequently not taken into account, except perhaps in interval estimation methods.
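As a hedged illustration of the interval estimation idea, the sketch below selects a move by an upper confidence bound on noisy child evaluations rather than by the mean alone, so that rarely evaluated positions retain an exploration bonus proportional to their uncertainty. The function name and the exploration constant are assumptions made for the example, not a fixed prescription.

```python
import math

def ucb_choice(means, counts, total, c=1.4):
    """Choose the child whose upper confidence bound on its evaluation is largest.

    means[i]  : mean evaluation of child position i
    counts[i] : number of times child i has been evaluated
    total     : total number of evaluations at this node
    c         : exploration constant (illustrative value)
    """
    def ucb(i):
        if counts[i] == 0:
            return float("inf")  # unevaluated children are tried first
        return means[i] + c * math.sqrt(math.log(total) / counts[i])
    return max(range(len(means)), key=ucb)
```

Under this rule a child with a slightly lower mean but far fewer evaluations can still be selected, which is exactly the effect of treating the evaluation as an interval rather than a point estimate.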
Finally, a secondary research interest of mine lies in the simulation of physical systems, particularly with respect to tribology and its application to racing car simulation.
Interesting applications would include:
- adversarial games with difficult position evaluation (Go, for example);
- racing, which is very challenging in terms of controlling the car, with additional challenges created by the need to track opponents and to make decisions under uncertainty in real time, and which is also interesting in terms of modelling;
- biological neural systems: biological modelling, mechanisms for approximate inference in biological systems, and population-based methods;
- active learning, where collecting data is very expensive.
My previous work relates to ensemble methods and sequence learning. I have applied ensemble methods such as boosting and bagging to create hidden Markov model mixtures for speech recognition; worked on reinforcement learning (RL in continuous spaces, approximations to the optimal exploration-exploitation tradeoff, representations of uncertainty in RL, and the application of RL to supervised tasks); and worked on prediction using switching extended Kalman filters with some simplifications.