This is my most cited paper; it triggered a field of research called Posterior Sampling for Reinforcement Learning (PSRL). The idea is to split RL into two parts: (i) Bayesian estimation of the environment, which expresses uncertainty; and (ii) to determine a decision/control policy, occasionally resample a possible world from the environment model and solve it for an optimal strategy. Follow this strategy to gather new observations and reduce uncertainty. Recent theoretical results show that PSRL can provide a near-optimal exploration/exploitation trade-off.
Strens M J A, 2000. A Bayesian framework for reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning.
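The two-part scheme described above can be sketched for a small tabular problem. This is a minimal illustration rather than the paper's implementation, and it makes several assumptions: rewards are known, only the transition probabilities are uncertain, each transition row has a Dirichlet(1, ..., 1) prior, a new world is resampled once per episode, and every episode starts in state 0. The names `psrl`, `value_iteration`, `sample_dirichlet`, `TRUE_P` and `R` are all hypothetical.

```python
import random


def sample_dirichlet(counts, rng):
    """Draw a probability vector from Dirichlet(counts) via gamma variates."""
    draws = [rng.gammavariate(c, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def value_iteration(P, R, gamma=0.95, sweeps=200):
    """Greedy policy for transitions P[s][a][s2] and rewards R[s][a]."""
    n_states, n_actions = len(P), len(P[0])

    def q(s, a, V):
        return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

    V = [0.0] * n_states
    for _ in range(sweeps):
        V = [max(q(s, a, V) for a in range(n_actions)) for s in range(n_states)]
    return [max(range(n_actions), key=lambda a: q(s, a, V)) for s in range(n_states)]


def psrl(true_P, R, episodes=50, horizon=20, seed=0):
    """Posterior sampling loop: sample a world, solve it, act, update counts."""
    rng = random.Random(seed)
    n_states, n_actions = len(true_P), len(true_P[0])
    # Assumed prior: Dirichlet(1, ..., 1) pseudo-counts on every transition row.
    counts = [[[1.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    policy = None
    for _ in range(episodes):
        # (ii) Resample one possible world from the posterior and solve it.
        sampled_P = [[sample_dirichlet(counts[s][a], rng) for a in range(n_actions)]
                     for s in range(n_states)]
        policy = value_iteration(sampled_P, R)
        # Follow the resulting strategy in the real environment.
        s = 0
        for _ in range(horizon):
            a = policy[s]
            s_next = rng.choices(range(n_states), weights=true_P[s][a])[0]
            # (i) Bayesian update: add the observed transition to the counts.
            counts[s][a][s_next] += 1.0
            s = s_next
    return policy, counts


# Hypothetical 2-state, 2-action environment: action 1 tends to reach state 1,
# and reward is only paid for taking action 1 in state 1.
TRUE_P = [
    [[1.0, 0.0], [0.1, 0.9]],
    [[1.0, 0.0], [0.1, 0.9]],
]
R = [[0.0, 0.0], [0.0, 1.0]]
policy, counts = psrl(TRUE_P, R)
```

Exploration here comes only from posterior uncertainty: while the counts are small, sampled worlds vary widely and so do the policies they induce; as the counts grow, samples concentrate and behaviour settles.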
Differential evolution operators can be used in a population MCMC framework for a very effective vector space sampler or optimizer:
Strens M J A, Bernhardt M, Everett N, 2002. Markov chain Monte Carlo sampling using direct search optimization. In Proceedings of the Nineteenth International Conference on Machine Learning.
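As a rough illustration of a differential evolution move inside a population sampler, the sketch below updates each chain by adding a scaled difference of two other chains and then applying a Metropolis accept/reject test. It is not taken from the paper, which may use different operators. The Gaussian `log_target`, the scale `2.38 / sqrt(2 * dim)`, the small jitter term and the name `de_mcmc` are all assumptions made for the example.

```python
import math
import random


def log_target(x):
    """Hypothetical target: standard Gaussian log-density, up to a constant."""
    return -0.5 * sum(v * v for v in x)


def de_mcmc(log_p, dim=2, n_chains=10, n_steps=2000, seed=0):
    """Population MCMC with difference-vector proposals (needs >= 3 chains)."""
    rng = random.Random(seed)
    scale = 2.38 / math.sqrt(2 * dim)  # assumed step-size heuristic
    # Deliberately over-dispersed start; the population adapts its own scale.
    pop = [[rng.gauss(0.0, 5.0) for _ in range(dim)] for _ in range(n_chains)]
    logp = [log_p(x) for x in pop]
    samples = []
    for _ in range(n_steps):
        for i in range(n_chains):
            # Pick two distinct other chains to form the difference vector.
            j, k = rng.sample([c for c in range(n_chains) if c != i], 2)
            proposal = [
                pop[i][d] + scale * (pop[j][d] - pop[k][d]) + rng.gauss(0.0, 1e-3)
                for d in range(dim)
            ]
            lp = log_p(proposal)
            # The proposal is symmetric, so the plain Metropolis ratio applies.
            # 1 - random() lies in (0, 1], which keeps log() well defined.
            if math.log(1.0 - rng.random()) < lp - logp[i]:
                pop[i], logp[i] = proposal, lp
        samples.extend(list(x) for x in pop)
    return samples


samples = de_mcmc(log_target)
kept = samples[len(samples) // 2:]  # discard the first half as burn-in
```

Because the proposal distribution is built from the current population, step sizes and directions follow the shape of the target without hand tuning. Replacing the accept/reject test with a greedy "accept if better" rule would turn the same loop into an optimizer.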
A principled version of genetic algorithms that can be used for discrete sampling or optimization:
Strens M J A, 2003. Evolutionary MCMC sampling and optimization in discrete spaces. In Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003).
Learning strategies where evaluation of performance (across a large set of scenarios) is expensive:
Strens M J A, Moore A W, 2003. Policy Search using Paired Comparisons. Journal of Machine Learning Research.
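One way to see why paired comparisons help when evaluation is expensive: if two candidate policies are evaluated on the same scenarios, scenario-to-scenario variation cancels in the difference, so far fewer evaluations are needed to decide which is better. The sketch below is only an illustration of that idea, not the procedure from the paper. The `evaluate` function, the paired t-statistic stopping rule and the threshold of 3.0 are assumptions made for the example.

```python
import math
import random
import statistics


def evaluate(policy_param, scenario_seed):
    """Hypothetical return of a one-parameter policy on one scenario.

    The scenario fixes a large 'difficulty' offset shared by all policies and
    a 'sensitivity' that scales how much a poor parameter costs.
    """
    rng = random.Random(scenario_seed)
    difficulty = rng.gauss(0.0, 5.0)
    sensitivity = rng.uniform(0.5, 1.5)
    return difficulty - sensitivity * (policy_param - 1.0) ** 2


def paired_compare(param_a, param_b, max_scenarios=200, min_scenarios=5,
                   threshold=3.0):
    """Compare two policies on identical scenarios, stopping early.

    Returns (winner, n_used), where winner is +1 if A looks better, -1 if B
    looks better, and 0 if the evaluation budget ran out undecided.
    """
    diffs = []
    for seed in range(max_scenarios):
        # Same seed for both policies: the shared difficulty term cancels.
        diffs.append(evaluate(param_a, seed) - evaluate(param_b, seed))
        n = len(diffs)
        if n < min_scenarios:
            continue
        mean = statistics.mean(diffs)
        sd = statistics.stdev(diffs)
        if sd == 0.0:
            if mean != 0.0:
                return (1 if mean > 0 else -1), n
            continue
        t = mean / (sd / math.sqrt(n))  # paired t-statistic
        if abs(t) > threshold:
            return (1 if t > 0 else -1), n
    return 0, len(diffs)


winner, n_used = paired_compare(0.9, 0.0)
```

In this toy setting the difficulty term has a standard deviation of 5, which would swamp the difference between the two policies if each were evaluated on independent scenarios; pairing removes it entirely.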
The competitive attentional tracker: effective track-before-detect at low signal-to-noise ratios and in clutter. The paper describes a dynamic competitive system that tracks all motions in a scene at the same time ("track everything"):
Strens M J A, Gregory I N, 2003. Tracking in Cluttered Images, Journal of Image & Vision Computing.
Sampling from a function that is an exponential of a sum, but avoiding evaluating the sum at every step. Efficient for Bayesian inference from large datasets. A form of rejection sampling with highly directed proposals constructed using subsets of the data:
Strens M J A, 2004. Efficient hierarchical MCMC for policy search. In Proceedings of the Twenty-First International Conference on Machine Learning.
The use of stochastic task models for dynamic replanning in multi-robot task allocation:
Strens M J A, Windelinckx N, 2005. Combining Planning with Reinforcement Learning for Multi-robot Task Allocation. Springer Lecture Notes in Computer Science, Volume 3394.
Some POMDP problems have special structure, allowing efficient solution:
Strens M J A, 2005. Learning multi-agent search strategies. Springer Lecture Notes in Computer Science, Volume 3394, pp 245-259.
After nearly 20 years, people have started to request my PhD thesis, and I'm revisiting this area of research. Some of it looks quite naive (the treatment of partial observability, for example), but the section on reinforcement learning for visual attention does foresee today's hot topics of attentional processing and working memory. The vision system presented in chapters 11-14 used attentional mechanisms (both bottom-up and learnt top-down) to find and recognise objects in images. Notably, it was also capable of one-shot learning of new visual categories.
Strens M J A, 1999. Learning, cooperation and feedback in pattern recognition. PhD Thesis, King's College London.
There's a local copy with corrected typesetting in Uploaded files.