This is my most cited paper, as it triggered a field of research called "Posterior Sampling for Reinforcement Learning" (PSRL). The idea is to split RL into two parts: (i) Bayesian estimation of the environment, which expresses uncertainty; and (ii) to determine a decision/control policy, occasionally resample a possible world from the environment model and solve it for an optimal strategy. Following this strategy gathers new observations and reduces uncertainty. Recent theoretical results show PSRL can provide a near-optimal exploration/exploitation trade-off.
Strens M J A, 2000. A Bayesian framework for reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning.

Differential evolution operators can be used in a population MCMC framework for a very effective vector space sampler or optimizer:
Strens M J A, Bernhardt M, Everett N, 2002. Markov chain Monte Carlo sampling using direct search optimization. In Proceedings of the Nineteenth International Conference on Machine Learning.

Principled version of genetic algorithms. Can be used for discrete sampling or optimization:

Learning strategies where evaluation of performance (across a large set of scenarios) is expensive:
Strens M J A, Moore A W, 2003. Policy Search using Paired Comparisons. Journal of Machine Learning Research.

The competitive attentional tracker... effective track-before-detect at low signal-to-noise ratios and in clutter:
Strens M J A, Gregory I N, 2003. Tracking in Cluttered Images. Journal of Image & Vision Computing.

Sampling from a function that is an exponential of a sum, while avoiding evaluating the sum at every step. Efficient for Bayesian inference from large datasets. A form of rejection sampling with highly directed proposals constructed using subsets of the data:
Strens M J A, 2004. Efficient hierarchical MCMC for policy search. In Proceedings of the Twenty-first International Conference on Machine Learning.
http://www.machinelearning.org/proceedings/icml2004/papers/177.pdf

The use of stochastic task models for dynamic replanning in multi-robot task allocation:
Strens M J A, Windelinckx N, 2005. Combining Planning with Reinforcement Learning for Multi-robot Task Allocation. Springer Lecture Notes in Computer Science, Volume 3394.

Some POMDP problems have special structure, allowing efficient solution:
Strens M J A, 2005. Learning multi-agent search strategies. Springer Lecture Notes in Computer Science, Volume 3394, pp 245-259.
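
The two-part posterior sampling loop described above for the 2000 paper can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes a small tabular MDP with known rewards, a uniform Dirichlet prior over transition probabilities, and resampling once per episode. The names `solve_mdp` and `psrl`, and all parameter values, are hypothetical choices made for illustration.

```python
import numpy as np

def solve_mdp(P, R, gamma=0.95, iters=200):
    """Value iteration on one sampled world. P has shape (S, A, S), R has shape (S, A)."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)   # expected return for each (state, action)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)       # greedy policy for the sampled world

def psrl(true_P, R, episodes=50, horizon=20, seed=0):
    """Posterior sampling loop (sketch). Rewards R are assumed known."""
    rng = np.random.default_rng(seed)
    S, A, _ = true_P.shape
    # (i) Bayesian model: Dirichlet pseudo-counts per (state, action); assumed uniform prior
    counts = np.ones((S, A, S))
    for _ in range(episodes):
        # (ii) resample a possible world from the posterior and solve it
        sampled_P = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                              for s in range(S)])
        policy = solve_mdp(sampled_P, R)
        s = 0  # assumed fixed start state
        for _ in range(horizon):
            a = policy[s]
            s_next = rng.choice(S, p=true_P[s, a])  # act in the real environment
            counts[s, a, s_next] += 1               # new observation reduces uncertainty
            s = s_next
    return counts
```

Calling `psrl(true_P, R)` returns the posterior pseudo-counts after learning. Resampling only at episode boundaries is one simple way to realise "occasionally resample"; other schedules are possible.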