Related Work
For reference, we have included below a list of some important publications on exploration in RL from the last three years. If there is other related work that you think is particularly relevant, email us at erl-leads@google.com and we'll add it to the list!
- Bellemare, Marc, et al. "Unifying count-based exploration and intrinsic motivation." Advances in Neural Information Processing Systems. 2016.
- Fortunato, Meire, et al. "Noisy networks for exploration." arXiv preprint arXiv:1706.10295 (2017).
- Fu, Justin, John D. Co-Reyes, and Sergey Levine. "EX2: Exploration with Exemplar Models for Deep Reinforcement Learning." arXiv preprint arXiv:1703.01260 (2017).
- Houthooft, Rein, et al. "VIME: Variational information maximizing exploration." Advances in Neural Information Processing Systems. 2016.
- Lipton, Zachary, et al. "BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems." arXiv preprint arXiv:1711.05715 (2017).
- Machado, Marlos C., et al. "Eigenoption Discovery through the Deep Successor Representation." arXiv preprint arXiv:1710.11089 (2017).
- Mohamed, Shakir, and Danilo Jimenez Rezende. "Variational information maximisation for intrinsically motivated reinforcement learning." Advances in Neural Information Processing Systems. 2015.
- Osband, Ian, et al. "Deep Exploration via Bootstrapped DQN." Advances in Neural Information Processing Systems. 2016.
- Pathak, Deepak, et al. "Curiosity-driven exploration by self-supervised prediction." arXiv preprint arXiv:1705.05363 (2017).
- Pinto, Lerrel, et al. "The curious robot: Learning visual representations via physical interactions." European Conference on Computer Vision. Springer International Publishing, 2016.
- Plappert, Matthias, et al. "Parameter Space Noise for Exploration." arXiv preprint arXiv:1706.01905 (2017).
- Tang, Haoran, et al. "#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning." Advances in Neural Information Processing Systems. 2017.
Theory
- Dann, Christoph, and Emma Brunskill. "Sample complexity of episodic fixed-horizon reinforcement learning." Advances in Neural Information Processing Systems. 2015.
- Jiang, Nan, et al. "Contextual Decision Processes with Low Bellman Rank are PAC-Learnable." arXiv preprint arXiv:1610.09512 (2016).
- Osband, Ian, and Benjamin Van Roy. "Why is posterior sampling better than optimism for reinforcement learning?" arXiv preprint arXiv:1607.00215 (2016).