Related Work
For reference, we have included below a list of some important publications on exploration in RL from the last three years. If there is other related work that you think is particularly relevant, email us at erl-leads@google.com and we'll add it to the list!
- Bellemare, Marc, et al. "Unifying count-based exploration and intrinsic motivation." Advances in Neural Information Processing Systems. 2016.
- Fortunato, Meire, et al. "Noisy networks for exploration." arXiv preprint arXiv:1706.10295 (2017).
- Fu, Justin, John D. Co-Reyes, and Sergey Levine. "EX2: Exploration with Exemplar Models for Deep Reinforcement Learning." arXiv preprint arXiv:1703.01260 (2017).
- Houthooft, Rein, et al. "VIME: Variational information maximizing exploration." Advances in Neural Information Processing Systems. 2016.
- Lipton, Zachary, et al. "BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems." arXiv preprint arXiv:1711.05715 (2017).
- Machado, Marlos C., et al. "Eigenoption Discovery through the Deep Successor Representation." arXiv preprint arXiv:1710.11089 (2017).
- Mohamed, Shakir, and Danilo Jimenez Rezende. "Variational information maximisation for intrinsically motivated reinforcement learning." Advances in Neural Information Processing Systems. 2015.
- Osband, Ian, et al. "Deep Exploration via Bootstrapped DQN." Advances in Neural Information Processing Systems. 2016.
- Pathak, Deepak, et al. "Curiosity-driven exploration by self-supervised prediction." arXiv preprint arXiv:1705.05363 (2017).
- Pinto, Lerrel, et al. "The curious robot: Learning visual representations via physical interactions." European Conference on Computer Vision. Springer International Publishing, 2016.
- Plappert, Matthias, et al. "Parameter Space Noise for Exploration." arXiv preprint arXiv:1706.01905 (2017).
- Tang, Haoran, et al. "#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning." Advances in Neural Information Processing Systems. 2017.
Theory
- Dann, Christoph, and Emma Brunskill. "Sample complexity of episodic fixed-horizon reinforcement learning." Advances in Neural Information Processing Systems. 2015.
- Jiang, Nan, et al. "Contextual Decision Processes with Low Bellman Rank are PAC-Learnable." arXiv preprint arXiv:1610.09512 (2016).
- Osband, Ian, and Benjamin Van Roy. "Why is posterior sampling better than optimism for reinforcement learning?" arXiv preprint arXiv:1607.00215 (2016).