Ecological Theory of RL

How Does Task Design Influence Agent Learning?

Tuesday, December 14th, 2021 @ NeurIPS 2021 (Virtual)

08:00 - 17:30 (ET)

Call for Papers

In reinforcement learning (RL), designing general-purpose algorithms that apply to arbitrary Markov Decision Processes (MDPs) is appealing because it broadens the range of problems we can address with this technique. In practice, however, applying these methods to real applications requires considerable time spent carefully parameterizing the problem: selecting appropriate state representations and action spaces, fine-tuning reward functions, and designing data collection strategies. RL is not alone in this regard: researchers in the supervised learning community typically assume datasets are fixed (and iterate on the algorithms and models), while practitioners often fix the algorithm and model (and instead iterate on the dataset). Some have argued that a more data-centric view of machine learning research is needed [18, 12], and we would like to encourage the research community to investigate this same principle in the context of RL.
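
As a toy illustration of this kind of task-design iteration, the sketch below shows how a practitioner might iterate on the reward function and observation space of a task without touching the learning algorithm. It assumes the classic OpenAI Gym API [3] (reset() returning an observation and step() returning a 4-tuple); the observation selection and shaping potential are hypothetical design choices, not recommendations.

```python
# Illustrative sketch only: iterating on the *task specification* rather than the algorithm.
# Assumes the classic OpenAI Gym API [3] (reset() -> obs, step() -> (obs, reward, done, info));
# the observation selection and shaping potential below are hypothetical design choices.
import gym
import numpy as np


class TaskVariant(gym.Wrapper):
    """Wraps an environment with a reduced observation and a potential-based shaped reward."""

    def __init__(self, env, obs_indices, potential_fn, gamma=0.99):
        super().__init__(env)
        self.obs_indices = np.asarray(obs_indices)  # which state features the agent observes
        self.potential_fn = potential_fn            # shaping potential Phi(s)
        self.gamma = gamma
        self._last_potential = 0.0

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._last_potential = self.potential_fn(obs)
        return obs[self.obs_indices]

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Potential-based shaping, r' = r + gamma * Phi(s') - Phi(s), preserves optimal policies.
        potential = self.potential_fn(obs)
        shaped_reward = reward + self.gamma * potential - self._last_potential
        self._last_potential = potential
        return obs[self.obs_indices], shaped_reward, done, info


# Example task variant: CartPole where the agent only sees positions (no velocities)
# and receives an extra shaping signal for keeping the pole upright.
env = TaskVariant(
    gym.make("CartPole-v1"),
    obs_indices=[0, 2],                 # cart position and pole angle only
    potential_fn=lambda s: -abs(s[2]),  # hypothetical potential: negative pole angle
)
```

Each such wrapper defines a different task built on the same underlying dynamics, and the resulting changes in learnability are exactly the kind of effect an ecological study of RL would try to characterize.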

Data in RL can be understood as the properties of environments and tasks, usually modelled through underlying MDPs. From this perspective, a data-centric study of RL would parallel Gibson's ecological theory of visual perception and ecological psychology [9]. An ecological study of RL should examine the behavior of algorithms in the context of their environment to better understand how different properties (such as linearity, ergodicity, and mixing rate) influence the performance of these methods. We want the community to develop a systematic approach to RL task design that complements today's algorithm-centric view. Properties and taxonomies of environments and tasks have previously been investigated in several areas of RL research, such as curriculum and continual learning [27, 5, 22], bisimulations and homomorphisms [21, 7, 4, 26], affordances [28], PAC analysis [13, 1], information-theoretic perspectives [14, 11, 16, 8], and meta-analyses of RL benchmarks [17, 20, 19], among many others. However, these endeavors have usually been disconnected from the efforts made to build environments and tasks [2, 25, 3, 24, 23, 6, 15, 10], leaving a gap in our understanding of how algorithmic solutions and environment designs interact.
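
To make one of these properties concrete, the mixing rate of a small tabular MDP under a fixed policy can be estimated from the spectral gap of the induced Markov chain. The minimal sketch below uses plain NumPy and a hypothetical random MDP as a stand-in for a real task; it is only meant to illustrate the kind of quantitative environment property an ecological study might relate to learning performance.

```python
# Minimal sketch: measuring one environment property (mixing rate) of a tabular MDP
# under a fixed policy. The random MDP below is a hypothetical stand-in for a real task.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 20, 4

# Transition tensor P[s, a, s'] of a hypothetical MDP.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

# A fixed (here uniform) policy induces a Markov chain over states.
pi = np.full((n_states, n_actions), 1.0 / n_actions)
P_pi = np.einsum("sa,sat->st", pi, P)

# The spectral gap 1 - |lambda_2| of the induced chain governs how quickly state
# visitation distributions approach the stationary distribution (the mixing rate).
eigvals = np.sort(np.abs(np.linalg.eigvals(P_pi)))[::-1]
spectral_gap = 1.0 - eigvals[1]
print(f"second eigenvalue modulus: {eigvals[1]:.3f}, spectral gap: {spectral_gap:.3f}")
```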

This workshop builds connections between different areas of RL centered on understanding algorithms in the context of their environments. We are interested in questions including, but not limited to:

  1. How to gauge the complexity of an RL problem.

  2. Which classes of algorithms can tackle which classes of problems.

  3. How to develop practically applicable guidelines for formulating RL tasks that are tractable to solve.

We welcome submissions that address these and other related questions through an ecological and data-centric view, pushing forward the limits of our understanding of the RL problem. In particular, we encourage submissions that investigate the following areas:

I. Properties and Taxonomies of MDPs, Tasks, or Environments

And their connection to:

    • Curriculum, continual, and multi-task learning.

    • Novelty search, diversity algorithms, and open-endedness.

    • Representation learning.

    • MDP homomorphisms, bisimulations, inductive biases, equivalences, and affordances.

    • PAC analysis of MDPs.

    • Dynamical systems and control theory.

    • Information-theoretic perspectives on MDPs.

    • Reinforcement Learning benchmarks and their meta-analyses.

    • Real-world applications of RL (robotics, recommendation systems, etc.).

II. Properties of Agents' Experiences

And their connection to:

    • Offline Reinforcement Learning.

    • Exploration.

    • Curiosity and intrinsic motivation.

    • Skill discovery and hierarchical reinforcement learning.

    • Unsupervised objectives for reinforcement learning.

IMPORTANT DATES

  • Submissions Open: August 1, 2021, 00:00 AoE

  • Submission Deadline: October 8, 2021, 23:59 AoE

  • Author Notification: October 22, 2021

  • Camera Ready: December 1, 2021

  • Workshop: December 14, 2021, 08:00 ET (@ NeurIPS 2021)

Organizers

Manfred Diaz

University of Montreal

Hiroki Furuta

University of Tokyo

Elise van der Pol

University of Amsterdam

Lisa Lee

Carnegie Mellon University

Shixiang Shane Gu

Google Brain

Simon S. Du

University of Washington

Marc G. Bellemare

Google Brain

Sergey Levine

UC Berkeley

Program Committee

  • Maruan Al-Shedivat

  • Wesley Chung

  • Kamyar Ghasemipour

  • Tadashi Kozuno

  • Kei Ota

  • Avi Singh

  • Masatoshi Uehara

  • Annie Xie

  • Zeyu Zheng

  • Florian Golemo

  • Tatsuya Matsushima

  • Charlie Gauthier

  • Miguel Suau

  • Qi Wang

  • Tianhe Yu

  • Archit Sharma

  • Viraj Mehta

  • Jakob Buckman

  • Scott Fujimoto

  • Khimya Khetarpal

  • Suraj Nair

  • Harshit Sikchi

  • Ahmed Touati

  • Yufei Wang

  • David Yu-Tung Hui

  • Siddharth Ancha

References

  1. Mohammad Gheshlaghi Azar, Ian Osband, and Rémi Munos. Minimax regret bounds for reinforcement learning. In International Conference on Machine Learning, 2017.
  2. Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013.
  3. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
  4. Pablo Samuel Castro. Scalable Methods for Computing State Similarity in Deterministic Markov Decision Processes. In AAAI Conference on Artificial Intelligence, 2020.
  5. John D. Co-Reyes, Suvansh Sanjeev, Glen Berseth, Abhishek Gupta, and Sergey Levine. Ecological Reinforcement Learning. arXiv preprint arXiv:2006.12478, 2020.
  6. Karl Cobbe, Christopher Hesse, Jacob Hilton, and John Schulman. Leveraging procedural generation to benchmark reinforcement learning. arXiv preprint arXiv:1912.01588, 2020.
  7. Josée Desharnais, Abbas Edalat, and Prakash Panangaden. Bisimulation for Labelled Markov Processes. Information and Computation, 2002.
  8. Hiroki Furuta, Tatsuya Matsushima, Tadashi Kozuno, Yutaka Matsuo, Sergey Levine, Ofir Nachum, and Shixiang Shane Gu. Policy Information Capacity: Information-theoretic Measure for Task complexity in Deep Reinforcement Learning. In International Conference on Machine Learning, 2021.
  9. James J Gibson. The Ecological Approach to Visual Perception: Classic Edition. Psychology Press, 2014.
  10. William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, Diego Perez Liebana, Ruslan Salakhutdinov, Nicholay Topin, Manuela Veloso, and Phillip Wang. The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors. arXiv preprint arXiv:1904.10079, 2019.
  11. Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, and Nicolas Heess. Action and Perception as Divergence Minimization, 2020.
  12. Andrej Karpathy and Pieter Abbeel. The Robot Brains Podcast: Andrej Karpathy on the visionary AI in Tesla's autonomous driving.
  13. Michael Kearns and Satinder Singh. Near-optimal Reinforcement Learning in Polynomial Time. Machine Learning, 2002.
  14. Alexander S. Klyubin, Daniel Polani, and Chrystopher L. Nehaniv. Empowerment: A Universal Agent-centric Measure of Control. In IEEE Congress on Evolutionary Computation, 2005.
  15. Heinrich Küttler, Nantas Nardelli, Alexander H. Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, and Tim Rocktäschel. The NetHack Learning Environment. arXiv preprint arXiv:2006.13760, 2020.
  16. Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, and Zheng Wen. Reinforcement Learning, Bit by Bit, 2021.
  17. Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, and Michael Bowling. Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research, 61:523–562, March 2018.
  18. Andrew Ng. MLOps: From model-centric to data-centric AI. https://www.youtube.com/watch?v=06-AZXmwHjo, 2021.
  19. Declan Oller, Tobias Glasmachers, and Giuseppe Cuccu. Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing. In International Conference on Autonomous Agents and Multi-Agent Systems, 2020.
  20. Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, and Hado Van Hasselt. Behaviour Suite for Reinforcement Learning. In International Conference on Learning Representations, 2020.
  21. Balaraman Ravindran and Andrew G Barto. Model Minimization in Hierarchical Reinforcement Learning. In Abstraction, Reformulation, and Approximation, pages 196–211. Springer Berlin Heidelberg, 2002.
  22. Daniele Reda, Tianxin Tao, and Michiel van de Panne. Learning to Locomote: Understanding how Environment Design Matters for Deep Reinforcement Learning. In Proc. ACM SIGGRAPH Conference on Motion, Interaction and Games, 2020.
  23. Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, and Dhruv Batra. Habitat: A Platform for Embodied AI Research. In International Conference on Computer Vision, 2019.
  24. Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, and Martin Riedmiller. DeepMind Control Suite. arXiv preprint arXiv:1801.00690, 2018.
  25. Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A Physics Engine for Model-based Control. In International Conference on Intelligent Robots and Systems, 2012.
  26. Elise van der Pol, Daniel E. Worrall, Herke van Hoof, Frans A. Oliehoek, and Max Welling. MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning. In Advances in Neural Information Processing Systems, 2020.
  27. Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, and Kenneth O. Stanley. Enhanced POET: Open-ended Reinforcement Learning Through Unbounded Invention of Learning Challenges and their Solutions. arXiv preprint arXiv:2003.08536, 2020.
  28. Khimya Khetarpal, Zafarali Ahmed, Gheorghe Comanici, David Abel, and Doina Precup. What Can I Do Here? A Theory of Affordances in Reinforcement Learning. In International Conference on Machine Learning, 2020.