Emergent communication through curiosity-driven multi-agent reinforcement learning
Project funded by the French National Research Agency (ANR)
(project ECOCURL, Grant ANR-20-CE23-0006)
What are the conditions for complex communication systems to emerge in populations of artificial agents? How can emergent communication systems in turn support the acquisition of an open-ended repertoire of cooperative skills? These questions are currently gaining considerable interest in the Artificial Intelligence (AI) community due to recent advances in machine learning (Bisk et al., 2020; Foerster et al., 2016; Gauthier & Mordatch, 2016; Jaques et al., 2019; Mordatch & Abbeel, 2017). Deep Reinforcement Learning (DRL) algorithms (Mnih et al., 2015) have greatly improved the ability of artificial agents to acquire complex behavior by learning an action policy through experience, i.e. a function mapping their observations of the environment (possibly high-dimensional, e.g. visual observations) to the actions to be executed in order to maximize a long-term reward. In a multi-agent setting, i.e. when multiple agents interact in a shared environment and co-acquire action policies for maximizing their own reward functions, the coupled learning processes can converge to complex cooperative or competitive strategies (Multi-Agent Reinforcement Learning, MARL; see Hernandez-Leal et al., 2019, for a recent survey). Among these strategies, communication systems can spontaneously emerge at the population level as a way to optimize the realization of complex tasks.
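As an illustration of the notions of action policy and long-term reward used above, one standard formalization is the expected discounted return. The notation below (observation o_t, action a_t, reward r_t, discount factor \gamma, number of agents N) is an assumption introduced for illustration and is not taken from the project description.

```latex
% Assumed notation: o_t observation, a_t action, r_t reward, \gamma discount factor
\pi(a_t \mid o_t), \qquad
J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right], \qquad 0 \le \gamma < 1 .
% Multi-agent case with N agents: agent i has its own reward r^{i}_t and policy \pi^{i}
J^{i}(\pi^{1}, \dots, \pi^{N}) = \mathbb{E}_{\pi^{1}, \dots, \pi^{N}}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r^{i}_{t} \right], \qquad i = 1, \dots, N .
```

In the multi-agent case each objective depends on the policies of all agents, which is what couples the learning processes.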
The ECOCURL project (Emergent Communication through Curiosity-driven Multi-Agent Reinforcement Learning) addresses key issues of the Artificial Intelligence scientific evaluation committee related to reinforcement learning, multi-agent systems and representation learning through the following objectives.
• O1: To design and implement a novel MARL algorithm combining intrinsically-motivated learning with compositional goal imagination through factored representations.
• O2: To evaluate the role of the structure of the environment and of the agents’ cognitive architecture in the emergence of compositional communication.
• O3: To evaluate how compositional communication can in turn support the open-ended discovery of increasingly complex cooperative strategies in a mixed cooperative-competitive scenario.
• O4: To leverage the achievement of the above objectives to build an integrated demonstrator in a rich 3D environment showing how agent populations can co-acquire an open-ended repertoire of cooperative and communicative strategies, as well as to use this demonstrator for disseminating the project results in high-visibility outreach events.
Project coordinator: Clément Moulin-Frier, Flowers team at Inria (France)
During the first 18 months of the project, we have made the following contributions:
Social Network Structure Shapes Innovation: Experience-sharing in RL with SAPIENS (Nisioti et al., 2022; recently submitted to an AI conference). This contribution makes significant progress towards O1 (T1.1 and T1.3) by proposing a novel MARL algorithm that promotes collective innovation in RL agent populations through sequential experience sharing. It also addresses parts of O2 (T2.1 and T2.3) by comparing the role of experience sharing in diverse environments, defined as different hierarchical innovation tasks. Finally, it contributes to O3 by proposing measures of open-ended innovation in those environments.
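The general idea of experience sharing structured by a social network can be sketched as follows. This is a minimal illustration under stated assumptions, not the SAPIENS implementation: the `Agent` class, the `share_experiences` function, the directed-ring topology and the transition format are all hypothetical, and no learning algorithm is attached to the agents.

```python
import random
from collections import deque


class Agent:
    """Hypothetical agent reduced to its replay buffer (no learner attached)."""

    def __init__(self, name, capacity=1000):
        self.name = name
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, transition):
        self.buffer.append(transition)


def share_experiences(agents, neighbors, batch_size, rng):
    """One sharing round: every agent sends a sampled batch to each neighbor."""
    # Sample all outgoing batches before delivering any, so that transitions
    # received in this round are not forwarded again within the same round.
    outgoing = {}
    for name, agent in agents.items():
        k = min(batch_size, len(agent.buffer))
        outgoing[name] = rng.sample(list(agent.buffer), k)
    for name, batch in outgoing.items():
        for other in neighbors[name]:
            agents[other].buffer.extend(batch)


# Directed ring over three agents: a -> b -> c -> a (an assumed topology).
agents = {n: Agent(n) for n in "abc"}
neighbors = {"a": ["b"], "b": ["c"], "c": ["a"]}

# Only agent "a" has experience at the start: (obs, action, reward, next_obs).
agents["a"].store((0, 1, 0.0, 1))
agents["a"].store((1, 0, 1.0, 2))

rng = random.Random(0)
share_experiences(agents, neighbors, batch_size=2, rng=rng)
sizes_round1 = {n: len(a.buffer) for n, a in agents.items()}
share_experiences(agents, neighbors, batch_size=2, rng=rng)
sizes_round2 = {n: len(a.buffer) for n, a in agents.items()}
```

In this sketch the `neighbors` dictionary is the only place where the network structure enters: a fully connected population would list every other agent as a neighbor, whereas the ring above lets experience reach agent "c" only after two rounds.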
Socially Supervised Representation Learning: the Role of Subjectivity in Learning Efficient Representations (Taylor et al., 2022; published at AAMAS 2022). This work contributes to O2 by proposing that multi-agent environments, where agents do not have access to the observations of others but can communicate within a limited range, guarantee a common context that can be leveraged in individual representation learning. It contributes to O1 (T1.1) by proposing a cognitive architecture composed of a population of autoencoders, and to O2 (T2.2) by defining multiple loss functions capturing different aspects of effective communication and examining their effect on the learned representations.
Plasticity and evolvability under environmental variability: the joint role of fitness-based selection and niche-limited competition (Nisioti & Moulin-Frier, 2022; published at GECCO 2022). This work contributes to O2 by studying the interplay between environmental dynamics and adaptation in a model of the evolution of plasticity and evolvability. We experiment with different types of environments characterized by the presence of niches and a climate function that determines the fitness landscape. We empirically show that environmental dynamics affect plasticity and evolvability differently and that the presence of diverse ecological niches favors adaptability even in stable environments.
We have also published several opinion and position papers explaining the positioning of the project with respect to the literature (Moulin-Frier & Oudeyer, 2021; Nisioti et al., 2021; Ten et al., 2022).
We have co-organized the second and third editions of the SMILES workshop: https://sites.google.com/view/smiles-workshop/
2021. Recruitment of Mateo Mahaut as a Master intern funded by the ECOCURL project. He is second author of (Nisioti et al., 2022). He has now started a PhD in the lab of Marco Baroni at the Universitat Pompeu Fabra in Barcelona, Spain.
2022. Recruitment of Elias Masquill as a Master intern funded by the ECOCURL project. He is contributing to the development of the integrated MARL algorithm to be delivered in T1.1.
February 2022. Recruitment of Gautier Hamon as a PhD student funded by the ECOCURL project. He is currently working on T1.1 and all tasks of WP2.
2022. Publication of two conference papers (Nisioti & Moulin-Frier, 2022; Taylor et al., 2022) in major AI and Evolutionary Computation conferences (AAMAS and GECCO).
2022. Publication of a journal paper (Demirel et al., 2021) as a collaboration with the Universitat Pompeu Fabra in Barcelona, Spain.
2021-2022. Collaboration with Ida Momennejad from Microsoft Research in New York. With Clément Moulin-Frier (PI of the project), she is co-last author of (Nisioti et al., 2022).
Co-organization of the 2nd and 3rd editions of the SMILES workshop at the International Conference on Development and Learning, ICDL (virtual conference). https://sites.google.com/view/smiles-workshop/
Clément Moulin-Frier gave several invited talks in which he presented the objectives, methodology and preliminary results of the ECOCURL project. These include:
December 2021
“Open-ended skill acquisition in humans and machines: An evolutionary and developmental perspective”
Brains@Bay meetup (organized by Numenta, USA)
Invitation from Numenta
September 2021
“The role of self-organization mechanisms in the emergence of behavioral regularity and diversity”
Preprogrammed: Innateness in Neuroscience and AI (Symposium organized by the Nencki Institute, Poland)
Invitation from Mateusz Kostecki
April 2021
“Beyond the utilitarian approach to emergent communication: The role of self-organization and compositional imagination in language evolution and cultural innovation”
DeepMind reading group
Invitation from Florian Strub and Julien Pérolat (DeepMind, Paris)
July 2022. Clément Moulin-Frier was interviewed for the Inria Podcast “Desassemblons le Numérique”. https://www.inria.fr/fr/desassemblons-le-numerique-episode3
Demirel, B., Moulin-Frier, C., Arsiwalla, X. D., Verschure, P. F. M. J., & Sánchez-Fibla, M. (2021). Distinguishing Self, Other, and Autonomy From Visual Feedback: A Combined Correlation and Acceleration Transfer Analysis. Frontiers in Human Neuroscience, 15. https://www.frontiersin.org/articles/10.3389/fnhum.2021.560657
Moulin-Frier, C., & Oudeyer, P.-Y. (2021). Multi-Agent Reinforcement Learning as a Computational Tool for Language Evolution Research: Historical Context and Future Challenges. Challenges and Opportunities for Multi-Agent Reinforcement Learning (COMARL), AAAI Spring Symposium Series, Stanford University, Palo Alto, California, USA. http://arxiv.org/abs/2002.08878
Nisioti, E., Jodogne-del Litto, K., & Moulin-Frier, C. (2021). Grounding an Ecological Theory of Artificial Intelligence in Human Evolution. NeurIPS 2021 - Conference on Neural Information Processing Systems / Workshop: Ecological Theory of Reinforcement Learning. https://hal.archives-ouvertes.fr/hal-03446961
Nisioti, E., Mahaut, M., Oudeyer, P.-Y., Momennejad, I., & Moulin-Frier, C. (2022). Social Network Structure Shapes Innovation: Experience-sharing in RL with SAPIENS (arXiv:2206.05060). arXiv. https://doi.org/10.48550/arXiv.2206.05060
Nisioti, E., & Moulin-Frier, C. (2022). Plasticity and evolvability under environmental variability: The joint role of fitness-based selection and niche-limited competition. Proceedings of the 2022 Genetic and Evolutionary Computation Conference (GECCO 2022). http://arxiv.org/abs/2202.08834
Taylor, J., Nisioti, E., & Moulin-Frier, C. (2022). Socially Supervised Representation Learning: The Role of Subjectivity in Learning Efficient Representations. International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2022). http://arxiv.org/abs/2109.09390
Ten, A., Oudeyer, P.-Y., & Moulin-Frier, C. (2022). Curiosity-driven exploration: Diversity of mechanisms and functions. In The Drive for Knowledge: The Science of Human Information Seeking. Cambridge University Press. https://hal.inria.fr/hal-03447896