Emergent communication through curiosity-driven multi-agent reinforcement learning
Project funded by the French National Research Agency (ANR)
(project ECOCURL, Grant ANR-20-CE23-0006)
PI: Clément Moulin-Frier (Flowers team, Inria)
The ECOCURL project addressed two central research questions: (1) What are the conditions for complex communication systems to emerge in populations of artificial agents? (2) How can emergent communication systems in turn support the acquisition of an open-ended repertoire of cooperative skills?
The project sits at the intersection of several active research areas in Artificial Intelligence: deep reinforcement learning, intrinsic motivation, language grounding, and multi-agent systems. At the time of the project's proposal, Deep Reinforcement Learning (DRL) had become a powerful framework for training agents to perform complex tasks in rich environments. Multi-Agent Reinforcement Learning (MARL) extended this paradigm to populations of interacting agents co-acquiring action policies in a shared environment with cooperative or competitive couplings between their reward functions. A central observation in this setting is that communication systems can spontaneously emerge at the population level as a by-product of optimizing complex cooperative tasks.
This line of Emergent Communication research conceives language as the emergent result of a collective behavior optimization process grounded in the functional substrates of agents' interactions, in contrast with Natural Language Processing approaches that capture structural properties of human language from static text corpora. However, MARL-based emergent communication had remained confined to simple environments, discrete action policies and single predefined cooperative tasks. It had not leveraged the intrinsically-motivated, curiosity-driven learning advances developed in single-agent DRL, which enable the autonomous discovery and learning of multiple tasks in parallel with continuous action spaces. A separate line of MARL work had concurrently shown that mixed cooperative-competitive dynamics can drive autocurricula of increasingly complex collective behaviors, but had remained largely disconnected from emergent-communication research.
Several building blocks relevant to bridging these gaps had been developed, notably within the Flowers team at Inria. The CURIOUS algorithm demonstrated how intrinsically-motivated, multi-goal reinforcement learning could enable an individual agent to autonomously discover and master a curriculum of goals of increasing complexity. The IMAGINE architecture showed how agents could leverage language-based goal imagination, freely exploring their environment and turning natural language descriptions into targetable goals through compositional generalization.
The core challenge addressed by ECOCURL was to bring these ingredients together in a multi-agent setting: studying how compositional communication systems can emerge in artificial agent populations and support the open-ended discovery of increasingly complex cooperative strategies in mixed cooperative-competitive scenarios, eventually demonstrated in a rich simulation environment.
Several scientific developments during the project period broadened the state of the art and the positioning of the project. The PI's Habilitation thesis (HDR, Moulin-Frier, 2022) enlarged the conceptual framing of the research by formalizing the interactions between environmental, adaptive, multi-agent and cultural dynamics as drivers of open-ended skill acquisition. An additional research question complementing those of ECOCURL emerged: What are the ecological conditions favoring the evolution of curiosity-driven learning? This question connected the project's initial scope to a growing body of work at the intersection of Artificial Intelligence and Artificial Life studying the role of environmental complexity in the emergence of diverse, generalist and adaptive behaviors. Relatedly, meta-reinforcement learning gained prominence as a way to train agents on distributions of environments, enabling the acquisition of general behavioral strategies with strong generalization abilities. This paradigm can be viewed as a computational model of the interplay between evolution (shaping the learning architecture across environments) and development (adapting behavior within a given environment), but had not yet been applied to multi-agent settings. The AI and Artificial Life communities also converged technologically, with GPU-accelerated numerical computing libraries (such as JAX) enabling massively multi-agent simulations at unprecedented scales. A collaboration with Universitat Pompeu Fabra and CSIC (Barcelona, Spain) was initiated to study the role of ecological conditions in multi-agent reinforcement learning, connecting ECOCURL's questions to theories of collective eco-engineering and niche construction in socio-ecological systems.
In parallel, cultural evolution became an increasingly central topic in both AI and Artificial Life (in AI as a driver of open-ended skill acquisition, in Artificial Life as an evolutionary process), providing new theoretical angles on ECOCURL's core challenge of understanding how compositional communication emerges in agent populations and supports open-ended cooperation. This motivated a collaboration with Maxime Derex (IAST, Toulouse), a specialist in human cultural evolution. Finally, Large Language Models (LLMs) transformed the AI landscape and offered a new kind of inherently compositional communicating agent, opening avenues for studying cultural transmission and communication dynamics in agent populations.
AI for Science also became an important research topic during the project period. The curiosity-driven exploration algorithms central to ECOCURL, originally developed to guide autonomous agents towards the discovery of diverse skills, proved to be relevant tools for automated scientific discovery in complex dynamical systems. Recent advances in continuous cellular automata, in particular Lenia, provided rich simulation environments in which such algorithms could be benchmarked on the discovery of self-organized structures exhibiting life-like properties. This led to a productive research direction that, while going beyond the project's original multi-agent scope, demonstrated the broader applicability of the methods developed in the project. It also opened a promising perspective: applying these automated discovery algorithms back to the efficient exploration of the space of behaviors of multi-agent systems, closing the loop with the project's original scope.
The ECOCURL project was grounded in three research hypotheses. First (H1), that intrinsically-motivated learning can encourage emergent communication in cooperative multi-agent environments by guiding agents towards the autonomous discovery of a diverse set of skills. Second (H2), that the structure of an emergent communication system, in particular its compositional nature, is shaped by constraints on the structure of the environment and the agents' cognitive architectures. Third (H3), that compositional communication systems can support the acquisition of increasingly complex cooperative skills, paving the way towards open-ended cultural evolution in artificial agent populations.
These hypotheses were translated into four objectives, each addressed by a dedicated work package. Objective O1 aimed to design and implement a novel MARL algorithm combining intrinsically-motivated learning with compositional goal imagination through factored representations (WP1). Objective O2 aimed to evaluate the role of the structure of the environment and of the agents' cognitive architecture in the emergence of compositional communication (WP2). Objective O3 aimed to evaluate how compositional communication can support the open-ended discovery of increasingly complex cooperative strategies in a mixed cooperative-competitive scenario (WP3). Finally, Objective O4 aimed to leverage the above results to build an integrated demonstrator in a rich simulation environment showing how agent populations can acquire an open-ended repertoire of cooperative and communicative strategies, and to disseminate these results through high-visibility outreach events (WP3–WP4).
The evolution of the scientific context described above led to additional objectives that enriched the original ones. The broadened conceptual framework developed in the PI's HDR and the increasing interactions between AI and Artificial Life communities motivated the study of eco-evolutionary feedback dynamics in multi-agent populations: how agents collectively modify their environment, which in turn shapes the selective pressures driving their adaptation. This translated into objectives related to eco-engineering strategies in multi-agent reinforcement learning, the emergence of social behaviors through niche construction, and the evolution of adaptation mechanisms (plasticity, evolvability) under environmental variability. Relatedly, the rise of meta-reinforcement learning as a model of the interplay between evolution and development motivated the objective of integrating it with multi-agent reinforcement learning, to study how agent populations can acquire general cooperative exploration strategies across diverse environmental and social contexts. The growing importance of cultural evolution across both communities motivated objectives related to the modeling of cultural transmission dynamics and their interaction with intrinsic motivation. The rise of LLMs created the objective of using them as a new experimental paradigm for studying cultural evolution and compositional communication in agent populations. Finally, the convergence of methods towards GPU-accelerated massively multi-agent simulations motivated the development of Vivarium, a 2D multi-agent simulation platform, replacing the originally planned 3D Minecraft-like demonstrator (O4) — a choice driven by the insight that scaling the number of agents was more scientifically productive than adding a spatial dimension.
The emergence of AI for Science as a research topic, combined with the maturation of continuous cellular automata as simulation environments, created the objective of applying the curiosity-driven exploration algorithms developed in the project to the automated discovery of diverse self-organized structures and dynamics in complex systems. While this objective goes beyond the project's original multi-agent focus, it directly relies on its core algorithmic contributions.
The project was structured in four work packages following an incremental methodology, in which principles were to be evaluated first in 2D grid-worlds, then in 2D continuous environments with realistic physics, and finally demonstrated in a rich 3D open-ended environment (this final stage was later replaced by the 2D massively multi-agent Vivarium platform).
WP1 (Cognitive architecture for individual and social skills) aimed at developing the core learning architecture. Task T1.1 extended the CURIOUS algorithm to multi-agent settings, operating both in a centralized-training, decentralized-execution (CTDE) mode and in a fully decentralized mode. Task T1.2 integrated factored representations with relational inductive biases to support compositional goal imagination. Task T1.3 added sequential communication capabilities, extending the language-based goal space from IMAGINE to the emergent communication setting with bidirectional mappings between communication signals and modular goal spaces.
WP2 (Role of environmental and cognitive constraints) focused on large-scale simulation experiments evaluating how the cognitive components from WP1 and environmental structure shape emergent communication. This included the procedural generation of cooperative tasks of varying complexity (T2.1), ablation studies comparing different versions of the cognitive architecture across training modes, knowledge representations and sequence processing models (T2.2), and an evaluation of how the structure of procedurally generated tasks influences the emergence of cooperative and communicative strategies (T2.3).
WP3 (Open-ended discovery of cooperative and communicative strategies) aimed at demonstrating how an acquired compositional communication system can support the open-ended discovery of cooperative skills. Task T3.1 proposed a mixed cooperative-competitive scenario designed to foster open-ended cooperation through a hierarchy of activities of increasing complexity, bootstrapped by competition between agent populations. Task T3.2 evaluated the resulting open-ended dynamics at the population level. Task T3.3 was to scale the scenario to a rich, Minecraft-like simulation environment.
WP4 handled project and software management, dissemination, and outreach, including the organization of a workshop on Emergent Communication.
The original methodology was based on multi-agent reinforcement learning with intrinsic motivation, compositional goal imagination and factored representations. During the project, the methodological toolkit was enriched in several directions. Meta-reinforcement learning, originally developed for single-agent settings, was integrated with multi-agent reinforcement learning to study the emergence of collective exploration strategies over procedurally generated task distributions, where decentralized agents learn general cooperative policies that generalize to unseen tasks and deeper compositional structures (Bornemann et al., 2023). Non-episodic neuroevolution in large multi-agent environments, implemented in JAX for GPU acceleration, provided a complementary approach to reinforcement learning for studying eco-evolutionary dynamics (Hamon et al., 2023; Taylor-Davies et al., 2025). LLM-based agent populations were introduced as a new experimental paradigm for studying cultural evolution and communication dynamics, enabling the manipulation of variables important in cultural evolution such as network structure and transmission biases (Perez et al., 2024; Perez et al., 2025a). The core simulation infrastructure evolved from the originally planned 3D Minecraft-like environment (with few agents) towards Vivarium, a 2D massively multi-agent simulation platform designed for scalability and flexibility.
Methodologically, the application of curiosity-driven algorithms to automated discovery involved adapting Intrinsically Motivated Goal Exploration Processes (IMGEP) and diversity search algorithms to the exploration of continuous cellular automata such as Lenia and Flow-Lenia (Hamon et al., 2025; Plantec et al., 2023; Khajehabdollahi et al., 2025; Michel et al., 2025). These were combined with autotelic reinforcement learning for goal-directed control of self-organized patterns (Cvjetko et al., 2025), and with interactive exploration tools integrating human guidance for constrained diversity search (Morel et al., 2025). Vision-language models were also leveraged to generate semantic goals driving exploration towards uncharted regions of behavioral spaces (Khajehabdollahi et al., 2025). These methods were further applied to gene regulatory networks in collaboration with Michael Levin (Tufts University).
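As an illustration of the IMGEP principle, the following minimal sketch explores a hypothetical two-parameter toy system (`toy_system`, standing in for, e.g., a cellular-automaton rollout summarized by a 2D descriptor). Goals are sampled uniformly in outcome space, which is the simplest IMGEP variant; learned goal spaces, VLM-generated goals and human guidance are not modeled. All names and constants are assumptions for illustration, not the project's implementation.

```python
import math
import random


def toy_system(params):
    """Hypothetical system: maps 2 parameters to a 2-D outcome inside the unit disc."""
    a, b = params
    radius = math.tanh(3 * a)
    return (radius * math.cos(b), radius * math.sin(b))


def imgep(n_init=10, n_iter=200, sigma=0.1, seed=0):
    """Minimal IMGEP with uniform goal sampling in outcome space."""
    rng = random.Random(seed)
    history = []  # (parameters, outcome) pairs collected so far

    # 1. Bootstrap with random parameter samples.
    for _ in range(n_init):
        params = (rng.uniform(-1, 1), rng.uniform(-math.pi, math.pi))
        history.append((params, toy_system(params)))

    for _ in range(n_iter):
        # 2. Sample a goal in outcome space.
        goal = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        # 3. Retrieve the parameters whose past outcome is closest to the goal.
        best_params, _ = min(history, key=lambda h: math.dist(h[1], goal))
        # 4. Perturb them, run the system and store the new outcome.
        params = tuple(x + rng.gauss(0, sigma) for x in best_params)
        history.append((params, toy_system(params)))
    return history


history = imgep()
print(len(history), "outcomes explored")
```

Sampling goals in outcome space, rather than sampling parameters directly, is what biases the search towards outcomes not yet reached: a distant goal selects the closest known outcome on the current frontier and perturbs it.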
The core results of the project cluster along three lines, one per research hypothesis and its associated objective (H1/O1, H2/O2, H3/O3), each also covering related findings where relevant.
H1 predicted that intrinsically-motivated learning can encourage emergent communication in cooperative multi-agent environments by guiding agents towards the autonomous discovery of a diverse set of skills. Objective O1 translated this into the design of a novel MARL algorithm combining intrinsic motivation with compositional goal imagination.
In addressing H1, the project uncovered a structural complementarity between curiosity and communication. When artificial agents are each driven by their own curiosity, they independently select different goals. This is a powerful driver of individual skill diversity, but also an obstacle to cooperation, since two agents pursuing different goals cannot coordinate. Nisioti et al., 2023a shows that this tension is naturally resolved by the emergence of communication: in the Goal-coordination game, a fully decentralized emergent-communication algorithm, a shared signaling protocol arises as a by-product of individual-reward maximization, allowing agents to align their goals and to match the performance of a centralized training baseline. The Goal-coordination game implements T1.1 (intrinsically-motivated MARL) on a compositional goal space, partially realizing the compositional goal imagination scoped in T1.2, which was further developed in Khajehabdollahi et al., 2025 and Colas et al., 2022 (discussed in later sections). Sequential communication grounded in modular goal spaces (T1.3) was pursued in Barde et al., 2022 and Nisioti et al., 2024.
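How a shared code can arise from reward alone can be illustrated with a classic Lewis signaling game, sketched below under simplifying assumptions. This is not the Goal-coordination game of Nisioti et al., 2023a: here a "speaker" draws its own goal (a stand-in for a curiosity-driven choice), emits a discrete signal, and a "listener" picks a goal from that signal; both are reinforced only when their goals coincide, using Roth-Erev urn learning. The function name and parameters are hypothetical.

```python
import random


def lewis_signaling_game(n_goals=3, n_rounds=5000, seed=1):
    """Roth-Erev reinforcement on a Lewis signaling game (one signal per goal)."""
    rng = random.Random(seed)
    options = range(n_goals)
    # Urn weights: speaker[goal][signal] and listener[signal][goal], all starting at 1.
    speaker = [[1.0] * n_goals for _ in options]
    listener = [[1.0] * n_goals for _ in options]
    rewards = []
    for _ in range(n_rounds):
        goal = rng.randrange(n_goals)  # speaker's own goal (stand-in for curiosity)
        signal = rng.choices(options, weights=speaker[goal])[0]
        guess = rng.choices(options, weights=listener[signal])[0]
        reward = 1.0 if guess == goal else 0.0  # shared reward only if goals align
        # Reinforce the choices that led to coordination.
        speaker[goal][signal] += reward
        listener[signal][guess] += reward
        rewards.append(reward)
    return speaker, listener, rewards


speaker, listener, rewards = lewis_signaling_game()
early, late = sum(rewards[:500]) / 500, sum(rewards[-500:]) / 500
print(f"coordination rate: first 500 rounds {early:.2f}, last 500 rounds {late:.2f}")
```

Over many rounds the signal-to-goal mapping typically sharpens into a shared convention, although with more than two goals reinforcement of this kind can also settle on a partially ambiguous code.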
We also studied the converse direction in Bornemann et al., 2023: when compositional environments are paired with meta-RL, decentralized agents acquire general cooperative exploration strategies without being explicitly incentivized to do so, generalizing to novel objects, unseen coordination requirements and task trees of twice the training depth. A complementary finding reported in a preliminary study (section 4.4 of Hamon, 2025, G. Hamon's PhD) shows that a compositional environment can bootstrap proto-intrinsic motivation in meta-RL agents without an explicit intrinsic reward.
Taken together, these observations point to a bidirectional interaction between intrinsic motivation and compositional environments: intrinsic motivation drives skill diversity, and compositional environments conversely drive agents to develop intrinsically motivated exploration strategies. Colas et al., 2022 (cited above) extends this to the socio-cultural scale, where rich social worlds shape autotelic agents through language and interactions internalized as cognitive tools. The HDR thesis Moulin-Frier, 2022 formalizes a broader framework in which autotelic capacities are bootstrapped by environmental, adaptive, multi-agent and cultural dynamics.
H2 predicted that the compositional structure of emergent communication is shaped by constraints on both the environment and the cognitive architecture. Objective O2 translated this into a large-scale simulation programme delivered in WP2 through procedural task generation (T2.1), ablations of cognitive components (T2.2) and evaluations of environmental constraints (T2.3).
The procedural-task-generation infrastructure of T2.1 was realized in Bornemann et al., 2023 (collective open-ended exploration from decentralized meta-RL, NeurIPS ALOE) through a task space dynamically combining subtasks sampled from five types into a vast distribution of cooperative task trees.
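A minimal sketch of how cooperative task trees can be sampled by recursively combining typed subtasks is given below. It assumes that children are prerequisite subtasks of their parent; the five type names are placeholders (the actual types are not listed here), and `sample_task_tree` and its parameters are hypothetical.

```python
import random
from dataclasses import dataclass, field

# Placeholder names: five subtask types, not the actual ones used in the study.
SUBTASK_TYPES = ["type_a", "type_b", "type_c", "type_d", "type_e"]


@dataclass
class TaskNode:
    subtask: str
    prerequisites: list = field(default_factory=list)  # children to complete first


def sample_task_tree(depth, max_children, rng):
    """Recursively sample a task tree with exactly `depth` levels."""
    node = TaskNode(rng.choice(SUBTASK_TYPES))
    if depth > 1:
        for _ in range(rng.randint(1, max_children)):
            node.prerequisites.append(sample_task_tree(depth - 1, max_children, rng))
    return node


def tree_depth(node):
    return 1 + max((tree_depth(c) for c in node.prerequisites), default=0)


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.prerequisites)


rng = random.Random(0)
train_tasks = [sample_task_tree(3, 2, rng) for _ in range(100)]
test_tasks = [sample_task_tree(6, 2, rng) for _ in range(10)]  # twice the depth
print(max(count_nodes(t) for t in train_tasks), "nodes in the largest training tree")
```

Training on shallow trees and evaluating on trees of twice the depth then amounts to calling the same sampler with a larger `depth`.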
Two studies instantiate the compositional-communication side directly. Barde et al., 2022 (Architect-Builder Problem, ICLR) defines an asymmetric multi-agent setting in which an architect knows the goal but cannot act, and a builder can act without knowing the task. The resulting Architect-Builder Iterated Guiding algorithm gives rise to a low-level, high-frequency communication protocol whose meaning is negotiated during learning and which generalizes to unseen tasks. Karch et al., 2022 (graphical language via the Graphical Referential Game) extends the framing to a continuous sensory-motor channel: agents produce drawings via dynamical motor primitives, and a multimodal contrastive mechanism yields a shared graphical language with compositional properties. Together they probe the cognitive-constraint axis of T2.2 at the level of the communication medium itself.
By working on environment-induced constraints, we extended the scope beyond communication alone. Sánchez-Fibla et al., 2024 (cooperative control of environmental extremes, J. R. Soc. Interface) demonstrates how an ecological stressor (fire propagating on a spatial landscape) drives agent groups towards two evolutionary innovations that suppress large fires while sustaining biomass, addressing T2.3 on the role of environmental constraints in emergent cooperative strategies. Taylor et al., 2022 (socially supervised representation learning, AAMAS) complements this with a cognitive-constraint ablation at the representation level, showing that the form of partial observability envisioned in T2.3, in which each agent perceives a subjective view of a shared state, yields abstract representations that outperform single-agent baselines. Working on the environmental-constraint axis further led to a full eco-evolutionary programme, reported under Results from project evolutions below.
H3 predicted that compositional communication systems can support the open-ended acquisition of increasingly complex cooperative skills. Our contributions address three intersecting compositional structures: compositional communication, compositional environments (where existing items combine to yield new items), and compositional representations.
On the environment side, Bornemann et al., 2023 (collective open-ended exploration, NeurIPS ALOE) realizes the hierarchy of activities of increasing complexity scoped in T3.1 (mixed cooperative-competitive scenario) through procedurally generated task trees combining five subtask types, and the population-level evaluation of emergent complexity scoped in T3.2 (evaluation of open-ended discovery): agents trained on shallow trees transfer to trees of twice the depth, to coordination requirements unseen during training, and to novel objects. Two chapters of Hamon, 2025 (Hamon's PhD thesis) extend this line — a study of the emergence of agriculture in RL-agent societies (journal paper in preparation) where agents collectively engineer their environment to improve the growth of beneficial plants, and an open-ended recipe-crafting study in which a compositional Little-Alchemy-style environment drives meta-RL agents to acquire sequential compositional strategies.
On the representation and communication sides, Nisioti et al., 2023a (autotelic RL in multi-agent environments) endows agents with a compositional goal space and shows that goal alignment between decentralized agents emerges through communication as a way to optimize cooperative tasks. Barde et al., 2022 (Architect-Builder) and Karch et al., 2022 (graphical language) further show that compositional protocols generalize across tasks, instantiating the O3 claim that compositional communication supports cooperative-skill transfer.
At the population level, Nisioti et al., 2022a (SAPIENS) adds a further structural axis: in the compositional hierarchical-innovation environment WordCraft, the topology of the experience-sharing network itself acts as a constraint on open-ended skill discovery, with dynamic topologies — agents alternating between individual or small-cluster exploration and group-level sharing — outperforming fully-connected ones.
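One simple way to encode the alternation between small-cluster exploration and group-level sharing is a time-dependent edge set, sketched below. The schedule (fixed clusters plus a periodic fully-connected phase) and all parameter names are assumptions for illustration, not the exact SAPIENS protocol.

```python
from itertools import combinations


def dynamic_topology(n_agents, t, cluster_size=2, period=10, merge_steps=2):
    """Undirected experience-sharing edges active at time step t."""
    if t % period >= period - merge_steps:
        # Group-level sharing phase: every pair of agents is connected.
        return set(combinations(range(n_agents), 2))
    # Exploration phase: sharing only inside fixed small clusters.
    clusters = [range(i, min(i + cluster_size, n_agents))
                for i in range(0, n_agents, cluster_size)]
    return {edge for c in clusters for edge in combinations(c, 2)}


for t in (0, 8):
    print(t, len(dynamic_topology(6, t)))
```

A fully-connected baseline corresponds to `merge_steps == period`, i.e. every step is a sharing step.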
Together, these axes contribute complementary constraints: environments shape the skill space, representations guide individual targeting, protocols support population-level transmission, and topology modulates collective innovation dynamics.
Objective O4 — the integrated demonstrator and its dissemination — is reported under Impacts and outcomes below, where Vivarium is documented as the technical realization of the demonstrator (T3.3), alongside the project's awards, outreach portfolio and teaching adoption.
The scientific evolutions described in the Context and Methods sections yielded several complementary result threads.
By pursuing the question of how environmental structure shapes cooperative behaviors (T2.3), we investigated the regime in which agents not only adapt to their environment but also actively modify it. This led to a general computational framework for non-episodic neuroevolution, first reported in Hamon et al., 2023 (GECCO): large JAX-accelerated multi-agent simulations in which agents reproduce and die based on internal physiology, without environment resets. Taylor-Davies et al., 2025 (emergent kin selection, EvoStar, best paper and best student paper awards) extends this framework to show that altruistic resource transfer between generations evolves naturally under kin recognition and population viscosity. In parallel, Nisioti et al., 2022b (plasticity and evolvability, GECCO) and Nisioti et al., 2023b (niche construction, ALIFE) establish that diverse ecological niches can favour plasticity and evolvability even in stable environments, a finding specific to the multi-niche framing. A collaboration with Universitat Pompeu Fabra (Spain) produced Sánchez-Fibla et al., 2024 (cooperative control of environmental extremes, J. R. Soc. Interface) on group-level fire suppression, as well as an ongoing study on the emergence of agriculture in RL-agent populations (journal paper in preparation, chapter of Hamon, 2025). The theoretical anchors for this programme were laid out in Nisioti et al., 2021b (Grounding an Ecological Theory of AI in Human Evolution, NeurIPS workshop) and Nisioti et al., 2021a (Grounding AI in the Origins of Human Behavior, preprint), and fully developed in the PI's HDR thesis Moulin-Frier, 2022.
The growing interest in cultural evolution across the AI and Artificial Life communities, combined with the advent of Large Language Models as inherently compositional communicating agents, created a new experimental paradigm connecting directly to H3. Nisioti et al., 2022a (SAPIENS, preprint) varied multi-agent experience-sharing topologies in RL and showed that dynamic network structures, in which agents oscillate between individual or small-cluster innovation and group-level sharing, outperform fully-connected ones on hierarchical innovation, mirroring findings from human-laboratory studies. Nisioti et al., 2024 (Collective Innovation in Groups of LLMs, ALIFE) transposed the finding to LLM populations playing Little Alchemy 2.
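The key difference from episodic training, namely that agents are born, reproduce and die within a single continuing simulation that is never reset, can be sketched as below. This toy version evolves a single scalar "foraging" gene rather than neural-network weights and uses plain Python instead of JAX; every constant (regrowth rate, metabolic cost, reproduction threshold, population cap) is a hypothetical choice.

```python
import random


def run_non_episodic(n_steps=200, max_pop=100, seed=0):
    """Continuing simulation: agents forage, reproduce and die; no episode resets."""
    rng = random.Random(seed)
    # Each agent is [gene, energy]; the gene (in [0, 1]) sets foraging efficiency.
    population = [[rng.random(), 5.0] for _ in range(20)]
    resources = 100.0  # shared pool that regrows but is never reset
    for _ in range(n_steps):
        resources = min(100.0, resources + 10.0)  # regrowth
        next_pop = []
        for gene, energy in population:
            intake = min(resources, gene)  # forage from the shared pool
            resources -= intake
            energy += intake - 0.5  # metabolic cost per step
            if energy <= 0:
                continue  # death: the agent is removed, the world goes on
            if energy > 8.0:
                energy /= 2  # reproduction: split energy with a mutated offspring
                child_gene = min(1.0, max(0.0, gene + rng.gauss(0, 0.05)))
                next_pop.append([child_gene, energy])
            next_pop.append([gene, energy])
        population = next_pop[:max_pop]  # crude carrying capacity
    return population


final = run_non_episodic()
if final:
    mean_gene = sum(g for g, _ in final) / len(final)
    print(f"{len(final)} agents alive, mean gene {mean_gene:.2f}")
```

Selection here is implicit: no fitness function is computed, and a gene persists only if its carriers gather enough energy to survive and reproduce.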
Perez et al., 2025a (Telephone Game, ICLR) introduced transmission-chain experiments with LLM agents and uncovered cultural attractors in the evolution of text toxicity, positivity, difficulty and length. Perez et al., 2024 (Cultural evolution in populations of LLMs, submitted to Philosophical Transactions) released an open-source framework for manipulating cultural-evolution variables — network structure, personality, aggregation and transformation of social information — and observed punctuated cultural dynamics. Perez et al., 2023 (heterogeneous preferences) showed that preference heterogeneity reverses classical predictions about the effect of social-learning opportunities on cultural diversity.
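To make the notion of a cultural attractor concrete, the toy sketch below replaces the LLM of a transmission chain with a hypothetical biased-copy rule acting on one numeric trait (for instance a text's length): each generation moves the trait part-way towards a fixed value, plus noise. The rule and all parameter values are assumptions for illustration.

```python
import random


def transmission_chain(initial, n_generations=50, attractor=100.0,
                       pull=0.2, noise=5.0, seed=0):
    """Each generation re-produces the trait with a systematic bias plus noise."""
    rng = random.Random(seed)
    trait = initial
    history = [trait]
    for _ in range(n_generations):
        # Hypothetical transformation bias: move part-way towards `attractor`.
        trait = trait + pull * (attractor - trait) + rng.gauss(0, noise)
        history.append(trait)
    return history


short_start = transmission_chain(initial=10.0)
long_start = transmission_chain(initial=400.0)
print(round(short_start[-1], 1), round(long_start[-1], 1))
```

With the same noise sequence (same `seed`), chains started at 10 and at 400 end within a hundredth of each other after 50 generations: the initial content has been forgotten and only the transmission bias remains, which is the signature of an attractor.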
Two further contributions address the role of intrinsic motivation within this line. Perez et al., 2025b (peer cultures, commentary on a target article in Behavioral and Brain Sciences) argues that intrinsic motivation is central to explaining the qualitative specificity of peer cultures. A cross-cultural study among BaYaka foragers and Bandongo fisher-farmers (presented as an abstract at EHBEA) documents how recent performance and recent progress influence autonomous goal selection in human populations. An ongoing contribution on the cultural evolution of human goals — under revision for a special issue of Topics in Cognitive Science on Goal Dynamics in Cognition — introduces the notion of cultural autotelic agents: individuals who generate, select, and transmit their own goals within social environments, extending the project's autotelic framework to the cultural scale.
Meta-RL emerged during the project as a productive computational model of the interplay between evolution (shaping learning architectures across environments) and development (adapting behavior within an environment). Bornemann et al., 2023 (collective open-ended exploration, NeurIPS ALOE) established its decentralized multi-agent version: agents trained by meta-RL on procedurally generated task trees acquire general cooperative exploration strategies that generalize far beyond the training distribution, even without being pushed to cooperate. Léger et al., 2024 (evolving reservoirs for meta-RL, EvoStar) studied the evolution of the underlying recurrent architectures: reservoir hyperparameters are optimized at the evolutionary scale, and the resulting reservoirs are used as state encoders for reinforcement-learning agents, letting task-specific learning leverage evolved priors. Meta-RL thus links the project's initial open-ended-cooperation line to the eco-evolutionary line that emerged during the project.
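The division of labor between evolved priors and task-specific learning can be illustrated with a minimal leaky echo-state reservoir used as a fixed state encoder, sketched below. The hyperparameters `norm_scale`, `leak` and `input_scale` are the kind of quantities an outer evolutionary loop would tune; the class and parameter names are hypothetical, and rescaling by the infinity norm is a conservative substitute for the more common spectral-radius scaling, chosen because it guarantees the echo state property.

```python
import math
import random


class Reservoir:
    """Leaky echo-state reservoir used as a fixed, untrained state encoder."""

    def __init__(self, n_in, n_res, norm_scale=0.9, leak=0.5, input_scale=1.0, seed=0):
        rng = random.Random(seed)
        w = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
        # Rescale so the largest absolute row sum (infinity norm) equals norm_scale.
        # With norm_scale < 1 the update is a contraction: the echo state property.
        inf_norm = max(sum(abs(v) for v in row) for row in w)
        self.w = [[v * norm_scale / inf_norm for v in row] for row in w]
        self.w_in = [[rng.uniform(-input_scale, input_scale) for _ in range(n_in)]
                     for _ in range(n_res)]
        self.leak = leak

    def step(self, state, obs):
        pre = [sum(a * x for a, x in zip(row, state)) + sum(b * u for b, u in zip(row_in, obs))
               for row, row_in in zip(self.w, self.w_in)]
        return [(1 - self.leak) * x + self.leak * math.tanh(p) for x, p in zip(state, pre)]

    def encode(self, observations, state=None):
        if state is None:
            state = [0.0] * len(self.w)
        for obs in observations:
            state = self.step(state, obs)
        return state


# Hypothetical usage: encode a stream of 2-D observations into a 20-D agent state.
rng = random.Random(42)
observations = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
reservoir = Reservoir(n_in=2, n_res=20)
state_a = reservoir.encode(observations, state=[1.0] * 20)
state_b = reservoir.encode(observations, state=[-1.0] * 20)
print(max(abs(a - b) for a, b in zip(state_a, state_b)))
```

Because the reservoir forgets its initial state (the two encodings above converge), its output depends only on recent observations, which is what makes it usable as a state encoder; an evolutionary outer loop would then score each hyperparameter setting by the performance of RL agents whose policies are trained on top of it.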
The curiosity-driven exploration algorithms central to ECOCURL — notably Intrinsically Motivated Goal Exploration Processes (IMGEPs) — proved transposable beyond the project's multi-agent scope to the automated scientific discovery of self-organized structures in complex systems. The development of curiosity-driven algorithms and large-scale ecosystem simulations realized during the project catalysed this productive research line, with seven peer-reviewed publications, a Science Advances article and the ALIFE 2023 best paper award.
Hamon et al., 2025 (Science Advances) automated the search for Lenia rules that self-organize robust agent-like structures — a primitive form of emergent multi-agent dynamics arising from raw cellular-automata substrates. Plantec et al., 2023 (Flow-Lenia, ALIFE) introduced mass conservation and parameter localization, enabling multi-species simulations in cellular automata. Three ALIFE 2025 contributions extended the toolkit: Khajehabdollahi et al., 2025 drives IMGEP exploration via vision-language-model-generated goals, Michel et al., 2025 opens ecosystemic exploration via simulation-wide metrics, and Cvjetko et al., 2025 applies autotelic reinforcement learning to the discovery process. An open-source Automated Discovery software released in 2024 packages these methods and extends them to gene regulatory networks in collaboration with Michael Levin (Tufts University).
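For readers unfamiliar with Lenia, its update rule can be written as A ← clip(A + dt · G(K ∗ A), 0, 1), where K is a smooth ring-shaped kernel and G a bell-shaped growth function. The sketch below implements this rule in one dimension on a circular grid. The parameter values (μ = 0.15, σ = 0.015, dt = 0.1) are of the kind used for Lenia's well-known "Orbium" pattern but are illustrative assumptions here; Flow-Lenia's mass conservation and parameter localization are not included.

```python
import math
import random


def lenia_kernel(radius):
    """Ring-shaped kernel with Lenia's exponential core, normalized to sum to 1."""
    weights = []
    for d in range(-radius, radius + 1):
        r = abs(d) / radius
        weights.append(math.exp(4 - 1 / (r * (1 - r))) if 0 < r < 1 else 0.0)
    total = sum(weights)
    return [w / total for w in weights]


def growth(u, mu=0.15, sigma=0.015):
    """Bell-shaped growth mapping into [-1, 1], peaking at u == mu."""
    return 2 * math.exp(-((u - mu) ** 2) / (2 * sigma ** 2)) - 1


def lenia_step(cells, kernel, dt=0.1):
    """One update A <- clip(A + dt * G(K * A), 0, 1) on a circular 1-D grid."""
    n, radius = len(cells), len(kernel) // 2
    new_cells = []
    for i in range(n):
        potential = sum(k * cells[(i + j - radius) % n] for j, k in enumerate(kernel))
        new_cells.append(min(1.0, max(0.0, cells[i] + dt * growth(potential))))
    return new_cells


rng = random.Random(0)
kernel = lenia_kernel(radius=10)  # radius must be >= 2 for a non-empty ring
cells = [rng.random() for _ in range(64)]
for _ in range(50):
    cells = lenia_step(cells, kernel)
print(f"total mass after 50 steps: {sum(cells):.2f}")
```

In an automated-discovery setting, quantities such as μ, σ and the kernel shape are the parameters that a curiosity-driven search would explore.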
While distant from ECOCURL in substrate — cellular automata rather than interacting agents, self-organization rather than emergent communication — this line shares the project's methodological backbone: IMGEPs, diversity search, autotelic RL and emergent multi-agent dynamics. These algorithms are therefore natural candidates for turning back onto multi-agent systems themselves, using curiosity-driven exploration to chart the behavioral spaces of the eco-evolutionary and emergent-communication simulations developed in the project. A first step in this direction was taken in Michel et al., 2025, which extends the exploration from individual patterns to ecosystem-level dynamics involving multiple interacting entities.
The project funded one PhD thesis (Gautier Hamon, defended 2025, Hamon, 2025), two research engineers (Corentin Léger, 12 months, and Eleni Nisioti, 6 months) and four Master internships. Two further PhDs co-supervised by the PI, Clément Moulin-Frier (CMF), and feeding into the project were defended during the period: Tristan Karch (2023, emergent-communication contributions) and Mayalen Etcheverry (2023, automated-discovery contributions). Four PhDs are ongoing (Jérémy Perez, Timothé Boulet, Marko Cvjetko and Bastien Morel), all connected to research lines opened or consolidated by the project. CMF's HDR (Moulin-Frier, 2022) formalized the ORIGINS conceptual framework on the ecology of open-ended skill acquisition, building on the project's initial scope and extending it to the perspectives developed below.
Academic outputs include peer-reviewed publications in major AI and Artificial Life conferences including ICLR (×2), AAMAS, CoLLAs, GECCO (×2), EvoStar (×2), ALIFE (×5) and NeurIPS workshops (×3), as well as journal articles in Nature Machine Intelligence, Science Advances, Journal of the Royal Society Interface and NeuroSci. Commentaries and perspectives submitted to special issues in Topics in Cognitive Science and Behavioral and Brain Sciences extend the project's themes into cognitive-science venues. The PI gave eight invited talks over 2022–2025 and served on the Scientific Committee of the IEEE International Conference on Development and Learning (ICDL) from 2019 to 2023.
The Vivarium simulator, a massively multi-agent 2D simulator with realistic physics, written in JAX with a Panel web interface and gRPC client-server communication, was released as the main software deliverable of the project. It is the technical realization of the O4 integrated demonstrator, following the pivot from a 3D environment to a massively multi-agent 2D substrate (rationale in Methods / Evolutions). It has been submitted to HAL and is pending acceptance.
Outreach actions reached diverse audiences. The PI co-initiated and co-organized three editions of Hack1Robo (June 2023, February 2024, November 2024), each with 40–80 participants spanning students, hackers and companies. Hack1Robo received press coverage in Sud-Ouest, and several of its projects led to start-up creations (e.g. Allendia, supported by Inria Startup Studio) and conference publications (Perez et al., 2025a at ICLR, following up on Perez et al., 2024). The PI also co-initiated and co-organized the SMILES workshops on SensoriMotor Interaction, Language and Embodiment of Symbols (addressing WP4 deliverable D4.2, a workshop on emergent communication), together with an Inria workshop on Archaeology and AI. For general audiences, CMF gave a public talk and debate at the Médiathèque de Blanquefort in 2024, and was interviewed on La Science CQFD (France Culture, 2023) and Désassemblons le numérique (Inria podcast, 2022).
Two industrial partnerships extend the project's methods beyond its original multi-agent RL scope to real-world application domains. A research collaboration contract was signed with Pontos in 2024 to transfer the eco-evolutionary modelling line to sustainable-fisheries management, integrating ecosystemic and socio-cultural dynamics to better understand, predict and control halieutic (fisheries) systems. Under the AIRSTRIP call, the AIxIA project (accepted in 2023, with IRT Saint Exupéry and Thales; 18-month research-engineer position funded from 2024) applies curiosity-driven exploration algorithms to the discovery of interference patterns in multi-core hardware architectures.
On the teaching side, Vivarium has been used since 2025 as the platform for practical sessions of the "System Design, Integration and Control" course in the Master CSIC at UPF Barcelona (25 hours per year, ongoing). It was also used in the ENSC/ENSEIRB "Option Robot" course in Bordeaux to demonstrate concepts of sensorimotor control. The simulator is designed for multiple audiences, from high-school students using a code-free web interface to researchers running GPU-accelerated simulations on supercomputers, which makes Vivarium a long-term tool for teaching and research.
The cultural-evolution and eco-evolutionary threads triggered by the project's evolutions now constitute a structured research axis in the Flowers team, articulating three connected questions: the interplay between individual and collective curiosity, emergent communication protocols, and coordinated exploration. Taylor-Davies et al., 2025 (emergent kin selection) received both the best paper award and the best student paper award in the EvoApp track of EvoStar 2025.
New collaborations established during these evolutions are themselves part of the project's impact: IAST Toulouse with Maxime Derex (co-direction of Jérémy Perez's PhD); UPF Barcelona with Ricard Solé and Martí Sánchez-Fibla (with a UBGRS-Mob mobility grant funding Hamon's 2024 research visit); Microsoft Research NYC (Ida Momennejad) and University of Copenhagen (Sebastian Risi) on collective innovation; Google Brain Tokyo (Bert Chan) on Flow-Lenia; and the Défi Inria LLM4Code with X. Hinaut and N. Fijalkow on autotelic generative AI for program synthesis. In 2025, the PI moved to the BioTiC team at Inria Lyon, extending the eco-evolutionary simulation programme to computational biology while maintaining ongoing collaborations with Flowers on cellular-automata simulations and on biological-versus-cultural evolution.
The automated-discovery line, framed as a distant extension of the project's original multi-agent scope, turned out to produce particularly visible outcomes. Plantec et al., 2023 (Flow-Lenia) received the best paper award at ALIFE 2023. At ALIFE 2024, Flowers team contributions won second prize in the Virtual Creature Competition as well as the best technical contribution award. Hamon et al., 2025 (Discovering sensorimotor agency in cellular automata) was published in Science Advances in 2025.
Public-facing dissemination of this line centered on Moulin-Frier et al., 2024 (Quand l'IA explore les prémices d'une vie artificielle, "When AI explores the beginnings of artificial life"), published in Pour la Science (the French edition of Scientific American) in August 2024, which presented to a general-science audience the project's view of AI as a family of computational instruments for exploring the origins of evolutionary processes. The 2024 open-source release of the Automated Discovery software, which packages ten examples of curiosity-driven discovery methods, further extended the line's reach to researchers outside the team.
The original scope of the project raised three classes of challenges that proved central during its execution.
Non-stationarity and decentralized coordination under intrinsic motivation. The proposal predicted that learning-progress-based exploration would provide a generic solution to the non-stationarity problem of decentralized MARL. In practice, a specific obstacle emerged: intrinsically-motivated agents that sample goals independently fail on cooperative tasks, because independent exploration does not produce the goal alignment that cooperation requires. Nisioti et al., 2023a addresses this obstacle with the Goal-coordination game, in which goal alignment emerges as a by-product of individual-reward maximization. More than a technical fix, this finding reframes communication itself: in autotelic populations, communication is the structural mechanism by which independent goal selection becomes compatible with cooperation. As it stands, the non-stationarity challenge is tractable at the scale of small decentralized populations, but scaling the same principle to many-agent populations and to richer, more expressive goal spaces remains open.
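The alignment obstacle can be made concrete with a toy calculation (not the Goal-coordination game itself): if two agents each sample one of K goals uniformly and independently, and reward is obtained only when both pursue the same goal, the expected alignment rate is 1/K. The sketch below compares this with a hypothetical one-way "announce and adopt" channel, in which one agent broadcasts its goal and the other adopts it. The function name, the uniform-sampling assumption and the channel are illustrative simplifications.

```python
import random


def alignment_rate(num_goals: int, episodes: int, communicate: bool, seed: int = 0) -> float:
    """Fraction of episodes in which two agents pursue the same goal.

    Hypothetical toy setting: each agent samples a goal uniformly at random.
    If `communicate` is True, agent A announces its goal and agent B adopts it.
    """
    rng = random.Random(seed)
    aligned = 0
    for _ in range(episodes):
        goal_a = rng.randrange(num_goals)
        goal_b = rng.randrange(num_goals)
        if communicate:
            goal_b = goal_a  # B overrides its own sample with A's announced goal
        aligned += goal_a == goal_b
    return aligned / episodes


if __name__ == "__main__":
    for k in (2, 5, 20):
        print(k, alignment_rate(k, 10_000, communicate=False),
              alignment_rate(k, 10_000, communicate=True))
```

Without communication the measured rate stays close to 1/K and degrades as the goal space grows; with the announcement it is 1.0 by construction. The toy only shows why some coordination mechanism is needed, not how one emerges from individual-reward maximization.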
Scaling the integrated demonstrator. Two risks identified in the proposal materialized: the planned Minecraft-like 3D demonstrator (T3.3) proved both poorly aligned with the scientific questions of the project and costly to scale. The response was to pivot from a few-agent 3D environment to the 2D massively-multi-agent substrate now released as the Vivarium simulator. Scaling the number of agents turned out to be more productive than adding a spatial dimension, because the questions that became central during the project — eco-evolutionary feedbacks, cultural transmission, collective innovation — depend on population size rather than on geometric richness. Reintroducing geometric richness on top of a massively-multi-agent substrate, without losing scale, remains a design and engineering challenge.
Compositional communication in grounded sensorimotor settings. The cognitive architecture of WP1 was designed to support compositional communication grounded in sensorimotor interactions (T1.3). Two contributions implement this in constrained settings: in the Architect-Builder problem (Barde et al., 2022), agents negotiate a low-level discrete protocol during learning, and the Graphical Referential Game (Karch et al., 2022) yields a continuous graphical language with compositional properties. Generalizing these results to richer environments (many agents, diverse tasks, open-ended sensorimotor channels) remains a substantive challenge, particularly when compared with the compositional richness of language in pretrained large language models.
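For readers unfamiliar with referential games, the classic Lewis signaling game with Roth-Erev (urn) reinforcement is the simplest setting in which a discrete protocol is negotiated during learning. It is shown here as a generic illustration, not as the Architect-Builder or Graphical Referential Game algorithms; the game size, number of rounds and initial urn weights are arbitrary assumptions.

```python
import random
from typing import List, Tuple

Weights = List[List[float]]


def train_lewis_game(n: int = 2, rounds: int = 20_000, seed: int = 0) -> Tuple[Weights, Weights]:
    """Roth-Erev (urn) reinforcement in an n-state, n-signal, n-act Lewis signaling game.

    sender[s][m]: accumulated weight for sending signal m in state s.
    receiver[m][a]: accumulated weight for choosing act a after receiving signal m.
    """
    rng = random.Random(seed)
    sender = [[1.0] * n for _ in range(n)]    # uniform initial propensities
    receiver = [[1.0] * n for _ in range(n)]
    for _ in range(rounds):
        state = rng.randrange(n)  # states are equiprobable
        signal = rng.choices(range(n), weights=sender[state])[0]
        act = rng.choices(range(n), weights=receiver[signal])[0]
        if act == state:
            # Shared payoff: both agents reinforce the choices they just made.
            sender[state][signal] += 1.0
            receiver[signal][act] += 1.0
    return sender, receiver


def expected_success(sender: Weights, receiver: Weights) -> float:
    """Probability of a correct act under the learned stochastic policies,
    averaged over equiprobable states."""
    n = len(sender)
    total = 0.0
    for s in range(n):
        sender_total = sum(sender[s])
        for m in range(n):
            total += (sender[s][m] / sender_total) * (receiver[m][s] / sum(receiver[m]))
    return total / n


sender, receiver = train_lewis_game()
print(round(expected_success(sender, receiver), 3))
```

With two equiprobable states this urn dynamic is known to converge to a perfect signaling system; with more states it can settle on partial-pooling equilibria in which some signals stay ambiguous. Even this toy setting therefore hints at why richer, compositional protocols are harder to obtain.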
The evolutions of the project towards eco-evolutionary dynamics, cultural evolution and LLM-based experimental paradigms raised three further challenges.
Measuring emergent complexity and open-endedness. As the project extended into Artificial Life territory, characterizing emergent complexity became methodologically central. No general measure of open-endedness is available: existing proposals such as Bedau's evolutionary activity statistics capture aspects of evolutionary novelty but do not account for the open-endedness observed in natural or cultural evolution. Each contribution in the eco-evolutionary and cultural-evolution lines (Hamon et al., 2023, Taylor-Davies et al., 2025, Nisioti et al., 2022a, Perez et al., 2024) defined its own a posteriori metrics, designed in response to specific observed phenomena. A general methodology for detecting and quantifying phase transitions in simulated populations remains an open problem.
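As a reference point for this kind of measure, the sketch below computes simplified versions of Bedau and Packard's evolutionary activity statistics from a record of which components (e.g. genotypes or behaviors) are present at each time step: the cumulative activity of each component, diversity, and total and mean cumulative activity over extant components. It deliberately omits the neutral "shadow" model that the original statistics use to separate adaptive persistence from drift, and the input format is an assumption.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def activity_statistics(populations: Iterable[Iterable[Hashable]]) -> List[Dict[str, float]]:
    """Simplified evolutionary activity statistics, without shadow-model correction.

    populations: for each time step, the component identifiers present in the
    population; a repeated identifier counts as several instances of that component.
    """
    activity: Counter = Counter()  # cumulative activity a_i(t) of each component i
    stats: List[Dict[str, float]] = []
    for population in populations:
        counts = Counter(population)
        # a_i(t) = a_i(t-1) + number of instances of component i at time t
        activity.update(counts)
        present = list(counts)
        diversity = len(present)
        total = sum(activity[c] for c in present)  # summed over extant components only
        stats.append({
            "diversity": diversity,
            "total_cumulative_activity": total,
            "mean_cumulative_activity": total / diversity if diversity else 0.0,
        })
    return stats
```

In Bedau's framework, sustained growth of these curves (once corrected against a shadow model) is read as a sign of unbounded evolutionary activity; the limitation noted above is that such curves capture only part of what open-endedness means in natural or cultural evolution.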
Computational cost of large-scale eco-evolutionary simulation. The shift towards massively-multi-agent and eco-evolutionary simulations required GPU-accelerated infrastructure (JAX, Vivarium) and brought the project methodologically closer to Artificial Life. Long non-episodic simulations — necessary to observe niche construction, eco-evolutionary feedbacks and cultural accumulation — remain costly, especially when combined with reinforcement learning or neuroevolution of large policy networks. Designing efficient, scalable simulation pipelines for these regimes is a persistent technical challenge that constrains the size and timescale of the systems that can be studied.
From disembodied LLM populations to grounded compositional communication. Large language models opened a new experimental paradigm for cultural evolution (Perez et al., 2025a, Perez et al., 2024), but the resulting agents are disembodied: their compositional competence is inherited from pre-training rather than emerging from sensorimotor interactions. This creates a tension with the grounded, bottom-up perspective of ECOCURL, where compositionality should arise from agent–environment interactions. Reconciling the two — either by grounding LLM-based agents in sensorimotor environments, or by using LLM tools as scaffolds for analyzing emergent protocols in grounded populations — is an open methodological challenge.
The application of the project's curiosity-driven algorithms to automated discovery raised two further challenges of a more general nature.
Balancing engineered structure and emergent dynamics. A recurring tension runs through the automated-discovery line: how much structure should be engineered into the system, and how much should be left to emerge? Too much engineering restricts exploration to the space of phenomena anticipated by the designer. Too little engineering makes search intractable, as the discovery algorithm has no traction on the dynamics. The project's contributions illustrate both ends of this spectrum: Hamon et al., 2025 and Plantec et al., 2023 operate on minimally-engineered cellular-automata substrates where most of the complexity emerges from local rules, while Khajehabdollahi et al., 2025 injects high-level semantic priors through vision-language-model-generated goals. Calibrating this trade-off in a principled way — and in particular designing exploration methods that can start from minimally-engineered substrates and progressively discover useful high-level structure — remains an open challenge, and a general one for open-ended AI.
Transferring curiosity-driven exploration back onto multi-agent systems. The methods developed in this line — Intrinsically Motivated Goal Exploration Processes, diversity search, autotelic reinforcement learning — were originally designed to explore the behavior of individual systems or self-organized substrates. Transferring them to multi-agent systems, where the behavioral space is shaped by agent–agent interactions, raises specific technical difficulties: the relevant behavioral descriptors must capture collective phenomena rather than individual trajectories, and the exploration must handle higher-dimensional and non-stationary dynamics. Michel et al., 2025 takes a first step by extending exploration from individual Lenia patterns to ecosystem-level dynamics with multiple interacting entities, but a general methodology for curiosity-driven exploration of multi-agent simulations — one that could automate the study of populations like those in Vivarium — remains to be developed.
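A minimal sketch of the loop in question, assuming a toy multi-agent system: an IMGEP with random goal sampling draws a target in a behavioral-descriptor space, retrieves the archived parameters whose outcome is closest to it, perturbs them, and records the new outcome. The system, its two collective descriptors (mean and spread of agent positions), the parameter names and ranges are all hypothetical; the point is only that the descriptor summarizes the population rather than any single trajectory.

```python
import math
import random
from typing import List, Tuple

Params = Tuple[float, float]      # (cohesion, drift): hypothetical system parameters
Descriptor = Tuple[float, float]  # (mean position, spread): collective descriptors

BOUNDS = [(0.0, 1.0), (-0.1, 0.1)]  # hypothetical parameter ranges


def run_population(params: Params, n_agents: int = 20, steps: int = 50) -> Descriptor:
    """Toy multi-agent system: agents on a line pulled towards their centroid
    (cohesion) and shifted by a common drift, with small Gaussian noise."""
    cohesion, drift = params
    rng = random.Random(0)  # fixed seed: the outcome depends only on params
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_agents)]
    for _ in range(steps):
        centroid = sum(xs) / n_agents
        xs = [x + cohesion * (centroid - x) + drift + rng.gauss(0.0, 0.05) for x in xs]
    mean = sum(xs) / n_agents
    spread = math.sqrt(sum((x - mean) ** 2 for x in xs) / n_agents)
    return mean, spread


def imgep(budget: int = 200, n_bootstrap: int = 20, sigma: float = 0.1,
          seed: int = 1) -> List[Tuple[Params, Descriptor]]:
    """IMGEP with random goal sampling in the space of collective descriptors."""
    rng = random.Random(seed)
    archive: List[Tuple[Params, Descriptor]] = []
    for i in range(budget):
        if i < n_bootstrap:
            # Bootstrap: sample parameters uniformly at random.
            params = tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS)
        else:
            # 1. Sample a goal in a box slightly larger than the outcomes seen so far.
            descs = [d for _, d in archive]
            goal_coords = []
            for j in range(2):
                lo = min(d[j] for d in descs)
                hi = max(d[j] for d in descs)
                margin = 0.1 * (hi - lo)
                goal_coords.append(rng.uniform(lo - margin, hi + margin))
            goal = tuple(goal_coords)
            # 2. Retrieve the parameters whose outcome is closest to the goal.
            nearest, _ = min(archive, key=lambda entry: math.dist(entry[1], goal))
            # 3. Perturb them with Gaussian noise scaled to each parameter's range.
            params = tuple(
                min(hi, max(lo, p + rng.gauss(0.0, sigma * (hi - lo))))
                for p, (lo, hi) in zip(nearest, BOUNDS)
            )
        # 4. Run the system and store (parameters, outcome) in the archive.
        archive.append((params, run_population(params)))
    return archive
```

The usual comparison is a random-parameter baseline with the same budget. In a genuinely multi-agent simulation such as Vivarium, the harder design question is the one this toy sidesteps by hand-picking two descriptors: which collective statistics capture the phenomena worth exploring.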
A structural finding runs through several of the project's results: in autotelic populations, communication is not an added feature but the natural mechanism by which independent goal selection becomes compatible with cooperation (Nisioti et al., 2023a). Related results show that compositional environments induce cooperative exploration without explicit incentives (Bornemann et al., 2023), that the topology of experience-sharing networks shapes collective innovation (Nisioti et al., 2022a), and that compositional protocols can support cooperative-skill transfer (Barde et al., 2022, Karch et al., 2022). These results motivate a general research direction on the interactions between individual and collective curiosity: how the goal-exploration strategies of autotelic agents combine with social transmission mechanisms and group-level innovation dynamics to produce open-ended skill acquisition at the population scale.
Several open questions structure this direction. At the algorithmic level, how should individual learning-progress signals be combined with social signals — imitation, teaching, demand for information — in a way that preserves diversity while enabling accumulation? At the architectural level, what kinds of communication channels, from discrete symbolic protocols to continuous sensorimotor gestures, best support the coordination of curiosity across agents? At the population level, what network topologies and transmission biases yield sustained innovation, and how do these depend on the structure of the task space?
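One hypothetical way to pose the algorithmic question concretely: each agent tracks its competence on each goal, computes absolute learning progress (the change in mean competence between an older and a more recent window), and samples goals from a mixture of its own learning progress and a social signal, here the learning progress reported by peers. The window size, the mixing weight `beta`, the uniform exploration floor `epsilon` and the choice of peer learning progress as the social signal are all assumptions, not a method from the project.

```python
import random
from typing import Dict, Sequence


def learning_progress(history: Sequence[float], window: int = 10) -> float:
    """Absolute learning progress: |mean(recent window) - mean(previous window)|.

    Returns 0.0 until two full windows of competence measurements are available.
    """
    if len(history) < 2 * window:
        return 0.0
    recent = history[-window:]
    older = history[-2 * window:-window]
    return abs(sum(recent) / window - sum(older) / window)


def goal_distribution(own_lp: Dict[str, float], peer_lp: Dict[str, float],
                      beta: float = 0.3, epsilon: float = 0.1) -> Dict[str, float]:
    """Mix individual and social learning-progress signals into goal probabilities.

    beta = 0 recovers purely individual LP-based sampling; epsilon keeps a uniform
    share of exploration so that goals with zero measured progress are still tried.
    """
    goals = list(own_lp)
    scores = {g: (1 - beta) * own_lp[g] + beta * peer_lp.get(g, 0.0) for g in goals}
    total = sum(scores.values())
    uniform = 1.0 / len(goals)
    if total == 0.0:
        return {g: uniform for g in goals}
    return {g: epsilon * uniform + (1 - epsilon) * scores[g] / total for g in goals}


def sample_goal(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one goal according to the mixed distribution."""
    goals = list(dist)
    return rng.choices(goals, weights=[dist[g] for g in goals])[0]
```

A high `beta` lets agents converge on goals where peers are progressing, favouring accumulation at the risk of homogenizing the population; the `epsilon` floor is one crude way to preserve diversity. Formalizing that trade-off is exactly what the algorithmic question above asks for.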
This programme reframes the project's core hypotheses H1–H3 at the scale of populations where individual exploration and social transmission operate simultaneously, rather than sequentially or in isolation.
Large language models have created experimental conditions for studying cultural evolution at an unprecedented scale: populations of agents with compositional communication competence, interacting through natural-language channels, at a cost that permits large controlled experiments. The project established this line with Perez et al., 2025a on cultural attractors in transmission chains, Perez et al., 2024 on the manipulation of cultural variables in LLM populations, and Nisioti et al., 2024 on collective innovation in LLM groups playing Little Alchemy 2.
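The notion of a cultural attractor in transmission chains can be illustrated with a deliberately minimal numerical model in which the LLM agent is replaced by a hypothetical transformation: each generation reproduces the previous value, pulled a fraction `pull` of the way towards an attractor, plus transmission noise. The scalar representation, the linear bias and all parameter values are assumptions for illustration only; in the LLM experiments the transmitted items are texts, and attractors are inferred from how text properties evolve along chains rather than specified in advance.

```python
import random
from typing import List


def transmission_chain(initial: float, generations: int = 50, attractor: float = 0.7,
                       pull: float = 0.2, noise: float = 0.02, seed: int = 0) -> List[float]:
    """Toy transmission chain: each generation copies the previous value with a
    systematic bias towards `attractor` plus Gaussian transmission noise."""
    rng = random.Random(seed)
    values = [initial]
    for _ in range(generations):
        previous = values[-1]
        values.append(previous + pull * (attractor - previous) + rng.gauss(0.0, noise))
    return values


# Chains started from opposite ends end up near the same point.
print(transmission_chain(0.0)[-1], transmission_chain(1.0)[-1])
```

Without noise, the distance to the attractor after g generations is (1 - pull)^g times the initial distance, so chains converge regardless of their starting point; noise keeps them fluctuating around the attractor rather than exactly on it.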
A natural extension is to populations mixing humans and LLM-based agents. Information, norms and goals increasingly circulate in such hybrid ecosystems, and their cultural dynamics remain poorly understood. Relevant questions include: do hybrid populations stabilize on the same cultural attractors as pure-human or pure-LLM ones, or do new attractors emerge? How do LLMs bias the distribution of topics, values and styles transmitted at the population level? What transmission biases, network structures and aggregation rules mitigate undesirable cultural dynamics such as homogenization, polarization or degradation of information quality?
The methodological toolkit developed in the project — controlled transmission-chain experiments, large-scale simulation of LLM populations, open-source frameworks for manipulating cultural variables — applies directly here. The human-side counterpart is the cultural-evolution-of-goals work reported in the Results section (under revision for Topics in Cognitive Science), which introduces the notion of cultural autotelic agents. Taken together, these strands position Machine Culture as a research programme at the intersection of AI, cultural evolution and cognitive science.
The automated-discovery line demonstrated that curiosity-driven algorithms — originally developed for autonomous agents — are productive tools for the systematic exploration of complex dynamical systems (Hamon et al., 2025, Plantec et al., 2023, Khajehabdollahi et al., 2025, Michel et al., 2025, Cvjetko et al., 2025, Morel et al., 2025). A natural next step is to organize such algorithms into multi-agent AI science teams: populations of autotelic agents that coordinate their exploration of a shared target system, share intermediate discoveries through communication, and collaborate with human researchers on the interpretation of results.
This direction is both a concrete engineering programme and a testbed for the project's core ideas. On the engineering side, it requires coordination mechanisms for distributed exploration — how to allocate exploration effort across agents, how to share and aggregate discoveries — and metacognitive signals enabling agents to report the progress and uncertainty of their own search. On the scientific side, it closes the loop with the project's initial scope: an AI science team is itself a multi-agent autotelic population whose collective behavior instantiates the mechanisms studied in WP1–WP3. The interactive-exploration contribution Morel et al., 2025 and the goal-directed-exploration contributions cited above are first building blocks.
A further aspect is human–AI collaboration: AI science teams should be designed to integrate human guidance efficiently, both for orienting exploration towards scientifically relevant regions of the behavioral space and for leveraging human interpretation of emergent phenomena.
A long-term thread concerns the ecological and evolutionary conditions under which autotelic learning itself emerges. The eco-evolutionary contributions of the project (Hamon et al., 2023, Taylor-Davies et al., 2025, Nisioti et al., 2022b, Nisioti et al., 2023b, Sánchez-Fibla et al., 2024) established a simulation substrate in which agent populations evolve in long non-episodic scenarios with niche construction, kin selection and environmental feedback. The meta-RL contributions (Bornemann et al., 2023, Léger et al., 2024) provided a complementary computational model of the interplay between evolution and development.
The forward-looking question is how autotelic capacities — curiosity-driven exploration, intrinsic motivation to discover and master skills — themselves evolve in such populations, under what ecological pressures and what environmental variability. This extends the project's initial focus on autotelic learning from a design principle (agents given intrinsic motivation) to an emergent property (agents that evolve intrinsic motivation in response to their ecology). Preliminary evidence from the proto-intrinsic-motivation observations in Hamon, 2025 supports the tractability of this question.
A further open direction is the transfer of this simulation programme to real-world biological and ecological systems. The PI's 2025 move to the BioTiC team at Inria Lyon extends the eco-evolutionary line to computational biology, and the Pontos partnership tests transfer to halieutic systems. Framing such transfer as a scientific question in its own right — how far models developed on abstract simulation substrates can inform, predict and control real ecosystems — is itself part of the perspective.
The project's trajectory, combined with the ORIGINS framework formalized in the PI's HDR (Moulin-Frier, 2022), converges on a long-term grand challenge: the design of large-scale integrated simulations in which cultural evolution emerges from biological evolution through a sequence of Major Transitions (in the sense of Maynard Smith and Szathmáry), yielding truly open-ended skill acquisition in silico.
The ORIGINS framework articulates three interacting levels. At Level 1, environmental complexity and multi-agent feedback drive the evolution of autotelic agents equipped with intrinsic motivation to discover new niches and skills. At Level 2, intrinsically-motivated exploration coupled with cooperation and competition pressures bootstraps a cultural repertoire — an expanding collection of behaviors transmitted through social learning, including technology, communication and social organization. At Level 3, a sufficiently rich cultural repertoire bootstraps positive feedback loops that continuously increase the complexity of the environment, the cognitive abilities of the agents, and their multi-agent dynamics, yielding genuinely open-ended skill acquisition.
Realizing this programme requires the integration of (1) complex environmental dynamics inspired by paleo-climatology and human behavioral ecology, with compositional dynamics providing diverse constraints and opportunities; (2) artificial evolution capable of generating morphogenetic and neural structures with diverse functionalities, including the potential evolution of autotelic agents; (3) reciprocal influences between agent populations and their environmental niches; (4) cross-generational influence mediated by environmental modifications and agent interactions, bootstrapping cultural transmission. Each of these ingredients is partially addressed by the project's contributions and by the four perspectives above, but their integration into a single simulation substrate remains an open long-term objective.
Beyond its computational-science motivation, this grand challenge has a humanist counterpart: the resulting simulations can serve as instruments for understanding the eco-evolutionary and cultural dynamics of the human species itself, situating AI as a family of computational tools for studying the origins of evolutionary and cultural processes.