Artificial Intelligence Laboratory, Vrije Universiteit Brussel
Humans learn language from situated communicative interactions. What about machines?
Human languages are evolutionary systems that continuously adapt to changes in the communicative needs and environment of their users. They emerge and evolve as a result of meaningful and intentional communicative interactions between members of a linguistic community. As individual language users build up their linguistic knowledge during communicative interactions that are situated in their everyday environment, their linguistic capacities are tied to the individuals' own physical and cognitive endowment, grounded in their environment, shaped by their past experiences, and motivated by their communicative needs.
When it comes to machines, the situated, communicative, and interactional aspects of language learning are often passed over. This applies in particular to today’s large language models (LLMs), where the input is predominantly text-based, and where the distribution of character groups (tokens) serves as a basis for modeling the meaning of linguistic expressions. This design choice lies at the root of a number of important limitations, in particular regarding the inequivalence between probable token sequences and utterances that are factually true, the data-hungriness of the models, and their limited ability to perform human-like logical and pragmatic reasoning.
During this talk, I will make a case for an alternative approach that models how artificial agents can acquire linguistic structures in a more human-like manner, i.e. by participating in situated communicative interactions that are grounded in the environment they perceive. Through a selection of experiments, I will show how the symbolic and subsymbolic linguistic knowledge that is captured in the resulting models is of a fundamentally different nature than the knowledge captured by LLMs, and argue that this change of perspective provides a promising path towards more human-like language processing in machines.
Further reading:
Beuls, K. & Van Eecke, P. (2024). Humans learn language from situated communicative interactions. What about machines? Computational Linguistics 50(4): 1277–1311.
ILCB, Aix-Marseille Université, Marseille, France
Gestures as an evolutionary and developmental link to mechanistic models of language
Despite the remarkable advances of Deep Learning (DL) in Natural Language Processing (NLP), current models remain fundamentally disembodied. They learn from static text, whereas biological agents additionally learn through sensorimotor interactions with their environment, involving directionality, reinforcement and meaning. As a natural next phase of DL model development, it is therefore essential that we translate the biological mechanisms underlying such embodied language learning into computational terms. In this context, I study gestures, a motor-based, non-verbal modality, as a shared sensorimotor-perceptual scaffolding of language development (in human infants) and evolution (across primates).
Using an ethological framework that integrates gesture function (what gestures achieve), development (how gestures emerge), and mechanism (how gestures are executed), my work shows: (i) evolutionary roots of foundational features of language (such as flexibility, intentionality and referentiality), achieved through body-based signalling from senders to receivers (directionality), in bonnet macaques (Gupta & Sinha 2016; 2019); (ii) contingent responses from adults shaping conversation-like routines (reinforcement) in preverbal infants' use of prosocial gestures (Gupta et al. 2025); and (iii) infants' use of manual and bodily gestures to express semantic and pragmatic intentions before the emergence of speech, and the emergence of new meaning through gestures as an embodied process (meaning-making) (Gupta et al. 2024; in prep).
An evolutionary framework for embodied communication, such as the one guiding my research, can thus provide mechanisms rooted in sensorimotor-perceptual development and social context for future mechanistic models of language acquisition in NLP and machine learning. This perspective is essential to achieving the SMILES workshop's goal of bridging the gap between low-level embodied interaction and high-level compositional symbolic communication.
Institut de Neurosciences de la Timone, Aix-Marseille Université, Centrale Méditerranée
Path toward compositional world learning
We outline a learning framework designed to acquire a latent space that encodes the structure of the surrounding physical environment. Grounded in simple principles of embodiment, the approach aims to develop an intuitive physics that emerges solely from the agent's own capacity for movement and displacement. Using the example of eye saccades, the model should learn to extract stable object-in-space representations from the variable visual fields generated by successive gaze shifts, provided that appropriate sparsity constraints are imposed. Such constraints would encourage the formation of a space-aware embedding, analogous to the token–positional encoding duality that underpins transformer architectures, but adapted here to the perception–action cycle.
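As a minimal, purely illustrative sketch (toy values, not taken from the talk): because the agent knows its own saccade commands, each variable retinal view can be re-expressed in object-centered coordinates, where the resulting code is both stable across gaze shifts and sparse. In the proposed framework, this explicit motor-driven shift would be replaced by a learned, sparsity-constrained embedding.

```python
# Toy 1-D world: zeros except a small hypothetical "object" pattern.
WORLD = [0.0] * 64
WORLD[30:33] = [1.0, 0.5, 1.0]

VIEW = 16  # width of the retinal window

def glimpse(gaze):
    """Retinal input produced by fixating at position `gaze`."""
    return WORLD[gaze:gaze + VIEW]

def recentre(view, shift):
    """Undo the known gaze offset (efference copy), padding with zeros."""
    return [0.0] * shift + view[:VIEW - shift]

# Successive saccades: the raw retinal views all differ...
gazes = [20, 24, 28]
views = [glimpse(g) for g in gazes]

# ...but re-centering each view with the agent's own motor signal yields
# an object-centered code that is identical across saccades and sparse
# (only the three object pixels are nonzero).
aligned = [recentre(v, g - gazes[0]) for g, v in zip(gazes, views)]
```

Here the known motor command plays the role that a positional code plays in a transformer, while the sparse, gaze-invariant pattern plays the role of the token.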
Hugging Face (Paris) & Inria Flowers (Bordeaux)
Grounding LLMs through interactions: A curiosity-driven Reinforcement Learning approach to functional competence
Large Language Models (LLMs) have achieved remarkable success, yet they remain prone to fundamental errors in reasoning about the physical world. A key limitation lies in their passive learning paradigm: they are trained on vast text corpora without direct interaction. By contrast, theories of human language acquisition highlight the central role of embodied, active learning—through both sensorimotor experience and social interaction. This talk explores how insights from embodied theories of language learning, especially the use of intrinsically motivated interactions, can inform the next generation of LLMs. In particular, we will focus on grounding LLMs’ functional competence: their ability to use symbols to act in external environments and pursue goals. I will propose an interaction-based framework and discuss multiple empirical contributions to embedding LLMs in interactive environments, where they can explore, acquire knowledge, and ground their internal representations by solving tasks with curiosity-driven Reinforcement Learning.
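A minimal sketch of such an interaction loop, assuming a toy chain environment and a count-based novelty bonus as a simple stand-in for learned curiosity signals (the talk's agents are LLMs acting in far richer environments):

```python
import random
from collections import defaultdict

# Toy environment: a chain of N states; actions move left or right.
N = 10
def step(state, action):               # action is -1 or +1
    return max(0, min(N - 1, state + action))

# Intrinsic reward: a count-based novelty bonus, a simple stand-in for
# learned curiosity signals such as prediction error.
visits = defaultdict(int)
def intrinsic_reward(state):
    visits[state] += 1
    return 1.0 / visits[state] ** 0.5

# Tabular Q-learning driven purely by the intrinsic signal: no task
# reward is ever given, yet the agent comes to explore the whole chain.
Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.2
random.seed(0)
state = 0
for _ in range(5000):
    if random.random() < eps:
        action = random.choice((-1, 1))
    else:
        action = max((-1, 1), key=lambda a: Q[(state, a)])
    nxt = step(state, action)
    target = intrinsic_reward(nxt) + gamma * max(Q[(nxt, -1)], Q[(nxt, 1)])
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = nxt
```

An LLM-based version replaces the Q-table with the language model's policy and the integer state with a textual description of the environment, but the loop structure (act, observe, reward novelty, update) is the same.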