Tuesday 16th September 2025
2pm - 6pm UTC+2 (CEST)

4th SMILES WORKSHOP

Satellite event of ICDL 2025 (onsite and online)

Czech Technical University (CTU), Prague

Sensorimotor Interaction, Language and Embodiment of Symbols (SMILES) workshop

Registration & Links

Registration: https://forms.gle/No8xFFkmfKXXdNMm7 (if you come onsite, you also need to register here)

Join our Discord: https://discord.gg/sBZbdSWRvJ (expires after 100 hits)

Main ICDL conference (and onsite registration): https://icdl2022.qmul.ac.uk

- Onsite venue: Graduate Center, Queen Mary University of London, UK. More info here: https://icdl2022.qmul.ac.uk/?page_id=103

- Online venue: via Zoom and Discord group.

- contact: smiles.conf at gmail.com


Objectives

On the one hand, models of sensorimotor interaction are embodied in the environment and in the interaction with other agents. On the other hand, recent Deep Learning developments in Natural Language Processing (NLP) allow models to capture increasing language complexity (e.g. compositional representations, word embeddings, long-term dependencies). However, those NLP models are disembodied in the sense that they are learned from static datasets of text or speech. How can we bridge the gap from low-level sensorimotor interaction to high-level compositional symbolic communication? The SMILES workshop will address this issue through an interdisciplinary approach involving researchers from (but not limited to):

- Sensorimotor learning,

- Symbol grounding and symbol emergence,

- Emergent communication in multi-agent systems,

- Chunking of perceptuo-motor gestures (gestures in a general sense: motor, vocal, ...),

- Compositional representations for communication and action sequences,

- Hierarchical representations of temporal information,

- Language processing and language acquisition in brains and machines,

- Models of animal communication,

- Understanding composition and temporal processing in neural network models, and

- Enaction, active perception, perception-action loop.

Invited Speakers

Joanna Raczaszek-Leonardi

University of Warsaw, Poland

Nested timings of coaction in mother-infant interaction

Joel Z Leibo

DeepMind, London, UK

Reverse-Engineering Human Evolution with Multi-Agent Reinforcement Learning

Lorijn Zaadnoordijk

Trinity College Dublin, Ireland

Developing development: Of babies and machines

Martin Butz

Neuro-Cognitive Modeling, Tübingen, Germany

Towards Linking Event-Predictive, Sensorimotor Grounded Structures to Language

Chris Gumbsch

Neuro-Cognitive Modeling, Tübingen, Germany

Towards Linking Event-Predictive, Sensorimotor Grounded Structures to Language

Ida Momennejad

Microsoft Research NYC, USA

The Graph Structure of Collective Cognition and Innovation in Humans and Machine

Scientific context

Recently, Deep Learning networks have set new records on many benchmarks in Natural Language Processing (NLP), e.g. [1]. Such breakthroughs rely on a few mechanisms (e.g. continuous representations like word embeddings, attention mechanisms, ...). However, the brain needs to parse incoming stimuli and learn from them incrementally: it cannot unfold time as deep learning algorithms such as Back-Propagation Through Time (BPTT) do. Thus, we still lack the key neuronal mechanisms needed to properly model the (hierarchies of) functions in language perception and production. Other models of language processing that reproduce brain dynamics measured with Event-Related Potentials (ERPs) [2] or functional Magnetic Resonance Imaging (fMRI) [3] have been developed. However, such models often lack the explanatory power to identify the causes of the observed dynamics, i.e. what is computed and for which purpose. We need more biologically plausible learning mechanisms that also produce causal explanations of the modelled experimental data.

There is converging evidence that language production and comprehension are not separate processes in a modular mind. Rather, they are interwoven, and this interweaving is what enables people to predict themselves and each other [4]. Interweaving of action and perception is important because it allows a learning agent (or a baby) to learn from its own actions: for instance, by learning the perceptual consequences (e.g. the heard sounds) of its own actions (e.g. vocal productions) during babbling [5]. Thus, the agent learns in a self-supervised way instead of relying only on supervised learning, which, in contrast, implies non-biological teacher signals cleverly designed by the modeller. Explicit neuronal models that explain which mechanisms shape these perceptuo-motor units through development are still missing.

The existence of sensorimotor (i.e. mirror) neurons at abstract representation levels (often called action-perception circuits [6]), jointly with the perceptuo-motor shaping of sensorimotor gestures [5], suggests the existence of similar action-perception mechanisms implemented at different levels of hierarchy. How could we move towards such hierarchical architectures based on action-perception mechanisms?

Importantly, a language processing model needs a way to acquire the semantics of the (symbolic) perceptuo-motor gestures and of the more abstract representations; otherwise it would consider only morphosyntactic and prosodic features of language. These symbolic gestures, i.e. signs, need to be grounded in the mental concepts they represent, i.e. the signified. Several theories and robotic experiments give examples of how symbols could be grounded or how symbols could emerge [7]. However, current neurocomputational models aiming to explain brain processes are not grounded. Robotics has an important role to play here in grounding semantics through experience of the world, i.e. through interactions with the physical world and with humans. Mechanisms that start from raw sensory perception and raw motor commands are needed to let plausible representations emerge through development, instead of arbitrary representations.

Finally, computational models of emergent communication in agent populations are currently gaining interest in the machine learning community [8], [9]. These contributions show how a communication system can emerge to solve cooperative tasks in sequential environments. However, they are still relatively disconnected from the earlier theoretical and computational literature aiming at understanding how language might have emerged from a prelinguistic substance [10], [11]. Communication should be conceived as the emergent result of a collective behavior optimization process, and the resulting computational models should be grounded in the theoretical literature from language evolution research.

The SMILES workshop aims to discuss each of the questions mentioned above, together with original recent approaches [12], [13] on how to integrate them.

[1] J. Devlin et al., “BERT: pre-training of deep bidirectional transformers for language understanding,” CoRR, vol. abs/1810.04805, 2018.
[2] H. Brouwer and J. C. J. Hoeks, “A time and place for language comprehension: mapping the N400 and the P600 to a minimal cortical network,” Frontiers in Human Neuroscience, vol. 7, 2013.
[3] M. Garagnani, T. Wennekers, and F. Pulvermüller, “A neuroanatomically grounded Hebbian-learning model of attention-language interactions in the human brain,” European Journal of Neuroscience, vol. 27, no. 2, pp. 492–513, Jan. 2008.
[4] M. J. Pickering and S. Garrod, “An integrated theory of language production and comprehension,” Behavioral and Brain Sciences, vol. 36, no. 4, pp. 329–347, Jun. 2013.
[5] J.-L. Schwartz et al., “The perception-for-action-control theory (PACT): A perceptuo-motor theory of speech perception,” Journal of Neurolinguistics, vol. 25, no. 5, pp. 336–354, Sep. 2012.
[6] F. Pulvermüller and L. Fadiga, “Active perception: sensorimotor circuits as a cortical basis for language,” Nature Reviews Neuroscience, vol. 11, no. 5, pp. 351–360, Apr. 2010.
[7] T. Taniguchi, T. Nagai, T. Nakamura, N. Iwahashi, T. Ogata, and H. Asoh, “Symbol emergence in robotics: a survey,” Advanced Robotics, vol. 30, no. 11-12, pp. 706–728, Apr. 2016.
[8] S. Sukhbaatar et al., “Learning Multiagent Communication with Backpropagation,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, May 2016.
[9] I. Mordatch and P. Abbeel, “Emergence of Grounded Compositional Language in Multi-Agent Populations,” in Thirty-Second AAAI Conference on Artificial Intelligence, Mar. 2017.
[10] M. Tomasello et al., “Understanding and sharing intentions: the origins of cultural cognition,” Behavioral and Brain Sciences, vol. 28, no. 5, pp. 675–735, 2005.
[11] P.-Y. Oudeyer and L. Smith, “How Evolution may work through Curiosity-driven Developmental Process,” Topics in Cognitive Science, in press, 2015.
[12] M. Schrimpf et al., “The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing,” bioRxiv, 2020.
[13] C. Caucheteux and J.-R. King, “Language processing in brains and deep neural networks: computational convergence and its limits,” bioRxiv, 2020.