Tuesday 16th September 2025
2pm - 6pm UTC+2 (CET)

4th SMILES WORKSHOP

Satellite ICDL 2025 onsite and online event

Czech Technical University (CTU), Prague

Sensorimotor Interaction, Language and Embodiment of Symbols (SMILES) workshop

Registration & Links

Online Registration: https://forms.gle/VbncefzfvBCYHtvZ9

Main ICDL conference (and onsite registration): https://icdl2025.fel.cvut.cz/

- Onsite venue: https://icdl2025.fel.cvut.cz/how-to-get-to-the-venue
Faculty of Electrical Engineering, Czech Technical University in Prague, Dejvická kampus (Technická 2)

https://maps.app.goo.gl/HLQKVZgYrF1oYjYy7

- Online venue: via Zoom (register here: https://forms.gle/VbncefzfvBCYHtvZ9 )

- Join our Discord: LINK (expires after 100 hits): https://discord.gg/MZYV25nM

- contact: smiles.conf at gmail.com

Share the event

Objectives

On the one hand, models of sensorimotor interaction are embodied in the environment and in the interaction with other agents. On the other hand, recent Deep Learning development of Natural Language Processing (NLP) models allow to capture increasing language complexity (e.g. compositional representations, word embedding, long term dependencies). However, those NLP models are disembodied in the sense that they are learned from static datasets of text or speech. How can we bridge the gap from low-level sensorimotor interaction to high-level compositional symbolic communication? The SMILES workshop will address this issue through an interdisciplinary approach involving researchers from (but not limited to):

- Sensori-motor learning,

- Symbol grounding and symbol emergence,

- Emergent communication in multi-agent systems,

- Chunking of perceptuo-motor gestures (gestures in a general sense: motor, vocal, ...),

- Compositional representations for communication and action sequence,

- Hierarchical representations of temporal information,

- Language processing and language acquisition in brains and machines,

- Models of animal communication,

- Understanding composition and temporal processing in neural network models, and

- Enaction, active perception, perception-action loop.

Invited Speakers

Paul Van Eecke

Artificial Intelligence Laboratory - Vrije Universiteit Brussel

Shreejata (Diya) Gupta

ILCB, Aix-Marseille University

Clément Romac

Hugging Face, Paris
& Inria Flowers, Bordeaux

Scientific context

Recently Deep Learning networks have broken many benchmarks in Natural Language Processing (NLP), e.g. [1]. Such breakthroughs are realised by a few mechanisms (e.g. continuous representations like word embedding, attention mechanism, ...). The brain needs to parse incoming stimuli and learn from them incrementally, it cannot unfold time like deep learning algorithms such as Back-propagation through time (BPTT). Thus, we still lack the key neuronal mechanisms needed to properly model the (hierarchies of) functions in language perception and production. Other models of language processing reproducing the behaviour of brain dynamics (Event-Related-Potentials (ERPs) [2] or functional Magnetic Resonance Imaging (fMRI) [3]) have been developed. However, such models often lack explanatory power demonstrating the causes of such observed dynamics: i.e. what is computed and why is it computed – for which purpose? We need more biologically plausible learning mechanisms while producing causal explanations of the experimental data modelled.

There is converging evidence that language production and comprehension are not separated processes in a modular mind, they are rather interwoven, and this interweaving is what enables people to predict themselves and each other [4]. Interweaving of action and perception is important because it allows a learning agent (or a baby) to learn from its own actions: for instance, by learning the perceptual consequences (e.g. the heard sounds) of its own actions (e.g. vocal productions) during babbling [5]. Thus, the agent learns in a self-supervised way instead of relying only on supervised learning, which in contrast, imply non-biological teacher signals cleverly designed by the modeller. Explicit neuronal models explaining which are the mechanisms shaping these perceptuo-motor units through development are missing.

The existence of sensorimotor (i.e. mirror) neurons at abstract representation levels (often called action-perception circuits [6]), jointly with the perceptuo-motor shaping of sensorimotor gestures [5], suggest the existence of similar action-perception mechanisms implemented at different levels of hierarchy. How could we go towards such hierarchical architectures based on action-perception mechanisms?

Importantly, a language processing model needs a way to acquire the semantics of the (symbolic) perceptuo-motor gestures and of the more abstract representations, otherwise it would consider only morphosyntactic and prosodic features of language. These symbolic gestures, i.e. signs, need to be grounded to the mental concept they are representing, i.e. the signified. Several theories and robotic experiments give examples of how symbols could be grounded or how symbols could emerge [7]. However, current neurocomputational models aiming to explain brain processes are not grounded. Robotics have an important role here for the grounding of semantics by experiencing the world through interactions with the physical world and with humans. Mechanisms that start from raw sensory perception and raw motor commands are needed to let emerge plausible representations through development, instead of arbitrary representations.

Finally, computational models of emergent communication in agent populations are currently gaining interest in the machine learning community [8], [9]. These contributions show how a communication system can emerge to solve cooperative tasks in sequential environments . However, they are still relatively disconnected from the earlier theoretical and computational literature aiming at understanding how language might have emerged from a prelinguistic substance [10], [11]. Communication shoud be conceived as the emergent result of a collective behavior optimization process and to ground the resulting computational models into the theoretical literature from language evolution research.

The SMILES workshop will aim at discussing each of the questions mentioned above, together with original recent approaches [12], [13] on how to integrate them.

[1] J. Devlin et al., “BERT: pre-training of deep bidirectional transformers for language understanding,” CoRR, vol. abs/1810.04805, 2018.
[2] H. Brouwer and J.C.J. Hoeks, “A time and place for language comprehension: mapping the n400 and the p600 to a minimal cortical network,” Frontiers in Human Neuroscience, vol. 7, 2013.
[3] M.Garagnani, T.Wennekers, and F.Pulvermüller, “A neuroanatomically grounded hebbian-learning model of attention-language interactions in the human brain,” European Journal of Neuroscience, vol. 27, no. 2, pp. 492–513, Jan. 2008.
[4] M. J. Pickering and S. Garrod, “An integrated theory of language production and comprehension,” Behavioral and Brain Sciences, vol. 36, no. 4, pp. 329–347, Jun. 2013.
[5] J.-L. Schwartz, et al. “The perceptionfor-action-control theory (PACT): A perceptuo-motor theory of speech perception,” Journal of Neurolinguistics, vol. 25, no. 5, pp. 336–354, Sep. 2012.
[6] F. Pulvermüller and L. Fadiga, “Active perception: sensorimotor circuits as a cortical basis for language,” Nature Reviews Neuroscience, vol. 11, no. 5, pp. 351–360, Apr. 2010.
[7] T. Taniguchi, T. Nagai, T. Nakamura, N. Iwahashi, T. Ogata, and H. Asoh, “Symbol emergence in robotics: a survey,” Advanced Robotics, vol. 30, no. 11-12, pp. 706–728, Apr. 2016.
[8] S. Sukhbaatar et al., “Learning Multiagent Communication with Backpropagation,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, may 2016.
[9] I. Mordatch and P. Abbeel, “Emergence of Grounded Compositional Language in Multi-Agent Populations,” in Thirty-Second AAAI Conference on Artificial Intelligence, mar 2017.
[10] M. Tomasello et al., “Understanding and sharing intentions: the origins of cultural cognition,” The Behavioral and Brain Sciences, vol. 28, no. 5, pp. 675–735, 2005.
[11] P.-Y. Oudeyer and L. Smith, “How Evolution may work through Curiosity-driven Developmental Process,” Topics in Cognitive Science., p. in press, 2015.
[12] M. Schrimpf, et al., “The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing,” bioRxiv, 2020.
[13] C.Caucheteux and J.-R. King,“Language processing in brains and deep neural networks: computational convergence and its limits,” BioRxiv, 2020.

Organisers

Xavier Hinaut

Inria & Neurodegeneratives Diseases Institute
Bordeaux, France

Laura Cohen

ETIS lab, CNRS, ENSEA, CY Cergy-Paris University

Alexandre Pitti

ETIS lab, CNRS, ENSEA, CY Cergy-Paris University

ICDL 2025 Main Conference

SMILES workshop is a satellite event of the IEEE International Conference on Developmental and Learning 2025:

https://icdl2025.fel.cvut.cz/

More info

Contact
smiles.conf at gmail.com

Organisers
Xavier Hinaut, Laura Cohen, Alex Pitti

Call for abstract

- Deadline extended: August 31th

- Abstracts call: from 1/2 page to 2 pages (onsite and virtual participation are possible)

- Abstract format: same as ICDL conference https://www.ieee.org/conferences/publishing/templates.html

- Submissions: smiles.conf@gmail.com + indicate if you will be onsite or online

- Workshop dates: September 16, 2025 from 2pm to 6pm

Accepted abstracts will be asked to make a short presentation or poster for the workshop.

Archived 2020 1st SMILES workshop

2020 SMILES website:

sites.google.com/view/smiles-workshop-2020

The 1st edition had 120 registered people

Archived 2021 2nd SMILES workshop

2021 SMILES website:

sites.google.com/view/smiles-workshop-2021

The 2nd edition had 118 registered people

Page updated

Google Sites

Report abuse

Tuesday 16th September 20252pm - 6pm UTC+2 (CET)