Tuesday 31st August 2021
9am - 8.30pm CEST (UTC+2)


satellite ICDL 2021 online event

Sensorimotor Interaction, Language and Embodiment of Symbols (SMILES) workshop

Virtual Online Event

Registration & Links

To follow the event, find the Zoom and Gather Town links in the presentation slides:


Registration is closed (but you can still get the video links later): https://forms.gle/tyvMdbb8UJFyXZky8

Share the event

Use the following hashtag: #SMILESworkshop

If you want to retweet: https://twitter.com/Clement_MF_/status/1432277686383255558


On the one hand, models of sensorimotor interaction are embodied in the environment and in the interaction with other agents. On the other hand, recent deep learning developments in Natural Language Processing (NLP) allow models to capture increasing language complexity (e.g. compositional representations, word embeddings, long-term dependencies). However, these NLP models are disembodied in the sense that they are learned from static datasets of text or speech. How can we bridge the gap from low-level sensorimotor interaction to high-level compositional symbolic communication? The SMILES workshop will address this issue through an interdisciplinary approach involving researchers from (but not limited to):

  • Sensori-motor learning,

  • Emergent communication in multi-agent systems,

  • Chunking of perceptuo-motor gestures (gestures in a general sense: motor, vocal, ...),

  • Symbol grounding and symbol emergence,

  • Compositional representations for communication and action sequences,

  • Hierarchical representations of temporal information,

  • Language processing and acquisition in brains and machines,

  • Models of animal communication,

  • Language evolution,

  • Understanding composition and temporal processing in neural network models, and

  • Enaction, active perception, perception-action loop.

Invited Speakers

Cornell University, Aarhus University & Haskins Labs

University of Maryland, Baltimore County, USA

Gipsa lab, Grenoble-Alpes University, France

University of Texas at Austin, USA

Tel Aviv University, Israel

Saarland University, Germany

AI-lab, Vrije Universiteit Brussel, Belgium

NeuroSpin Center, Gif sur Yvette, France

University of British Columbia, Canada

Scientific context

Recently, deep learning networks have broken many benchmarks in Natural Language Processing (NLP), e.g. [1]. Such breakthroughs are achieved by a few mechanisms (e.g. continuous representations such as word embeddings, attention mechanisms, ...). The brain, by contrast, needs to parse incoming stimuli and learn from them incrementally; it cannot unfold time as deep learning algorithms such as Backpropagation Through Time (BPTT) do. Thus, we still lack the key neuronal mechanisms needed to properly model the (hierarchies of) functions in language perception and production. Other models of language processing have been developed that reproduce the behaviour of brain dynamics (Event-Related Potentials (ERPs) [2] or functional Magnetic Resonance Imaging (fMRI) [3]). However, such models often lack explanatory power regarding the causes of the observed dynamics: i.e. what is computed and why is it computed, for which purpose? We need learning mechanisms that are more biologically plausible while also producing causal explanations of the experimental data modelled.

There is converging evidence that language production and comprehension are not separate processes in a modular mind; rather, they are interwoven, and this interweaving is what enables people to predict themselves and each other [4]. The interweaving of action and perception is important because it allows a learning agent (or a baby) to learn from its own actions: for instance, by learning the perceptual consequences (e.g. the heard sounds) of its own actions (e.g. vocal productions) during babbling [5]. The agent thus learns in a self-supervised way instead of relying only on supervised learning, which, in contrast, implies non-biological teacher signals cleverly designed by the modeller. Explicit neuronal models explaining which mechanisms shape these perceptuo-motor units through development are still missing.

The existence of sensorimotor (i.e. mirror) neurons at abstract representation levels (often called action-perception circuits [6]), jointly with the perceptuo-motor shaping of sensorimotor gestures [5], suggests that similar action-perception mechanisms are implemented at different levels of the hierarchy. How can we move towards such hierarchical architectures based on action-perception mechanisms?

Importantly, a language processing model needs a way to acquire the semantics of the (symbolic) perceptuo-motor gestures and of the more abstract representations; otherwise it would consider only the morphosyntactic and prosodic features of language. These symbolic gestures, i.e. signs, need to be grounded in the mental concepts they represent, i.e. the signified. Several theories and robotic experiments give examples of how symbols could be grounded or how symbols could emerge [7]. However, current neurocomputational models aiming to explain brain processes are not grounded. Robotics has an important role to play here: semantics can be grounded by experiencing the world through interactions with the physical environment and with humans. Mechanisms that start from raw sensory perception and raw motor commands are needed so that plausible representations emerge through development, instead of arbitrary ones.

Finally, computational models of emergent communication in agent populations are currently gaining interest in the machine learning community [8], [9]. These contributions show how a communication system can emerge to solve cooperative tasks in sequential environments. However, they are still relatively disconnected from the earlier theoretical and computational literature aiming at understanding how language might have emerged from a prelinguistic substance [10], [11]. Communication should be conceived as the emergent result of a collective behaviour optimization process, and the resulting computational models should be grounded in the theoretical literature from language evolution research.
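As a toy illustration of this idea (not a model from the workshop itself), the sketch below implements a minimal Lewis signaling game between a speaker and a listener, trained with simple urn-style (Roth-Erev) reinforcement; all function and parameter names are illustrative. When the listener's guess matches the hidden state, both agents reinforce the choices they made, and a shared code typically emerges:

```python
import random

def simulate(n_states=3, n_signals=3, episodes=5000, seed=0):
    """Toy Lewis signaling game with Roth-Erev reinforcement."""
    rng = random.Random(seed)
    # Urn weights: speaker maps states to signals, listener maps signals to states.
    speaker = [[1.0] * n_signals for _ in range(n_states)]
    listener = [[1.0] * n_states for _ in range(n_signals)]

    def choose(weights):
        # Sample an index proportionally to its weight.
        r = rng.random() * sum(weights)
        for i, w in enumerate(weights):
            r -= w
            if r <= 0:
                return i
        return len(weights) - 1

    for _ in range(episodes):
        state = rng.randrange(n_states)
        signal = choose(speaker[state])
        guess = choose(listener[signal])
        if guess == state:
            # Cooperative reward: reinforce both agents' successful choices.
            speaker[state][signal] += 1.0
            listener[signal][guess] += 1.0

    # Evaluate the emerged code with greedy (argmax) encoding and decoding.
    trials, correct = 1000, 0
    for _ in range(trials):
        state = rng.randrange(n_states)
        signal = max(range(n_signals), key=lambda s: speaker[state][s])
        guess = max(range(n_states), key=lambda t: listener[signal][t])
        correct += (guess == state)
    return correct / trials
```

With enough episodes, accuracy usually approaches 1.0, although such games can also settle into partial "pooling" equilibria where several states share a signal; richer variants in the literature add sequential environments and backpropagation-based agents [8], [9].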

The SMILES workshop will aim at discussing each of the questions mentioned above, together with original recent approaches [12], [13] on how to integrate them.

[1] J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," CoRR, vol. abs/1810.04805, 2018.
[2] H. Brouwer and J. C. J. Hoeks, "A time and place for language comprehension: mapping the N400 and the P600 to a minimal cortical network," Frontiers in Human Neuroscience, vol. 7, 2013.
[3] M. Garagnani, T. Wennekers, and F. Pulvermüller, "A neuroanatomically grounded Hebbian-learning model of attention-language interactions in the human brain," European Journal of Neuroscience, vol. 27, no. 2, pp. 492–513, Jan. 2008.
[4] M. J. Pickering and S. Garrod, "An integrated theory of language production and comprehension," Behavioral and Brain Sciences, vol. 36, no. 4, pp. 329–347, Jun. 2013.
[5] J.-L. Schwartz et al., "The Perception-for-Action-Control Theory (PACT): A perceptuo-motor theory of speech perception," Journal of Neurolinguistics, vol. 25, no. 5, pp. 336–354, Sep. 2012.
[6] F. Pulvermüller and L. Fadiga, "Active perception: sensorimotor circuits as a cortical basis for language," Nature Reviews Neuroscience, vol. 11, no. 5, pp. 351–360, Apr. 2010.
[7] T. Taniguchi, T. Nagai, T. Nakamura, N. Iwahashi, T. Ogata, and H. Asoh, "Symbol emergence in robotics: a survey," Advanced Robotics, vol. 30, no. 11-12, pp. 706–728, Apr. 2016.
[8] S. Sukhbaatar et al., "Learning multiagent communication with backpropagation," in Proceedings of the 30th International Conference on Neural Information Processing Systems, May 2016.
[9] I. Mordatch and P. Abbeel, "Emergence of grounded compositional language in multi-agent populations," in Thirty-Second AAAI Conference on Artificial Intelligence, Mar. 2017.
[10] M. Tomasello et al., "Understanding and sharing intentions: the origins of cultural cognition," Behavioral and Brain Sciences, vol. 28, no. 5, pp. 675–735, 2005.
[11] P.-Y. Oudeyer and L. Smith, "How evolution may work through curiosity-driven developmental process," Topics in Cognitive Science, in press, 2015.
[12] M. Schrimpf et al., "The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing," bioRxiv, 2020.
[13] C. Caucheteux and J.-R. King, "Language processing in brains and deep neural networks: computational convergence and its limits," bioRxiv, 2020.


Inria & Neurodegenerative Diseases Institute
Bordeaux, France

Inria & ENSTA ParisTech
Bordeaux, France

Inria, Bordeaux, France
& UCLA, Los Angeles, USA

UCLA, Los Angeles, USA

Nottingham Trent University, UK

Sony AI / Sony Computer Science Laboratories Inc.
Tokyo, Japan

College of Information Science & Engineering, Ritsumeikan University, Kyoto, Japan

ICDL 2021 Main Conference

The SMILES workshop is a satellite event of the IEEE International Conference on Development and Learning (ICDL) 2021:


More info

smiles.conf at gmail.com

Organisers (alphabetical order):
Xavier Hinaut, Clément Moulin-Frier, Silvia Pagliarini, Michael Spranger, Tadahiro Taniguchi, Anne Warlaumont, Joni Zhong.

Archived 2020 1st SMILES workshop

2020 SMILES website:


The 1st edition of the workshop was a great meeting, with 120 registered participants!