2nd Workshop on Language Learning at 2017 IEEE ICDL-EPIROB

Dates and Place

Date: Monday, September 18th 2017

Time: 9:30 - 18:00

Place: Room 02.2 at the Congress Center of the Instituto Superior Tecnico (IST) in Lisbon, Portugal

This workshop is part of the 2017 IEEE ICDL-EPIROB conference, which takes place at the Instituto Superior Tecnico (Lisbon, Portugal).

Schedule (tentative)

  • 9:30 Welcome
  • 9:40-10:20 Pierre-Yves Oudeyer - Curiosity-driven exploration in language development
  • 10:20-11:00 Kirsten Bergmann - Non-verbal behavior and adaptivity in language tutoring using embodied agents
  • 11:00 - 11:30 Coffee Break 1
  • 11:30-12:10 Iris Nomikou - Tuning into language: infants’ emerging participation in everyday routines
  • 12:10-12:50 Junko Kanero - Cross-linguistic approach to the universals and particulars of language acquisition
  • 12:50-13:45 Lunch
  • 13:45-14:30 Max Garagnani - Simulating word learning and high-frequency brain responses to speech items in a neurobiologically realistic model of the left perisylvian cortex.
  • 14:30-15:15 Angelo Cangelosi - Active Learning and Embodiment in Robot Language
  • 15:15-16:00 Takayuki Nagai - Symbol emergence in robotics for future human-robot communication and collaboration
  • 16:00-16:10 Poster teasers
  • 16:10-16:30 Coffee Break 2 + poster session
  • 16:30- Poster session


Children acquire language by interacting with their social and physical environment. By the time children start to talk, their sensory-motor intelligence (visual and auditory perception, body movement, navigation, object manipulation, and articulatory control) has already reached a high level of competence. Together with social support, these competences and growing representations provide a basis for the ongoing development of communication. Emerging skills such as basic turn-taking, establishing eye contact, and first systematic vocalizations such as canonical babbling significantly shape early social interactions.

These interactions are multimodal in nature and vary across contexts. Early communicative exchange in particular is characterized by the use of diverse, not necessarily conventional, means. This fact is intriguing for research on symbol emergence: How do participants choose and agree on particular means? How do the means become conventionalized?

The context within which interaction takes place can vary not only across developmental time and situations within individuals, but also between individuals, socio-economic groups, and cultures. Representations become continuously enriched in ongoing interactions and across different contexts.

Importantly, continuously acquiring knowledge in different multimodal contexts and continuously enriching the underlying representations provides a potentially powerful mechanism (cross-situational learning) that is already well recognized in children's learning. Nonetheless, we need to know more about how children recognize contexts and how their language learning benefits from language use that varies across contexts.
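The cross-situational mechanism can be made concrete with a minimal sketch. This is a toy illustration under our own assumptions, not a model presented at the workshop: a learner that merely counts word-referent co-occurrences across individually ambiguous situations converges on the correct mappings, because only the correct pairings recur across contexts. All words and referents below are hypothetical.

```python
from collections import defaultdict

# Each "situation" pairs the words heard with the candidate referents in view.
# No single situation disambiguates a word, but the set of situations does.
situations = [
    ({"ball", "dog"}, {"BALL", "DOG"}),
    ({"ball", "cup"}, {"BALL", "CUP"}),
    ({"dog", "cup"},  {"DOG", "CUP"}),
]

# Accumulate word-referent co-occurrence counts across all situations.
counts = defaultdict(lambda: defaultdict(int))
for words, referents in situations:
    for w in words:
        for r in referents:
            counts[w][r] += 1

def best_referent(word):
    # The referent that co-occurred with the word most often wins.
    return max(counts[word], key=counts[word].get)
```

With the toy data above, each word co-occurs with its correct referent in two situations but with every distractor in only one, so the counts resolve the ambiguity.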

Even though there are various efforts in developmental robotics to model communication, the emergence of symbolic communication is still an unsolved problem. We are still lacking convincing theories and implementations that show how cooperation and interaction skills could emerge in long-term experiments with populations of robotic agents.


The workshop addresses the emergence of communication, which requires combining and integrating knowledge from diverse disciplines: developmental psychology, developmental linguistics, robotics, artificial language evolution, complex systems science, computational linguistics, neuroscience, and machine learning. We bring together researchers from these areas to discuss current findings from experimental studies, to transfer insights into potential mechanisms for modeling approaches, to inform new experiments, and to stimulate interaction between the fields.

Of particular interest is the guiding question: how can we build learning mechanisms that embrace the complexity of information and its variability across contexts?

The workshop is organized along the following objectives:

  • Set up a roadmap for multidisciplinary research exploiting multimodal language learning across contexts. Our discussions will focus on the notion of “context” in order to specify some parameters that broaden our understanding of how these may be constructed (or emerge) or how these may be utilized in, for instance, cross-situational learning.
  • Provide an integrated and multidisciplinary perspective on multimodal language learning. The workshop will bring together complementary perspectives towards learning across modalities and will allow for ample time for discussions in order to relate findings from different disciplines.
  • The ability to cooperate as well as to communicate is assumed to rely on rich embodied representations. One of the key objectives of the workshop will be to understand possible joint representations (e.g., sensorimotor schemas and constructions). In particular, the workshop will focus on the overlap in these representations and on the question of how such representations can serve different tasks, as in motor control or when recruited in communication. Secondly, how do these representations interact at different stages of development?
  • The workshop will examine cognitive architectures for learning and acquisition strategies, with a special emphasis on architectures that allow modelling the interplay of different strategies at all levels of development, e.g., the acquisition and evolution of interaction patterns, the shaping and change of word meanings, the schematisation of constructional knowledge, the acquisition and use of gestures in communication, etc.

Invited Speakers

Afra Alishahi (U Tilburg, The Netherlands) - Representations of language in a model of grounded speech

Language learners rely on perceptual input to associate linguistic form with grounded meaning, but the perceptual context of language learning is often noisy and highly ambiguous. I will present a visually grounded model of speech perception which projects spoken utterances and images to a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of speech, and show that it learns to extract both form- and meaning-based linguistic knowledge from the input signal. We carry out an in-depth analysis of the representations used by different components of the trained model and show that encoding of semantic aspects tends to become richer as we go up the hierarchy of layers, whereas encoding of form-related aspects of the language input tends to initially increase and then plateau or decrease. Specifically, we study the representation and encoding of phonemes in the MFCC features extracted from the speech signal, and the activations of the layers of the model. Via experiments with phoneme decoding and phoneme discrimination we show that phoneme representations are most salient in the lower layers of the model, where low-level signals are processed at a fine-grained level, although a large amount of phonological information is retained at the top recurrent layer.
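The joint semantic space in this abstract can be sketched schematically. This is a drastically simplified stand-in under stated assumptions: the actual model encodes utterances with a multi-layer recurrent highway network and images with a trained visual encoder, whereas here both encoders are random linear projections over toy feature sizes, paired with a generic margin-based ranking loss of the kind commonly used to train such joint embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    # Unit-normalize so that dot products act as cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder projections into a shared 8-d semantic space.
# Feature sizes (13 MFCC coefficients, 32-d image features) are illustrative.
W_speech = rng.normal(size=(13, 8))
W_image = rng.normal(size=(32, 8))

def embed_speech(mfcc_frames):
    # Mean-pool frames over time, then project: a crude stand-in for the
    # recurrent highway network used in the real model.
    return l2_normalize(mfcc_frames.mean(axis=0) @ W_speech)

def embed_image(feats):
    return l2_normalize(feats @ W_image)

def margin_rank_loss(s, v_pos, v_neg, margin=0.2):
    # Push the matching image above a mismatched one by at least the margin.
    return max(0.0, margin - s @ v_pos + s @ v_neg)
```

Training would minimize the ranking loss over matched and mismatched utterance-image pairs so that semantically related inputs land close together in the shared space.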

Angelo Cangelosi (U Plymouth, UK) - Active Learning and Embodiment in Robot Language

Growing theoretical and experimental research on action and language processing and on number learning and gestures clearly demonstrates the role of embodiment in cognition and language processing. In psychology and neuroscience this evidence constitutes the basis of embodied cognition, also known as grounded cognition (Pezzulo et al. 2012; Borghi & Cangelosi 2014). This is complemented by active learning in developmental models of cognitive skills acquisition. In robotics, these studies have important implications for the design of linguistic capabilities in cognitive agents, e.g. for human-robot communication, and have led to the new interdisciplinary approach of Developmental Robotics (Cangelosi & Schlesinger 2015). During the talk we will present examples of developmental robotics models and experimental results from iCub experiments on the embodiment biases in early word acquisition and grammar learning (Morse et al. 2015; Morse & Cangelosi 2017) and experiments on pointing gestures and finger counting for number learning (De La Cruz et al. 2014). The contribution of active learning in language acquisition, using intrinsic motivation strategies, will also be considered (Antunes et al. 2017). The implications for the use of such embodied approaches for symbol grounding and for robot companion applications will also be discussed.

Iris Nomikou (U Portsmouth, UK) - Tuning into language: infants’ emerging participation in everyday routines

In the first months of life, infants become active participants in interactions, able to co-create actions with their caregivers. In this presentation, I will suggest that the structure of the social environment and infants’ multimodal engagement in routines are crucial factors in this development. Using a mixed-methods approach combining both detailed qualitative analyses of individual cases as well as global dynamic measures of coupling in various modalities, I will draw from naturalistic longitudinal video corpora of mother-infant interactions in the first 8 months of infants’ lives. I will show how early forms of mutuality arise within mother-infant interactions as part of coordinating with each other (Nomikou et al., 2016a). This coordination acts as a coupling mechanism, enabling mother and infant to enter an interaction and maintain or stabilize it (Rączaszek-Leonardi et al., 2013). I will suggest that within repeated interactions infants learn that not any action, but very specific actions are expected (Nomikou et al., 2016b; Rohlfing et al. 2016). As such, I propose that these routines are the “vehicle” towards language (Bruner, 1985, p.39). What starts off as an embodied experience of language as acting with each other (Ochs, 2012; Rączaszek-Leonardi et al., 2013) becomes increasingly conventionalized as interactions become less idiosyncratic and more rule-like and, thus, recognizable to a broader cultural community.

Junko Kanero (Koç University, Turkey) - Cross-linguistic approach to the universals and particulars of language acquisition

Discussion concerning language acquisition often focuses on the universal patterns exhibited by learners of any language in the world. Equally important, however, is how infants and children discover the specific ways in which their native tongue carves up the world, and how they adjust themselves to the language-specific structure. At birth, an infant presumably has no knowledge about the semantic organization of the particular language she is going to learn. For every word she hears, there are infinite possible meanings. Instead of considering every possible meaning, she must acquire strategies to efficiently learn new words, phrases, and sentences. My research takes a cross-linguistic approach and explores how children learn to express the world of objects, agents, and events in their native tongue. In this approach, I first identify a specific difference between the semantic structures of two languages, and then examine the process in which children who are learning one of the two languages come to grasp the language-specific structure. This talk concerns not only how children learn to associate a novel word with its meaning, but also how they learn to express events in sentences. Specific topics I will use to illustrate the process include, but are not limited to, verb learning (“How do children infer the meanings of novel motion verbs?”) and causal language (“How do children learn to express dynamic events involving cause and effect?”). Using these examples, I aim to have a broad discussion on what knowledge children are equipped with when they start to learn language, what adjustments they need to make in order to efficiently learn and use a specific language, and when and how the adjustments happen. I suggest that cross-linguistic comparison allows us to examine both the universal and particular patterns in the world’s languages, and consequently, of language acquisition.

Kirsten Bergmann (U Bielefeld, Germany) - Non-verbal behavior and adaptivity in language tutoring using embodied agents

Using embodied agents for language tutoring enables us to make particular use of bodily behavior to scaffold learning. Gestures especially have a high potential to support learners in the acquisition of new vocabulary. I will present a series of studies investigating the supporting function of gestures for vocabulary acquisition, suggesting that gestures need to be employed in accordance with the learner’s individual profile as well as with the linguistic material to be learned. In addition, I will present work on the individualization of tutoring, investigating the effects of child-directed synthetic speech as well as adaptive tutoring by means of Bayesian Knowledge Tracing to personalize the order of content to be addressed.
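Bayesian Knowledge Tracing, mentioned above, maintains a probability that the learner has mastered each item and updates it after every response; an adaptive tutor can then, for instance, address the least-mastered vocabulary item next. The sketch below shows the standard BKT update; the parameter values are illustrative, not those used in the presented work.

```python
# Standard BKT parameters (illustrative values):
# prior mastery, learning rate per opportunity, slip and guess probabilities.
P_INIT, P_LEARN, P_SLIP, P_GUESS = 0.2, 0.15, 0.1, 0.25

def bkt_update(p_known, correct):
    """One BKT step: posterior on mastery given the response, then learning."""
    if correct:
        cond = p_known * (1 - P_SLIP) / (
            p_known * (1 - P_SLIP) + (1 - p_known) * P_GUESS)
    else:
        cond = p_known * P_SLIP / (
            p_known * P_SLIP + (1 - p_known) * (1 - P_GUESS))
    # Account for the chance of learning on this practice opportunity.
    return cond + (1 - cond) * P_LEARN

# Trace mastery of one vocabulary item over a short response sequence.
p = P_INIT
for response in [True, True, False, True]:
    p = bkt_update(p, response)
```

Correct responses raise the mastery estimate and incorrect ones lower it, while the learning term lets mastery grow with practice regardless; personalization then amounts to ordering items by these per-item estimates.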

Max Garagnani (Goldsmiths, UK) - Simulating word learning and high-frequency brain responses to speech items in a neurobiologically realistic model of the left perisylvian cortex

I will highlight a neural architecture that we developed to simulate and explain cortical correlates of word learning and semantic grounding in the human brain. The model’s main distinguishing features are (i) to closely replicate connectivity and anatomical structure of left-hemispheric cortical areas known to be relevant for language processing, and (ii) to implement only functional mechanisms that reflect known cellular- and synaptic-level properties of the cortex.

Appropriate “sensorimotor” stimulation of the network (mimicking early stages of word acquisition) leads to the spontaneous formation in the network of model correlates of memory traces for words (i.e., distributed cell-assembly circuits exhibiting non-linear and oscillatory dynamics). I will then show that, without any significant changes, this neural architecture goes a long way towards explaining a range of experimental data and phenomena in language as well as other domains, pointing to a unifying model of cognition based on action-perception circuits whose emergence, dynamics and interactions are grounded in known neuroanatomy and neurobiological mechanisms.

Pierre-Yves Oudeyer (INRIA Bordeaux, France) - Curiosity-driven exploration in language development

Takayuki Nagai (U Electro-Communications, Tokyo) - Symbol emergence in robotics for future human-robot communication and collaboration

Intelligence is deeply dependent on its physical body, and its development requires interaction between the body and the surrounding environment. However, how to integrate lower-level motor control with a higher-level symbol manipulation system (language) is still an open problem. One of our research goals is to build a computational model of human intelligence spanning motor control to higher-level symbol manipulation. In this talk, I first introduce an unsupervised online learning algorithm that uses a nonparametric Bayesian framework for categorizing multimodal sensory signals, such as audio, visual, and haptic information, by robots. The robot uses its physical embodiment to grasp and observe an object from various viewpoints as well as to listen to the sound during the observation. The basic algorithm is to categorize the collected multimodal data so that the robot can make better inferences about the future; we call the generated categories multimodal concepts. The latter half of this talk discusses an integrated computational model of human intelligence from motor control to higher-level cognition. Again, the core idea is to segment and categorize multimodal data at various levels and to use such multimodal concepts for inference. Our claim is that an integrated computational model of human intelligence can be built on the idea of multimodal categorization.
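The online multimodal categorization idea can be sketched with a deliberately crude stand-in: the actual work uses nonparametric Bayesian models, while the toy routine below only mimics their key property that the number of categories is not fixed in advance. The distance threshold and the concatenated audio/visual/haptic feature vectors are hypothetical.

```python
import numpy as np

def online_categorize(observations, threshold=1.0):
    """Greedy online clustering of concatenated multimodal feature vectors.

    Each observation joins the nearest existing category if close enough,
    otherwise it founds a new category, so categories emerge from the data.
    """
    means, counts, labels = [], [], []
    for x in observations:
        x = np.asarray(x, dtype=float)
        if means:
            dists = [np.linalg.norm(x - m) for m in means]
            k = int(np.argmin(dists))
            if dists[k] <= threshold:
                counts[k] += 1
                means[k] += (x - means[k]) / counts[k]  # incremental mean
                labels.append(k)
                continue
        means.append(x.copy())  # found a new category around this observation
        counts.append(1)
        labels.append(len(means) - 1)
    return labels, means
```

On two well-separated clumps of toy observations, this routine discovers exactly two categories without being told how many to expect, which is the behavior the nonparametric models provide in a principled, probabilistic way.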


The workshop is planned as a full-day workshop with three sessions of three talks each (45 minutes each, including discussion). We invited a total of 9 senior researchers, of whom 7 will be able to attend and give a talk.

The workshop finishes with a poster session (at the end of the day and during the second coffee break). We encourage the submission of poster abstracts for the workshop. We want to give young researchers a chance to present their (ongoing) work, but we also want to provide a forum for relevant work that has recently been published in journals and at other conferences. Abstracts will be reviewed by the organizers. Authors of suitable posters will be invited to submit their work to an upcoming special issue of Frontiers in Neurorobotics.

Call for Abstracts

We invite the submission of abstracts (anywhere from 200 words to 2 pages) related to all aspects of language learning. Accepted abstracts will be presented in a poster session. We particularly encourage researchers to submit abstracts of ongoing work, as well as recently published work, related to all aspects of language learning in artificial systems and humans.

Abstracts are reviewed for suitability by the organizers. If in doubt about whether your poster is in scope for the workshop, please contact us directly.

Publication: Authors of suitable posters will be invited to submit their work to an upcoming special issue.

Submission: abstracts of up to two pages

Deadline: September 15th, 2017 (notifications of acceptance are sent out within about 24 hours)

Please send your abstracts to languagelearningcontact@gmail.com


Organizers

Chen Yu (Indiana University, USA)

Katharina J. Rohlfing (Paderborn University, Germany)

Malte Schilling (CITEC Bielefeld, Germany)

Michael Spranger (Sony Computer Science Laboratories Inc, Japan)

Paul Vogt (Tilburg University, the Netherlands)

Tadahiro Taniguchi (Ritsumeikan, Japan)


If you have any questions, comments or feedback, please contact Michael Spranger at michael [dot] spranger [at] gmail [dot] com