David Crandall
Indiana University, USA
While early work in computer vision was inspired by studies of human perception, most recent work has focused on techniques that work well in practice but probably have little biological basis. However, low-cost, lightweight wearable cameras and gaze trackers can now record people's actual fields of view as they go about their everyday lives. Such first-person, "egocentric" video contains rich information about how people see and interact with the world around them, potentially helping us better understand human perception and behavior while also yielding insights that could improve computer vision. I'll describe a recent interdisciplinary project (with Chen Yu and Linda Smith) in which we used computer vision to characterize the properties of children's egocentric views as they interact with objects -- the "training data" of the child's learning system -- and then showed that injecting similar properties into the training data of computer vision algorithms could improve those algorithms' accuracy as well.
Alejandrina Cristia
PSL University, France
Rhodri Cusack
Trinity College Dublin, Ireland
Hana D’Souza
Cardiff University, UK
Emmanuel Dupoux
EHESS, France
Abdellah Fourtassi
Aix-Marseille University, France
Michael Frank
Stanford University, USA
Kristen Grauman
University of Texas at Austin, USA
Uri Hasson
Princeton University, USA
Felix Hill
DeepMind, UK
Judy Hoffman
Georgia Institute of Technology, USA
Celeste Kidd
UC Berkeley, USA
I will talk about our lab's work in progress exploring interventions designed to give children a greater ability to discern truth from falsity. I will discuss some of the foundational empirical studies, currently underway, on two types of interventions intended to facilitate children's ability to discern fact from fiction. The first set of interventions targets factors external to the child, relating to the information ecosystems in which they make judgments. The second set investigates internal mechanisms children may have available for helping them detect misinformed opinions. Both lines of work build on the lab's previous behavioral experiments and computational models of how children sample subsets of information from the world, based on their uncertainty, in order to form their beliefs and guide their subsequent sampling decisions. I will briefly provide background on how this new work builds on our prior papers.
Eon-Suk Ko
Chosun University, Korea
Maithilee Kunda
Vanderbilt University, USA
Casey Lew-Williams
Princeton University, USA
Jitendra Malik
UC Berkeley, USA
Atsushi Nakazawa
Kyoto University, Japan
Pierre-Yves Oudeyer
Inria, France
Marc’Aurelio Ranzato
DeepMind, UK
Jim Rehg
Georgia Institute of Technology, USA
A classical topic in computer vision and psychology is the link between knowledge of 3D object shape and the ability to categorize objects. In this talk we revisit this link in two machine learning contexts that are connected to development: few-shot learning and continual learning. We show that learning a representation of 3D shape in the form of dense local descriptors provides a surprisingly powerful cue for rapid object categorization. Our shape-based approach to low-shot learning outperforms state-of-the-art models trained on category labels. We also present the first investigation of continual learning of 3D shape and demonstrate significant differences relative to continual category learning, finding that 3D shape learning does not suffer from catastrophic forgetting.
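The matching idea behind this shape-based approach can be illustrated with a minimal sketch: represent each object as a set of dense local descriptors, and classify a query by finding the support object whose descriptors best match. Everything here is a hypothetical stand-in -- the random vectors, the category names, and the `set_distance` rule -- since the actual work learns descriptors from 3D shape; this shows only the nearest-descriptor matching scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in: each object is a set of 50 local 16-dim descriptors
# sampled around a per-category center (real descriptors would be learned
# from 3D shape, not random).
def make_object(center, n_desc=50):
    return center + rng.normal(scale=0.2, size=(n_desc, 16))

centers = {c: rng.normal(size=16) for c in ["mug", "chair", "lamp"]}
support = {c: make_object(mu) for c, mu in centers.items()}  # one shot per class

def set_distance(query_desc, support_desc):
    """Mean distance from each query descriptor to its nearest support descriptor."""
    d = np.linalg.norm(query_desc[:, None, :] - support_desc[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

def classify(query_desc, support):
    # Predict the class whose support descriptors match the query best.
    return min(support, key=lambda c: set_distance(query_desc, support[c]))

query = make_object(centers["chair"])  # a new instance of a known class
pred = classify(query, support)
```

The point of the sketch is that no category labels are needed at representation time: classification reduces to set-to-set descriptor matching against one support example per class.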
Rebecca Saxe
MIT, USA
Olivier Sigaud
Sorbonne University, France
Linda Smith
Indiana University, USA
Daniel Swingley
University of Pennsylvania, USA
Sho Tsuji
University of Tokyo, Japan
Different views on language acquisition suggest that a range of cues is used, from structure found in the linguistic signal to information gleaned from the environmental context or through social interaction. Technological advances now make it possible to collect large quantities of ecologically valid data from young children's environments, but we still lack frameworks to extract and integrate these different kinds of cues from the input. SCALa (Socio-Computational Architecture of Language Acquisition) proposes a blueprint for computational models that makes explicit the connection between the kinds of information available to the social early language learner and the computational mechanisms required to extract language-relevant information and learn from it. SCALa further allows us to make precise recommendations for future large-scale empirical research.
Ingmar Visser
University of Amsterdam, Netherlands
Anne Warlaumont
UCLA, USA
Gert Westermann
Lancaster University, UK
Curiosity in infants and computational models
Much of what we know about infants' cognitive development comes from studies in which infants are passive recipients of information presented to them on a computer screen in an order and duration determined by the experimenter. While this body of work has provided us with many insights about infants' learning and their cognitive abilities, these methods ignore a fundamental aspect of real-life learning: outside the lab, infants are actively involved in their learning, exploring their environment and engaging with information in the order and duration they choose. In our lab we investigate infants' information seeking using behavioural, eye tracking, EEG and computational modelling methods. I will give a very brief overview of the methods and studies currently underway in my lab, and then describe a simple auto-encoder neural network model used to simulate intrinsically motivated exploration based on maximizing in-the-moment learning progress. This model learns a stimulus set used in seminal studies of infant category learning as effectively as a non-curious model embedded in an optimally structured external environment.
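The curiosity mechanism described above -- choose what to attend to next by maximizing in-the-moment learning progress -- can be sketched in a few lines. This is a minimal illustration under simplified assumptions, not the lab's model: the stimulus categories, the tiny tied-weight linear autoencoder, and the greedy progress rule are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stimulus set: three categories of 8-dim patterns (illustrative
# stand-ins for the visual stimuli used in infant category-learning studies).
categories = {name: rng.normal(loc=mu, scale=0.3, size=(20, 8))
              for name, mu in [("A", 0.0), ("B", 0.7), ("C", 1.4)]}

class TinyAutoencoder:
    """Linear autoencoder with tied weights, trained on reconstruction error."""
    def __init__(self, dim=8, hidden=3, lr=0.005):
        self.W = rng.normal(scale=0.1, size=(dim, hidden))
        self.lr = lr
    def error(self, x):
        recon = self.W @ (self.W.T @ x)
        return float(np.mean((x - recon) ** 2))
    def train_step(self, x):
        h = self.W.T @ x
        # Approximate gradient (w.r.t. the decoder side only, weights tied).
        grad = -2.0 * np.outer(x - self.W @ h, h)
        self.W -= self.lr * grad

def learning_progress(errs):
    # In-the-moment progress: drop in error between the two most recent visits;
    # unvisited (or once-visited) categories get priority, forcing exploration.
    return float("inf") if len(errs) < 2 else errs[-2] - errs[-1]

model = TinyAutoencoder()
history = {name: [] for name in categories}
init_err = np.mean([model.error(x) for xs in categories.values() for x in xs])

for _ in range(600):
    # Curiosity: attend to the category currently yielding the most progress.
    name = max(history, key=lambda n: learning_progress(history[n]))
    x = categories[name][rng.integers(len(categories[name]))]
    history[name].append(model.error(x))
    model.train_step(x)

mean_err = np.mean([model.error(x) for xs in categories.values() for x in xs])
```

The greedy progress rule makes the learner abandon stimuli it has mastered (progress near zero) and ones it cannot yet compress (progress negative or flat), concentrating on whatever it is currently learning fastest.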
Chen Yu
University of Texas at Austin, USA
Andrew Zisserman
University of Oxford, UK
Pre/Post Doctoral Flash Talks
EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL
How social interaction facilitates semantic formation and differentiation of early words
Unsupervised language learning from child-centered long-form recordings
Modelling bilingual language acquisition
Balancing generalisation and specificity in learning (and what's language got to do with it)
Cultural priors for artificial agents: Language Models as culture models
Multi-View Self-Supervised Learning for Low-Shot Object Category Recognition
Multi-View Object Discovery and Representation Learning Facilitates Fast Mapping