A child's environment consists not only of the linguistic input they hear and the speech they produce, but also of the broader social, emotional, and contextual factors that shape their development. In recent years, advances in speech science, developmental psychology, computer science, and computational modeling have made it possible to study these environments at a much larger scale than before. For example, wearables are increasingly used to capture patterns of infant crying and parental affective responses throughout the day in home environments, revealing how maternal mood and mental health, such as symptoms of depression, shape moment-to-moment caregiving dynamics and ultimately influence infant social-emotional development.
This special session not only applies speech technology to developmental questions but also reveals new challenges and opportunities for advancing core speech processing methods, including robust adaptation to child speech, handling naturalistic acoustic conditions with noise and overlapping speakers, developing models that account for diverse populations and languages, and creating evaluation paradigms that reflect real-world variability. With the proliferation of recording devices and AI assistants in homes, and growing interest in the early identification of developmental differences, this work has direct implications for clinical practice, educational technology, and the design of speech systems that work for users across the lifespan. As daylong audio recordings become increasingly feasible and foundation models create new possibilities for large-scale analysis, we anticipate that this special session will catalyze new collaborations and establish methodological standards for studying speech in naturalistic developmental contexts.
In this special session, our primary aim is to promote conversations and collaborations among researchers working across different areas of development.
The special session will combine oral and poster presentations on topics including, but not limited to:
Computational models of language learning, speech processing, or social interactions in children's everyday environments
Acoustic, prosodic, and lexical patterns in child-directed speech, child speech, or overheard speech
Analysis of parent-child interaction dynamics, including turn-taking, conversational synchrony, and contingent responses
Speech-derived indicators of emotional, social, or environmental context (e.g., distress, affect, stress, household chaos, family dynamics)
Integration of speech-related (e.g., ultrasound) and speech-adjacent (e.g., physiological) signals in multimodal analysis
Speech technology methods for investigating children's everyday environments (e.g., speech recognition, language identification, speaker diarization, emotion detection, activity segmentation, voice type classification)
Cross-linguistic and cross-cultural models of children's speech production and language environments
Methods for processing noisy, spontaneous, and overlapping speech in real-world naturalistic recordings
Development or adaptation of foundation models and self-supervised learning approaches for child speech
Longitudinal analysis of developmental trajectories and connections between speech and broader developmental outcomes (e.g., cognitive development, social-emotional development)
Clinical applications including diagnosis, assessment, intervention, and monitoring of speech and language development
Paralinguistic and non-speech vocalizations (e.g., crying, laughing, babbling) in developmental contexts
Please follow the standard INTERSPEECH paper submission guidelines available on the official website.
When submitting your paper, select “CHILDSPACE: Child Home Interaction & Language Dynamics: Speech, Psychology, Affect, Computation & Environments” as the subject area.
Submitted papers will undergo the same review process as regular papers.
25 February 2026 -- Paper Submission Due
5 June 2026 -- Paper Acceptance Notification
27 September - 1 October 2026 -- Conference in Sydney, Australia
Kaveri K. Sheth (LAAC-LSCP, École Normale Supérieure, PSL, Paris, France; ksheth2019@gmail.com; main contact)
Alejandrina Cristia (LAAC-LSCP, École Normale Supérieure, PSL, Paris, France)
Beena Ahmed (University of New South Wales)
Meg Cychosz (Stanford University)
Ting Dang (The University of Melbourne)
Kaya de Barbaro (University of Texas at Austin)
Carol Espy-Wilson (University of Maryland)
Rebecca Holt (Macquarie University)
Herman Kamper (Electrical and Electronic Engineering, Stellenbosch University, South Africa)
Marvin Lavechin (MIT, CNRS from 2026)
Jialu Li (University of Arizona)
Okko Räsänen (Tampere University)
Odette Scharenborg (Delft University of Technology)
Mostafa Shahin (University of New South Wales)