This tutorial provides an overview of the LENA system and situates it within the broader landscape of longform audio recordings (LFR) for language and communication research. The session will begin with a description of the LENA system—its hardware, built-in automated analyses, and its role as a widely adopted tool in developmental and clinical research. While the tutorial focuses on LENA as the most established platform for longform recordings, alternative digital recorders will also be introduced to illustrate additional options for extended audio capture. Data processing methods will be presented with LENA’s standardized ADEX output as the reference point and compared with open-source pipelines (e.g., ALICE, Voice Type Classifier). Limitations of the system, including language-specific classification challenges and cost considerations, will be addressed and considered across device types. Applications in developmental, clinical, and cross-cultural contexts will be discussed, demonstrating how LENA and complementary approaches can be aligned with diverse research goals. By the end of the session, attendees will have a clear understanding of the structure of LENA data and other devices for longform recordings, procedures for processing them with a particular focus on LENA, and strategies for applying these resources to real research analyses.
This workshop introduces research methodologies for analyzing the dynamics of initiation in caregiver-child conversational interactions using the Language ENvironment Analysis (LENA) system. Building on recent research examining conversational turn-taking patterns and initiator effects, this workshop demonstrates how LENA technology can capture naturalistic interaction data to understand bidirectional communication patterns between caregivers and children. Participants will learn to process LENA’s Interpreted Time Segments (ITS) files, convert them to a file format suitable for statistical analysis in R, and apply advanced analytical frameworks to examine conversational initiation dynamics across developmental stages and language abilities. The workshop emphasizes the practical application of automated speech analysis tools for research on early language development and social interaction patterns.
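The ITS-to-R conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's actual pipeline: it assumes the standard ITS layout of `<Segment>` elements carrying `spkr`, `startTime`, and `endTime` attributes (times in `PT…S` notation), and the embedded XML sample is a toy stand-in for a real recording.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Toy stand-in for an ITS file. Real ITS files nest <Segment> elements inside
# <Conversation>/<Pause> blocks, but the attributes below follow the ITS schema.
SAMPLE_ITS = """<ITS>
  <ProcessingUnit>
    <Conversation num="1">
      <Segment spkr="FAN" startTime="PT10.50S" endTime="PT12.00S"/>
      <Segment spkr="CHN" startTime="PT12.00S" endTime="PT13.25S"/>
    </Conversation>
  </ProcessingUnit>
</ITS>"""

def parse_time(value):
    """Convert an ITS time stamp like 'PT12.00S' to seconds (float)."""
    match = re.fullmatch(r"PT([0-9.]+)S", value)
    if not match:
        raise ValueError(f"Unexpected time stamp: {value}")
    return float(match.group(1))

def its_to_rows(its_xml):
    """Flatten every <Segment> into (speaker, start_s, end_s, dur_s) tuples."""
    root = ET.fromstring(its_xml)
    rows = []
    for seg in root.iter("Segment"):
        start = parse_time(seg.get("startTime"))
        end = parse_time(seg.get("endTime"))
        rows.append((seg.get("spkr"), start, end, round(end - start, 3)))
    return rows

def rows_to_csv(rows):
    """Serialize rows to CSV text that R can ingest with read.csv()."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["speaker", "start_s", "end_s", "dur_s"])
    writer.writerows(rows)
    return buf.getvalue()

print(rows_to_csv(its_to_rows(SAMPLE_ITS)))
```

Once written to disk, the resulting table loads directly in R with `read.csv()`, with one row per diarized segment.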
12:30 – 13:45 Lunch
LENA has become the de facto standard for large-scale, naturalistic recordings of child language environments due to its scalability, automated speaker diarization, and widespread adoption in developmental research. Its data extraction toolkit, the LENA Advanced Data Extractor (ADEX), enables language researchers to move beyond basic language indices to more complex analyses. Despite its utility, however, ADEX’s coarse-grained, fixed time windows often lack the temporal granularity needed for fine-grained behavioral analysis, especially of child-directed speech. In this tutorial paper, we present a script that converts pre-processed LENA ITS files into a duration-segmented CSV format aligned with the ADEX outputs, focusing on segment-summarized duration data. Our pipeline supports segmentation at any temporal resolution the user requires. This flexibility allows interactional dynamics to be analyzed over shorter time spans, which is especially valuable for sparse events such as book reading. We detail each step of the preprocessing and transformation process, including diarization/feature extraction, time-aligned aggregation, and validation against traditional ADEX outputs. By preserving key patterns while offering finer temporal control, this method opens richer opportunities for research on language environments and child-directed speech exposure. The tutorial will be especially valuable for researchers who aim to conduct large-scale, automated, and temporally sensitive analyses of child language input with LENA but are constrained by the 5-minute window, the smallest resolution the current toolkit provides. While our approach does not provide new analytical views on its own, it expands the analytic capacity of the existing ADEX output data, enhancing both scalability and precision in early language research.
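The core of the re-segmentation idea described above — allocating diarized segment durations to time windows of any width, rather than ADEX’s fixed 5-minute bins — can be sketched as follows. This is an illustrative simplification, not the tutorial’s actual script; the segment tuples are invented example data.

```python
from collections import defaultdict

def bin_durations(segments, bin_size_s):
    """Allocate each (speaker, start_s, end_s) segment's duration to
    fixed-width time bins, splitting segments that straddle a bin boundary.
    Returns a dict mapping (bin_index, speaker) to seconds of speech."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        t = start
        while t < end:
            idx = int(t // bin_size_s)          # which bin this chunk falls in
            bin_end = (idx + 1) * bin_size_s    # right edge of that bin
            chunk_end = min(end, bin_end)
            totals[(idx, speaker)] += chunk_end - t
            t = chunk_end
    return dict(totals)

# A 50-second adult (FAN) segment crossing a 30-second bin boundary is split
# 25 s / 25 s; a fixed 5-minute window would collapse this structure entirely.
segs = [("FAN", 5.0, 55.0), ("CHN", 32.0, 40.0)]
print(bin_durations(segs, 30.0))
```

Because the bin width is a parameter, the same routine yields 30-second, 1-minute, or 5-minute summaries from one pass over the segment table, which is the flexibility the abstract emphasizes.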
This tutorial introduces Natural Language Processing (NLP) algorithms and Artificial Intelligence (AI) tools for processing long-form audio data. As typical speech projects can involve datasets of up to thousands of hours of audio, researchers cannot rely on manual annotation alone; automatic speech processing tools have become necessary for working with very large datasets. Moreover, reliance on proprietary NLP and AI tools alone can hinder the reproducibility of experiments across the wider research community. Therefore, in this tutorial, we focus on open-source tools such as VTC (Voice Type Classifier) and ALICE (Adult LInguistic unit Count Estimator). VTC segments audio files into broad speaker categories, while ALICE estimates the number of linguistic units (e.g., syllables, words) produced by an adult speaker. We also present practical examples of applying these tools to long-form audio data.
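Diarization tools such as VTC report their speaker-category segments in the standard RTTM format (one `SPEAKER` line per segment, with onset and duration in seconds and the voice-type label in the eighth field). As a small, hedged sketch of the kind of post-processing covered in the tutorial, the snippet below tallies speech time per voice type from RTTM text; the sample lines are invented, and the labels (`FEM`, `KCHI`) merely follow VTC’s naming conventions.

```python
from collections import defaultdict

# Toy RTTM lines in the standard 10-field layout:
# SPEAKER <file> <chan> <onset> <dur> <NA> <NA> <label> <NA> <NA>
SAMPLE_RTTM = """\
SPEAKER rec1 1 12.40 1.80 <NA> <NA> FEM <NA> <NA>
SPEAKER rec1 1 14.20 0.90 <NA> <NA> KCHI <NA> <NA>
SPEAKER rec1 1 20.00 2.10 <NA> <NA> FEM <NA> <NA>
"""

def speech_time_by_label(rttm_text):
    """Sum segment durations per voice-type label in an RTTM document."""
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank or non-segment lines
        onset_dur, label = float(fields[4]), fields[7]
        totals[label] += onset_dur
    return dict(totals)

print(speech_time_by_label(SAMPLE_RTTM))
```

Summaries like this (e.g., seconds of female-adult versus key-child speech per recording) are a common first step before feeding VTC output into downstream tools such as ALICE.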
16:15 – 16:45 Small Group Discussion (breakout group by themes)