3rd POSTECH MINDS-Chosun CDS Joint Workshop
2026/01/22 (Thursday) - 2026/01/23 (Friday) | Jeju Island, South Korea
Building on the success of the 2024 conference on learning in humans and machines collaboratively organized by POSTECH MINDS and Chosun CDS, this 2026 joint workshop by POSTECH MINDS and Chosun CDS broadens its focus to foster meaningful dialogue and collaboration between the humanities and sciences. This interdisciplinary platform encourages participants to exchange ideas, share methodologies, and explore collaborative opportunities. The workshop promotes open-ended discussions that inspire innovation and cross-disciplinary connections. The goal is to create a dynamic environment for creative thinking and partnerships that advance both research and practical applications.
Jae-Hun Jung (POSTECH MINDS; Mathematics; Graduate School of AI)
Eon-Suk Ko (Chosun University CDS; Department of English Language and Literature)
Rajalakshmi Madhavan (Chosun University CDS)
Ioana Buhnila (Chosun University CDS)
Please register for the workshop using the QR code or the following link.
Registration is free of charge, but mandatory.
Day 1 | 2026/01/22 (Thursday)
TIME
PROGRAM
SPEAKER
08:30 - 09:00
Registration & Breakfast
09:00 - 09:05
Welcome Remarks
Eon-Suk Ko
(Chosun University)
09:05 - 10:05
Information Theory and Entropy Applied to Art and Language
Information theory provides a useful framework for analyzing and processing semantic data by quantifying uncertainty and communication efficiency. In this talk, we explore how information-theoretic concepts can be applied to semantic domains such as music and literary texts. In particular, we demonstrate how musical data can be transformed into network representations and analyzed using entropy-based measures. We also discuss how information theory can be employed to distinguish between human-generated and AI-generated data. This talk is intended as an introductory presentation and assumes no prior background in information theory or related fields.
Jae-Hun Jung
(POSTECH)
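As a toy illustration of the entropy-based measures mentioned in this abstract (a generic Shannon-entropy sketch for illustration only, not the speaker's actual method or data), the entropy of a symbolic sequence such as a melody can be computed as follows:

```python
from collections import Counter
from math import log2

def shannon_entropy(sequence):
    """Shannon entropy (in bits) of a sequence of symbols."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A repetitive "melody" carries less information than a varied one.
repetitive = list("CCCCGCCC")  # mostly one note
varied = list("CDEFGABC")      # many distinct notes
print(shannon_entropy(repetitive))  # lower entropy
print(shannon_entropy(varied))      # higher entropy
```

On real musical data, the same measure would be applied to distributions derived from the network representation (e.g., node-degree or transition-probability distributions) rather than to the raw note sequence.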
10:05 - 11:05
Entropy and Learnability in Child-Directed Speech
Infants learn language with remarkable efficiency, raising the question of whether this success reflects special properties of child-directed speech (CDS) or powerful learning mechanisms. Focusing on Korean, this talk shows that several advantages attributed to CDS, including improved word segmentation and phonological clarity, largely arise from structural and distributional properties of the input rather than from register itself. I then introduce an entropy-based analysis of Korean morphology to show how grammatical variability is concentrated on high-frequency verbs, creating a learning environment in which repetition and variation jointly support abstraction. Together, the findings suggest that CDS supports language learning not by simplifying the input, but by organizing it in a way that makes variability learnable.
Eon-Suk Ko
(Chosun University)
11:05 - 11:15
Coffee Break
11:15 - 12:15
Tutorial 1: A Guide to Efficient Video Data Annotation using Deep Learning
While video data capturing children’s behavior is more accessible than ever, extracting insights from the video remains a challenge due to the intensive nature of frame-by-frame annotation. This tutorial provides a practical roadmap for researchers. We will explain the characteristics of video data, explore use cases in child development, and introduce open-source tools for efficient annotation. We conclude by sharing a pipeline for developing automated annotation systems for headturn preference procedure (HPP) experiments. The tutorial code is available at https://m.site.naver.com/1ZBYy
Dong-Jin Lee
(POSTECH)
12:15 - 13:45
Lunch Break
13:45 - 14:45
Tutorial 2: Whisper Tools for Long-form Audio Data Processing
Long-form audio data processing is a challenging task, as audio files can contain different types of speech (adult, child, and infant speech), background noise, and silence. Recording and processing software such as LENA (Gilkerson & Richards, 2020) can process such audio data; however, the tool is closed-source and offers a limited number of processing tasks. LENA provides labels for speaker diarization (who talks and when), but not for the transcription of the audio file into textual data. Transcription is an important step for various NLP tasks, such as morpho-syntactic and sentiment analysis. This tutorial attempts to bridge this gap by presenting preliminary experiments using open-source audio processing NLP and AI tools such as Whisper and WhisperX. We explore the challenges of applying these tools to Korean speech data and present first results.
Ioana Buhnila
(Chosun University)
14:45 - 15:45
Infant Vocal Development: A Focus on Canonical Babbling
Jongmin Jung
(Chosun University)
15:45 - 16:00
Coffee Break
16:00 - 17:00
Small babies, big data: decoding early vocalizations with speech technology (online)
Babbling is a critical milestone during which infants explore moving their articulators to produce increasingly complex sounds before producing their first words around 12 months. As of today, studying vocal development requires the costly and labor-intensive process of manually transcribing children's linguistic productions, which often limits research to small sample sizes. In this talk, I’ll present my ongoing efforts to build a fully automated speech processing pipeline to enable large-scale studies of vocal development from naturalistic day-long recordings. The talk follows three stages of inquiry. First, detecting when children vocalize - a prerequisite for studying what they say (past work). Second, characterizing the sounds they produce (current work). Third, measuring when language-specificity emerges in their vocalizations (future work). With this pipeline, I hope to address two long-standing questions: To what extent do individuals follow shared trajectories in phonological development? And when do infants start exhibiting distinct babbling patterns across different languages? Beyond theoretical implications, these insights will also allow investigation of early vocal markers associated with atypical language development.
Marvin Lavechin
(MIT)
17:00 - 17:20
Passive Voice Production in Korean Learners’ L3 English: A Corpus-Based NLP Study of Cross-Linguistic Influence (online)
This study explores the Cross-Linguistic Influence (CLI) phenomenon affecting the production of the passive voice in Korean learners’ English writing. Two primary hypotheses are investigated to determine CLI effects on English as the L3: (a) the Linguistic Proximity Model (LPM), which proposes that both facilitative and non-facilitative transfer occurs based on property-by-property structural similarity between the L2 and L3; and (b) the Cumulative Enhancement Model (CEM), which predicts that language transfer is exclusively facilitative via a linguistic reservoir, regardless of the typological distance between the L2 and L3. To address the inconsistent findings of both hypotheses often associated with small-scale experiments, this study employs a data-driven approach using the Gachon Learner Corpus, comprising 24,588 argumentative essays (2,983,829 tokens) written by 2,459 Korean English learners. A hybrid NLP tool built on the spaCy library was developed to detect both erroneous and correct instances of the passive voice across three groups: (a) L1 Korean-L2 English; (b) L1 Korean-L2 (typologically close to English)-L3 English; and (c) L1 Korean-L2 (typologically distant from English)-L3 English. Statistical analyses (ANOVA) will compare error patterns across these groups to test the proposed hypotheses. This work contributes to L3 Acquisition studies, Learner Corpus Research, and Korean EFL pedagogy.
Alif Rosyidah
(Columbia University)
17:20 - 17:40
Testing Infants’ Word Comprehension Online: Carrier Phrase Matters (online)
Online testing offers a promising solution to challenges in infant research, such as limited sample sizes, low statistical power, and restricted population diversity. However, adapting time-sensitive paradigms such as preferential looking to online environments raises important methodological questions. The present study investigates whether infants’ performance in an online preferential looking task depends on the linguistic context preceding a target word. Using an online Looking-While-Listening paradigm administered via Lookit, infants aged 1;0–2;0 years (N = 46) were tested on their recognition of familiar nouns presented in three conditions: embedded in a sentence frame, preceded by an article, or presented in isolation. Overall accuracy was above chance, with significantly higher accuracy in the sentence condition compared to the article and word conditions. Accuracy increased with age, and only infants older than 1;6 years performed reliably above chance. These findings indicate that carrier phrases play an important role in supporting infants’ word recognition in online preferential looking tasks and suggest that successful online implementations of this paradigm benefit from the use of full sentence frames and from testing older infants.
Sarah der Nederlanden
(University of Amsterdam)
17:40 - 18:00
Open Discussion & Collaboration
18:00 - 19:30
Dinner
Day 2 | 2026/01/23 (Friday)
TIME
PROGRAM
SPEAKER
08:30 - 09:00
Registration & Breakfast
09:00 - 09:20
Acoustic Correlates of Speech Rhythm Development in Brazilian Portuguese (online)
Brazilian Portuguese (BP) is characterized by a mixed rhythmic pattern, combining properties of both stress-timed and syllable-timed languages. This study investigates the developmental trajectory of speech rhythm in 31 BP-speaking children and adolescents aged 6 to 17. Using speech data obtained from storytelling activities, recordings were annotated for vowel-to-vowel (VV) units (considered the minimal units of rhythm perception) and analyzed using linear mixed-effects models. Results indicate that the relationship between the number of VV units and stress group duration varies systematically across age groups. Children aged 6 to 10 exhibit steeper slopes, reflecting a greater syllabic sensitivity where each additional unit contributes substantially to the overall duration of accentual groups. Conversely, adolescents aged 12 to 17 show a consistent reduction in slope magnitude, indicating that accentual group duration becomes progressively less dependent on the number of VV units. This developmental shift suggests a move toward a more stress-timed rhythmic organization. Furthermore, while no clear clustering by sex was observed, individual speaker trajectories seem to converge with age, likely reflecting advances in motor control and more efficient integration between linguistic planning and articulatory execution.
Maria Orfanelli
(Campinas State University)
09:20 - 09:40
Prosodic Modulation in Child-Directed Storytelling: Phonetic Evidence from Igbo (online)
Most research on child-directed speech (CDS) is based on Western, literacy-centered contexts, leaving limited understanding of how caregivers adapt prosody for children in tonal languages within strongly oral societies. This study examines prosodic adaptation in Igbo, a tonal language in which storytelling is a primary mode of cultural and linguistic transmission, focusing on how child-directed communication is shaped under tonal constraints. We compared adult-directed speech (ADS) and CDS across two contexts, everyday conversation and storytelling, using naturalistic recordings from 10 Igbo-speaking caregivers interacting with infants aged 12–24 months. Measures included mean fundamental frequency, pitch variability, articulation rate, and vowel duration. Results show that CDS exhibits a higher pitch register and slower articulation rate than ADS across both contexts, without expansion of pitch variability. This indicates prosodic reorganization rather than exaggeration. Vowel duration was also increased in CDS, consistent with global temporal slowing. Storytelling further amplified these prosodic adjustments, with in-character speech showing additional register shifts that index discourse roles rather than lexical contrasts. These findings suggest that universal CDS tendencies extend to Igbo but are modulated by tonal constraints and culturally embedded communicative practices. This positions storytelling as an important source of prosodically structured input for children in oral societies.
Vincent Nwosu
(University of Calgary)
09:40 - 10:00
First Language Retention in Adult Korean Adoptees (online)
International adoptees (IAs) have been shown to retain aspects of their pre-adoptive languages into adulthood, consistent with the Permanence Hypothesis (Bowers et al., 2009), though substantial individual variability remains. This exploratory study examines whether age of adoption group and post-adoption exposure independently contribute to first-language (L1) retention in adult Korean adoptees. Fifty-six Korean adoptees residing in the United States completed a Korean phonetic discrimination task, Korean and English lexical production tasks, and a Korean morphosyntactic comprehension task, and their performance was compared to a control group of twenty-four L1 English users with no reported exposure to Korean. A two-way analysis of variance revealed significant main effects of both group and exposure on phonetic discrimination sensitivity (d′), with no significant interaction between the two factors. These results indicate that group differences and post-adoption exposure exert independent effects on L1 phonetic sensitivity, such that increased exposure is associated with improved discrimination across groups. The absence of an interaction suggests that re-exposure confers comparable benefits regardless of adoption group, consistent with the view that early-established phonological representations remain available for reactivation rather than undergoing wholesale reorganization. Overall, the findings support the Permanence Hypothesis by demonstrating that both early experience and later exposure contribute additively to adult L1 outcomes in international adoptees.
Cory J. Lemke (Ha Woon Do)
(University of Illinois Urbana-Champaign)
10:00 - 10:30
Early acquisition of basic sentence patterns in Korean
According to the developmental progression of children’s use of syntactic symbols (Tomasello, 2003), around 12 months children can use holophrases as single linguistic symbols. At 18 months of age, they produce more systematic patterns called pivot schemas, built from the patterns they have heard most often. At around 2 years of age, item-based constructions appear with syntactic marking, the device used for marking participant roles. Finally, abstract constructions begin to be used at around 36 months.
In Korean, the basic intransitive structure consists of a subject and a predicate (a verb or an adjective). The transitive construction consists of a subject, an object, and a predicate, and the ditransitive construction adds a dative argument as an oblique. A subject (a nominative argument) and a locative argument form the existential construction; when the order is reversed (a locative/dative followed by a nominative), the construction can represent possession. This construction, along with the double nominative construction, can express the emotion or psychological state of the speaker. These patterns emerge as pivot schemas and progress to item-based constructions with postpositions.
In CHILDES data, Korean infants at 1;3 begin to produce a noun plus the copula ita as a first pivot schema. At 1;5, a noun plus a noun, expressing the possessor and the possessed, appears as a pivot schema.
At 1;7, a noun plus a descriptive predicate represents a basic intransitive pattern, for example, bus ppabang (bus horn-sound), and this develops into a locative noun plus an existential predicate, such as yeki issta (Here it is). This is related to the construction with a noun and a locative noun, such as emma yeki (mommy here). A noun with the locative marker ey (at) also emerges in this period. The topic construction first emerges in a locative context, such as nye-nun yeki (This is here).
Haegwon Jeong (Chosun University)
10:30 - 10:45
Coffee Break
10:45 - 11:15
Optimising Gaze Feature Selection for Evidence of Infant Word Recognition Using Random Forests (online)
Infant eye-tracking studies provide valuable insights into early language development but are challenged by noisy gaze data, attentional variability, and methodological choices in feature extraction. This study applies a Random Forest (RF) approach to optimise gaze feature selection for assessing infant word recognition in a Looking-While-Listening task. Using eye-tracking data from 25 Korean-learning infants (≈14 months), we trained RF models on raw and baseline-adjusted gaze features derived from post-stimulus time windows. Model performance was evaluated using leave-one-out cross-validation. Results showed that a small subset of 4–5 gaze features yielded optimal performance, with no additional gains from larger feature sets. The RF-based approach achieved substantially higher convergence with parental reports on the MacArthur–Bates Communicative Development Inventories (r = .65) compared to conventional rule-based metrics. These findings demonstrate that data-driven feature selection can improve the convergent validity and interpretability of gaze-based measures, offering a principled framework for refining behavioural assessments of early word recognition.
Jun Ho Chai
(Sunway University)
11:15 - 11:45
Evaluating and Adapting iCatcher+ with Korean Data: A Transfer Learning Approach
Jiho Lee
(POSTECH)
11:45 - 12:15
Investigating familiar word recognition in infants across ages and languages in a remote online study - The ManyBabies-AtHome: Looking-While-Listening project
Current and previous worldwide collaborations between developmental researchers have begun to implement remote studies to enable engagement from a wider population of researchers and participating families (Zaadnoordijk et al., 2021). However, while large-scale collaborations and remote data collection hold many promises, they are especially challenging when studying language (and, by extension, familiar word recognition). Studies comparing word recognition in infants learning different languages within the same experiment are rare (e.g., Ramon-Casas et al., 2009), and experimental design and analytic decisions vary considerably between studies (Von Holzen & Bergmann, 2021; Zettersten et al., 2021), rendering comparison extremely difficult. There are also discrepancies between the measures used to assess word recognition in children (e.g., Houston-Price et al., 2007; Potter & Lew-Williams, 2023). To address these shortcomings, the ManyBabies-AtHome: Looking-While-Listening project aims to study the developmental trajectory of word recognition in infants learning different languages across different ages, using an online version of the traditional Looking-While-Listening paradigm (LWL; e.g., Golinkoff et al., 1987; Fernald et al., 2008). Specifically, we aim to examine whether children’s word recognition, as indexed by the speed and accuracy of their looking toward a named referent, varies as a function of (1) the age of the child, (2) whether their caregiver indicated that they understand the word, and (3) the language they are learning. We will recruit infants between 10 and 24 months of age from various countries, learning various languages, aiming for 48 participants across the age range for each language. In the online LWL task, infants will be shown two (presumably) familiar objects on the screen and hear the label of one of the objects embedded in a carrier phrase.
The experiment will be implemented via E-Babylab (Lo et al., 2024), with the integrated WebGazer application (Papoutsaki et al., 2016) automatically estimating where infants look on the screen using the participant’s webcam. Caregivers will also report children’s knowledge of the presented words through a questionnaire. We predict that (1) infants’ word recognition accuracy and speed will increase as they grow older and (2) will be higher for words indicated by parental report as known to the child. However, (3) there may be variation in this relationship across languages due to cultural and lexical differences, though we cannot currently speculate on the direction or magnitude of this potential effect. Our approach has broad implications, as resource-friendly adaptations to local contexts remain a challenge for cross-linguistic and cross-cultural work; in addition, the outcome of this project will establish the largest cross-linguistic, cross-cultural dataset on the development of infant word recognition.
Rajalakshmi Madhavan
(Chosun University)
12:15 - 13:30
Lunch Break
13:30 - 14:00
Measuring Korean Parents' Knowledge of Early Language Development: Validation of the Korean SPEAK Short Form
Caregivers' knowledge and expectations about early language and cognitive development may shape the quantity and quality of language input they provide to children. However, few validated psychometric instruments are available to measure Korean parents’ knowledge. This study examined the factor structure and reliability of the Korean-adapted SPEAK (Survey of Parent/Provider Expectations and Knowledge; Suskind et al., 2017) questionnaire, as well as the external validity of its scores against socioeconomic status (SES) indicators and children's vocabulary scores. Using a sample of 636 Korean parents and weighted least squares estimation, exploratory and confirmatory factor analyses yielded a two-factor, 11-item structure: (F1) Beliefs about Developmental Readiness (8 items) and (F2) Media-Based Language Learning (3 items). Reliability estimates using polychoric-based coefficients were acceptable (total scale: ω = .828, α = .827; F1: ω = .808, α = .805; F2: ω = .785, α = .782). Associations between variables were assessed in a subsample with CDI data (n = 360) using regression: parenting knowledge was a robust positive predictor of children’s vocabulary use, especially at older ages. SES indicators added little and did not moderate the association. Children’s vocabulary use was primarily driven by F1 rather than F2. The Korean SPEAK short form thus demonstrated an interpretable factor structure, acceptable reliability, and theoretically consistent associations with SES and children's vocabulary, supporting its utility for research on parental beliefs about early language development in Korean contexts.
Ji-Hye Suh
(Chosun University)
14:00 - 14:30
Interpreting emotional sequences in autistic children with multimodal temporal AI
Autistic children often express emotional distress through temporal patterns rather than isolated facial expressions. Caregivers may misinterpret these reactions by focusing on visible context instead of underlying causes. We present a multimodal AI framework that interprets emotional behavior as a time-dependent sequence using facial emotion probabilities and audio-based arousal cues, without relying on speech recognition. Observed behavior is aligned to short prototype trajectories representing caregiver-relevant latent functions such as repair integrity or shared control. An energy-based and calibrated inference scheme preserves uncertainty and supports cautious interpretation. The system generates parent-facing explanations that help translate observed behavior into actionable understanding. A real-world case study demonstrates that the method can correctly infer latent causes of distress even with minimal data and incomplete caregiver input.
Byungyun Jeon
(POSTECH)
14:30 - 14:45
Coffee Break
14:45 - 15:15
Not all touches are the same: Differential roles of caregiver touch in infant word learning
Infants acquire language by leveraging multimodal input rather than relying solely on auditory input. In fact, tactile cues from caregivers aid infants in language learning by marking boundaries in the speech stream and indicating word-referent relations. We explored the relevance of touch type to the communicative role of touch by analyzing its temporal alignment with speech, integration with sound-symbolic words, and co-occurrence with prosodic cues. Specifically, we compared Supportive Touch, which is sustained, frequently used, and mostly observed on the torso, with Attention-Getting Touch, which is brief, less frequent, and distributed across diverse body parts. Results indicate that Attention-Getting Touch aligns more often with words in general, overlaps more with sound-symbolic words, and accompanies speech that is prosodically enhanced. These results suggest that Attention-Getting Touch, being more perceptually salient, potentially serves a stronger communicative function and therefore effectively assists infant language development.
Joo Kyeong Kim
(Chosun University)
15:15 - 15:45
Phonetics and pragmatics of pitch accents in Southern British English
In English, H* is said to encode new information and to be realized as high pitch, while L+H* encodes degrees of contrastivity and is realized as rising pitch. However, empirical evidence for this distinction is sparse, especially in Southern British English (SBE). To gain a better understanding, we examined 2,126 words with high and rising accents in an SBE corpus of unscripted speech. The accents were separately annotated for (i) f0 shape (high or rising) in Praat and (ii) pragmatic function (corrective, contrastive, or non-contrastive for all other high and rising accents) based on written transcripts only. The data were modelled using Functional Principal Component Analysis and GAMMs. Phonetically, H* and L+H* were distinct: H* was a fall and L+H* a rise-fall. However, these shapes did not map onto separate pragmatic functions: corrective accents were likely to be L+H*s and were thus distinct from non-contrastive accents, which were likely to be H*s, but there were no differences among the other pairwise comparisons, suggesting that the mapping between phonetic form and pragmatic function is not one-to-one. By separating the shape-based from the meaning-based annotation procedures, the relation between the f0 shapes and the pragmatic functions of these accents is thus better understood.
Jiseung Kim
(Chosun University)
15:45 - 16:15
Processing Multimedia Data Using Artificial Intelligence
Seong Hyu Chon
(POSTECH)
16:15 - 16:30
Closing Remarks
Eon-Suk Ko
(Chosun University)
The above schedule is subject to change.
Venue: Hotel Bridge Seogwipo
Contact information
Tel | 054-279-2734
Email | chosun.cds@gmail.com
Picture taken at the 2026 MINDS-CDS workshop in Jeju
Past events
Learning in Humans and Machines | 2024.12.12. (Thursday) - 12.13. (Friday) | Agenal Hall, 2F Lahan Hotel, Gyeongju
Music, Mathematics & Language - Through the lens of data | 2023.07.28 (Friday) - 07.29 (Saturday) | POSCO Center, Seoul
Picture taken at the 2024 MINDS-CDS workshop in Gyeongju
Picture taken at the 2023 MINDS-CDS workshop, Music, Mathematics & Language in Seoul