Global Fellowship
The ultimate goal of this research is to deeply explore language development and interaction across the entire human lifespan, from infancy to old age, and to contribute academically and socially by leveraging AI and data science. To this end, we aim to analyze the overarching trajectory of human language development and establish scientific foundations that can be applied to education, healthcare, and technological advancement.
To trace this developmental trajectory, we conduct foundational linguistic research, analyze social factors, study aging populations, and pursue AI application research. Through the integration of these research findings, we seek to generate evidence-based insights that can be applied across multiple domains, including education, public health, and technology.
This research is structured around four specific objectives:
1. Early Language Acquisition
2. Social Environment and Language
3. Aging and Communication
4. Language AI and Data Science
One of the main objectives of this research is to systematically investigate how infants and toddlers in Korean-speaking households acquire phonemes, segment words, and comprehend sentences. To achieve this, we combine experimental and naturalistic methods, including eye-tracking, the Headturn Preference Procedure (HPP), BabyView, and LENA, to analyze language input and learning processes in real-life contexts. In particular, through eye-tracking studies with Korean-speaking children, we aim to uncover how shape bias is acquired and how linguistic structure influences this process.
In addition to examining the influence of environmental input on early language acquisition, this research explores how factors such as conversational initiation, turn-taking, and social status shape the structure of adult interactions. We investigate how caregiver-child interaction types, reading, and musical activities affect language development in both children and older adults. We place particular emphasis on empirically analyzing how gender, power dynamics, and socioeconomic factors influence both caregiver-child and adult communication patterns.
In addition to research on cognitive development in early childhood, this study investigates how the frequency and quality of social interactions affect cognitive health in older adults, using large-scale collections of natural speech data. In particular, by applying AI-based speech analysis techniques, we aim to examine how aging influences language perception, speech production, and fluency. Based on quantitatively measured linguistic difficulties in older adults, we seek to detect early signs of decline in vocabulary retrieval, syntactic complexity, and contextual comprehension abilities.
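As a rough illustration of what "quantitatively measured" can mean here, the sketch below computes two simple transcript-level measures: a type-token ratio (a crude proxy for vocabulary diversity) and mean utterance length in words (a crude proxy for syntactic complexity). The function name `lexical_summary` and the whitespace tokenization are assumptions made for this example only. For an agglutinative language such as Korean, a morphological analyzer would likely be needed in place of `split()`, and these measures are not presented as the institute's actual metrics.

```python
from typing import Dict, List


def lexical_summary(utterances: List[str]) -> Dict[str, float]:
    """Return type-token ratio and mean utterance length in words.

    Assumption: tokens are whitespace-separated, which is only a
    placeholder for proper morphological analysis.
    """
    tokens = [tok for utt in utterances for tok in utt.lower().split()]
    if not tokens:
        # No usable speech: report zeros rather than dividing by zero.
        return {"type_token_ratio": 0.0, "mean_utterance_length": 0.0}
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_utterance_length": len(tokens) / len(utterances),
    }


# Hypothetical two-utterance transcript, for illustration only.
summary = lexical_summary(["the dog ran", "the dog sat down"])
print(summary)
```

Tracking such measures across repeated recordings of the same speaker is one possible way to look for gradual change over time. Type-token ratio is sensitive to sample length, so comparisons would need samples of similar size.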
This study aims to develop research methodologies that automatically analyze language environments through AI-based speech analysis, eye-tracking, and sentiment analysis using large language models (LLMs). By utilizing long-term audio recordings, we seek to automatically transcribe and analyze interaction patterns between children and adults, thereby quantifying how the linguistic environments of multicultural Korean households influence children's language and cognitive development. To achieve this, the study integrates multimodal sensory cues, including visual, tactile, and gestural data.
This project aims to establish Korea as an international research hub by analyzing language development processes in agglutinative languages, an area that has been underrepresented in Western-centric studies. Through empirical research on language acquisition and change in Korean contexts, we seek to enhance academic diversity and elevate Korea's position within the global research landscape.
Expansion of International Research Networks and Leadership:
The institute has established close collaborations with leading institutions and scholars worldwide, including the ManyBabies-AtHome project (in collaboration with Katie Von Holzen) and Stanford University (in collaboration with Mike Frank). Through the BabyView and ManyBabies-AtHome projects, we compare and analyze infants' lexical and cognitive development across different languages and aim to take a leading role in advancing new methodologies by hosting international conferences and workshops.
Advancing Korean-Centered Academic Diversity:
Moving beyond the Western-centered paradigm, the institute conducts comprehensive lifespan language research grounded in the Korean language. We develop machine learning models that reflect the grammatical characteristics of Korean as an agglutinative language and provide cross-linguistically comparable multilingual datasets that contribute to the globalization and diversification of language development research.
Global Development of the Next Generation of Scholars:
The institute is the only host institution in Asia to continuously participate in the International Infant Studies Summer Internship Program, offering international research opportunities and global networking experiences for both domestic and international students. Building on this foundation, the HK Global Fellowship program nurtures the next generation of scholars by supporting their academic growth and international research capabilities. Additionally, through the graduate program in Humanities Data Science, we train researchers who can integrate Korean and other humanities data with data science methodologies.
The institute is dedicated to conducting both quantitative and qualitative data-driven analyses to advance the study of human language development, supported by a state-of-the-art research infrastructure. This approach overcomes the limitations of prior studies that focused on specific age groups or linguistic phenomena, enabling the exploration of fundamental mechanisms underlying language development across the human lifespan.
Large-Scale Natural Speech Data Collection:
We collect ecologically valid natural speech data and related multimodal datasets from participants across all age groups, from infants to older adults. To achieve this, we actively utilize advanced wearable and sensing technologies, including the LENA (Language ENvironment Analysis) system, BabyView (first-person perspective video data), and webcam-based eye-tracking tools such as WebGazer and iCatcher+.
Automated Computational Methodologies:
We develop automated systems for transcription, speaker identification, and utterance segmentation to classify child- and adult-directed speech. To empirically assess the emotional factors influencing children's language and cognitive development, we propose LLM (Large Language Model)-based sentiment analysis methods and precisely measure the relationship between infants' visual attention and language acquisition.
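To make the pipeline above more concrete, the following is a minimal sketch of one downstream step: counting conversational turns once speaker identification and utterance segmentation have already produced labeled segments. The `Segment` type, the speaker labels `"CHI"` and `"ADU"`, and the 5-second gap threshold are all assumptions chosen for illustration. They do not describe the institute's actual system or the output format of any specific tool.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    speaker: str  # hypothetical labels: "CHI" (child) or "ADU" (adult)
    start: float  # onset in seconds
    end: float    # offset in seconds


def count_turns(segments: List[Segment], max_gap: float = 5.0) -> int:
    """Count speaker alternations separated by at most max_gap seconds.

    Assumption: max_gap = 5.0 is an illustrative threshold, not a
    value taken from the source text.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    turns = 0
    for prev, cur in zip(ordered, ordered[1:]):
        different_speaker = prev.speaker != cur.speaker
        close_enough = (cur.start - prev.end) <= max_gap
        if different_speaker and close_enough:
            turns += 1
    return turns


# Hypothetical diarization output, for illustration only.
example = [
    Segment("ADU", 0.0, 1.5),
    Segment("CHI", 2.0, 2.8),
    Segment("ADU", 3.1, 4.0),
    Segment("ADU", 4.5, 5.0),
    Segment("CHI", 12.0, 12.5),
]
print(count_turns(example))
```

In this example, the final child segment starts 7 seconds after the preceding adult segment, so it is not counted as a turn under the assumed threshold. In practice the threshold and the labeling scheme would depend on the recording setup and the research question.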
Enhancing Research Reliability and Accessibility:
By implementing reproducible experimental structures, analytical tools, and open-science-based data sharing, we automate precise data processing and analysis. This contributes to increasing the objectivity and credibility of research outcomes while improving accessibility for low-resource researchers.