Global Fellowship

Agenda

Language, Cognition, and Society: A Data-Driven Lifespan Perspective

The ultimate goal of this research is to examine language development and interaction across the entire human lifespan, from infancy to old age, and to make academic and social contributions by leveraging AI and data science. To this end, we aim to trace the overall trajectory of human language development and to establish scientific foundations that can be applied to education, healthcare, and technology.

To trace this developmental trajectory, we conduct foundational linguistic research, analyze social factors, study aging populations, and pursue research on AI applications. By integrating these findings, we seek to generate evidence-based insights for multiple domains, including education, public health, and technology.

This research is structured around four specific objectives:

1. Early Language Acquisition
2. Social Environment and Language
3. Aging and Communication
4. Language AI and Data Science

Research on Language Acquisition and Environment in Infants and Young Children

One of the main objectives of this research is to systematically investigate how infants and toddlers in Korean-speaking households acquire phonemes, segment words, and comprehend sentences. To achieve this, we combine experimental and naturalistic methods, including eye-tracking, the Headturn Preference Procedure (HPP), BabyView, and LENA, to analyze language input and learning processes in real-life contexts. In particular, through eye-tracking studies with Korean-speaking children, we aim to uncover how the shape bias (the tendency to extend a newly learned noun to other objects of the same shape) is acquired and how linguistic structure influences this process.

Analyzing the Impact of Social and Environmental Factors on Language Use

In addition to examining the influence of environmental input on early language acquisition, this research explores how factors such as conversation initiation, turn-taking, and social status shape the structure of adult interactions. We investigate how types of caregiver–child interaction, as well as reading and musical activities, affect language development in both children and older adults, with particular emphasis on empirically analyzing how gender, power dynamics, and socioeconomic factors influence communication patterns in both caregiver–child and adult interactions.

Research on Language and Cognitive Health in Older Adults

Complementing research on cognitive development in early childhood, this study uses large-scale collections of natural speech data to investigate how the frequency and quality of social interactions affect cognitive health in older adults. In particular, by applying AI-based speech analysis techniques, we aim to examine how aging influences language perception, speech production, and fluency. Based on quantitative measures of the linguistic difficulties older adults experience, we seek to detect early signs of decline in lexical retrieval, syntactic complexity, and contextual comprehension.

Developing Language Research Methodologies Utilizing AI and Data Science

This study aims to develop research methodologies that automatically analyze language environments through AI-based speech analysis, eye-tracking, and sentiment analysis using large language models (LLMs). Using long-form audio recordings, we seek to automatically transcribe and analyze interaction patterns between children and adults, thereby quantifying how the linguistic environments of multicultural Korean households influence children’s language and cognitive development. To this end, the study also integrates multimodal cues, including visual, tactile, and gestural data.

Enhancing Korea's Global Standing through Worldwide Korean Language Research

This project aims to establish Korea as an international research hub by analyzing language development in agglutinative languages such as Korean, which have been underrepresented in Western-centric research. Through empirical research on language acquisition and change in Korean contexts, we seek to enhance academic diversity and elevate Korea’s position within the global research landscape.

Expansion of International Research Networks and Leadership:
The institute has established close collaborations with leading institutions and scholars worldwide, including the ManyBabies-AtHome project (with Katie Von Holzen) and Stanford University (with Mike Frank). Through the BabyView and ManyBabies-AtHome projects, we compare infants’ lexical and cognitive development across languages, and we aim to take a leading role in advancing new methodologies by hosting international conferences and workshops.
Advancing Korean-Centered Academic Diversity:
Moving beyond the Western-centered paradigm, the institute conducts comprehensive lifespan language research grounded in the Korean language. We develop machine learning models that reflect the grammatical characteristics of Korean as an agglutinative language and provide cross-linguistically comparable multilingual datasets that contribute to the globalization and diversification of language development research.
Global Development of the Next Generation of Scholars:
The institute is the only host institution in Asia to participate continuously in the International Infant Studies Summer Internship Program, offering international research opportunities and global networking experiences for both domestic and international students. Building on this foundation, the HK Global Fellowship program nurtures the next generation of scholars by supporting their academic growth and international research capabilities. Additionally, through the graduate program in Humanities Data Science, we train researchers who can integrate Korean and other humanities data with data science methodologies.

Pioneering Data-Driven Research in the Humanities

The institute is dedicated to conducting both quantitative and qualitative data-driven analyses to advance the study of human language development, supported by a state-of-the-art research infrastructure. This approach overcomes the limitations of prior studies that focused on specific age groups or linguistic phenomena, enabling the exploration of fundamental mechanisms underlying language development across the human lifespan.

Large-Scale Natural Speech Data Collection:
We collect ecologically valid natural speech data and related multimodal datasets from participants across all age groups, from infants to older adults. To achieve this, we actively utilize advanced wearable and sensing technologies, including the LENA (Language ENvironment Analysis) system, BabyView (first-person perspective video data), and webcam-based eye-tracking tools such as WebGazer and iCatcher+.
Automated Computational Methodologies:
We develop automated systems for transcription, speaker identification, and utterance segmentation to distinguish child-directed from adult-directed speech. To empirically assess the emotional factors influencing children’s language and cognitive development, we propose sentiment analysis methods based on large language models (LLMs) and precisely measure the relationship between infants’ visual attention and language acquisition.
Enhancing Research Reliability and Accessibility:
By implementing reproducible experimental designs, analytical tools, and open-science data sharing, we automate accurate data processing and analysis. This increases the objectivity and credibility of research outcomes while making the research more accessible to researchers with limited resources.
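As a rough illustration of the kind of interaction measure such automated pipelines can produce, the Python sketch below counts adult–child conversational turns from speech segments that a speaker-identification step has already labeled. The `Segment` class, the `"child"`/`"adult"` labels, the `count_turns` function, and the 5-second gap threshold are all hypothetical choices made for illustration; this is not the institute's actual pipeline or the internal algorithm of LENA or any other tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str  # hypothetical labels from a diarization step: "child" or "adult"
    start: float  # segment onset, in seconds
    end: float    # segment offset, in seconds


def count_turns(segments: List[Segment], max_gap: float = 5.0) -> int:
    """Count child/adult alternations where the next speaker begins
    no more than max_gap seconds after the previous speaker ends.

    The 5-second default is an illustrative assumption, not a standard."""
    ordered = sorted(segments, key=lambda s: s.start)
    turns = 0
    # Compare each segment with the one that immediately follows it.
    for prev, curr in zip(ordered, ordered[1:]):
        switched = {prev.speaker, curr.speaker} == {"child", "adult"}
        gap = curr.start - prev.end  # negative if the two segments overlap
        if switched and gap <= max_gap:
            turns += 1
    return turns


if __name__ == "__main__":
    demo = [
        Segment("adult", 0.0, 2.0),
        Segment("child", 3.0, 4.0),    # 1 s after the adult: counted
        Segment("adult", 12.0, 13.0),  # 8 s after the child: not counted
        Segment("child", 14.0, 15.0),  # 1 s after the adult: counted
    ]
    print(count_turns(demo))
```

In this sketch a run of same-speaker segments contributes no turn until the speaker changes, and overlapping speech (a negative gap) counts as a turn; whether those conventions fit a given study would depend on its coding scheme.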
