David Temperley
(Eastman School of Music, University of Rochester)
Plenary Talk: Melodic expectation: The state of the field and some new results
Abstract: It has often been said that expectation plays a central role in our experience and enjoyment of music. Many studies have explored the factors affecting musical expectation (especially in melodies) and how it relates to enjoyment. In this talk I survey some important findings and present some new results from my own work. I begin by discussing approaches to the modeling of melodic expectation. Two prominent approaches are the rule-based (or “Gestalt-based”) approach and the statistical (n-gram) approach. I discuss pros and cons of the two approaches, with regard to both experimental support (including two recent studies of my own) and cognitive plausibility. I then turn to research on expectation and enjoyment. A wide range of views have been put forth on this issue; it has been suggested that people enjoy highly expected events, moderately expected events, or unexpected events. A widely discussed idea in the last few years is that enjoyment may be affected not only by expectedness of events but also by our level of uncertainty about them, usually quantified with entropy. I discuss three recent studies that explore this idea; I raise doubts about their methodology and also about the general idea of entropy as a factor in musical enjoyment.
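As a minimal illustration of the two quantities contrasted above, the sketch below fits a bigram (2-gram) model to a toy melody. It then computes the surprisal of one continuation (how unexpected the note is) and the entropy of the predicted distribution (how uncertain the listener is before the note arrives). The melody, the function names, and the use of unsmoothed counts are assumptions made for illustration; they are not taken from the studies discussed in the talk.

```python
import math
from collections import Counter, defaultdict

def bigram_model(notes):
    """Count next-note frequencies for each preceding note (unsmoothed)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(notes, notes[1:]):
        counts[prev][nxt] += 1
    return counts

def next_note_distribution(model, prev):
    """Relative frequencies of the notes that followed `prev` in the training melody."""
    total = sum(model[prev].values())
    return {note: c / total for note, c in model[prev].items()}

def surprisal(dist, note):
    """Unexpectedness of one event, in bits: -log2 p(note)."""
    return -math.log2(dist[note])

def entropy(dist):
    """Uncertainty of the whole prediction, in bits: -sum p * log2 p."""
    return -sum(p * math.log2(p) for p in dist.values())

# Hypothetical toy melody, invented for this example.
melody = ["C", "D", "E", "C", "D", "G", "C", "D", "E", "C", "E", "G"]
model = bigram_model(melody)
after_c = next_note_distribution(model, "C")  # D follows C three times, E once
after_d = next_note_distribution(model, "D")  # E follows D twice, G once

print("surprisal of G after D:", surprisal(after_d, "G"))
print("entropy after C:", entropy(after_c))
print("entropy after D:", entropy(after_d))
```

In this toy case the context C yields a more peaked prediction (lower entropy) than the context D, which shows that uncertainty about the next note can vary independently of how expected the note turns out to be.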
Kyung Myun Lee
(KAIST)
Understanding Musical Experience for Human-Centered Technology
Abstract: As music technologies increasingly shape how we listen to, create, and share music, it is becoming essential to understand not only acoustic signals, but also the human experience of music itself. In this talk, I will introduce a series of studies from our laboratory that examine musical experience through cognitive, neural, and social perspectives, demonstrating how these findings can guide the future of human-centered music technologies. First, I will present EEG studies revealing how brain responses reflect individual differences in musical preference and the deep engagement of K-pop fandom. These findings suggest that physiological signals can capture the richness of how we experience music. I will then extend this perspective to more diverse contexts, including cochlear implant users and AI-generated music, exploring how technology can make music more emotionally meaningful and accessible to everyone. Finally, I will discuss our recent work on VR concert experiences, focusing on the role of haptic feedback and avatar audiences. Together, these diverse studies illustrate how neural and behavioral evidence can inspire the next generation of music technology. I will conclude by discussing why future systems should be designed not only for technical performance, but also to foster meaningful, inclusive, and socially connected musical experiences.
Pierre Labendzki
(University of East London)
Temporal patterns in the complexity of child-directed song lyrics reflect their functions
Abstract: Content produced for young audiences is structured to present opportunities for learning and social interactions. This research examines multi-scale temporal changes in predictability in child-directed songs. We developed a technique based on Kolmogorov complexity to quantify the rate of change of textual information content over time. This method was applied to a corpus of 922 English, Spanish, and French publicly available child- and adult-directed texts. Child-directed song lyrics (CDSongs) showed overall lower complexity than adult-directed song lyrics (ADSongs), and lower complexity was associated with a higher number of YouTube views. CDSongs showed a relatively higher information rate at the beginning and end than ADSongs. Both CDSongs and ADSongs showed a non-uniform information rate, but these periodic oscillatory patterns were more predictable in CDSongs than in ADSongs. These findings suggest that the optimal balance between predictability and expressivity in information content differs between child- and adult-directed content, and that it also changes over timescales, potentially to support children’s multiple needs.
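Kolmogorov complexity is not computable, so in practice it is usually approximated by the size of a compressed encoding. The sketch below shows one such proxy: a compression ratio computed over sliding windows of a text, giving a rough time course of information content. The choice of zlib, the window and step sizes, and the function names are assumptions for illustration; the abstract does not specify the estimator used in the study.

```python
import zlib

def compression_ratio(text):
    """Compressed size divided by raw size (bytes); lower means more predictable."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)

def windowed_complexity(text, window=200, step=100):
    """Compression ratio over sliding character windows, as a rough time course."""
    last_start = max(len(text) - window, 0)
    return [compression_ratio(text[i:i + window])
            for i in range(0, last_start + 1, step)]

# Hypothetical inputs: a highly repetitive lyric and a less repetitive string.
repetitive = "row row row your boat " * 20
varied = " ".join(str(i * 7919 % 10007) for i in range(100))

print(compression_ratio(repetitive), compression_ratio(varied))
print(windowed_complexity(repetitive))
```

Very short windows compress poorly because of fixed header overhead, so ratios are best compared across windows of equal length.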
May Pik Yu Chan
(University of Pennsylvania)
Pitch-Dependent Vowel Modification in Singing
Abstract: Singing across a wide pitch range often involves adjustments to the vowel space in order to maintain consistent timbre. Existing studies on classical singers have examined how resonance profiles change across pitch range, raising further questions about the articulatory adjustments associated with these acoustic patterns. In particular, it remains unclear whether vowel modification in singing primarily reflects learned vocal technique or more general articulatory constraints associated with high-pitch voice production. This study examines pitch-related changes in tongue position in professional and lay singers using ultrasound tongue imaging. Results show that professional opera singers engage in pitch-dependent tongue-position adjustments that result in a gradual reduction of vowel contrast at higher pitches, with differences in magnitude across voice types. Lay singers with smaller pitch ranges display relatively uniform tongue positions across pitches, while lay singers with wider pitch ranges also exhibit tongue-position adjustments in higher-pitched singing. These findings suggest that vocal tract adjustments may not be solely learned strategies, but may also emerge from general articulatory and acoustic pressures associated with voice production at higher pitches.
Samuel Mehr
(University of Auckland & Yale University)
Core systems of music perception
Abstract: Like vocalizations found across the animal kingdom, music serves a basic communicative function in our species: it transmits information from the minds of people producing it to the minds of people hearing it. In this talk I will present evidence that this communicative function is a core property of human cognition (review: Mehr 2025, Trends in Cognitive Sciences). First, I will show that the communicative properties of music are widespread, as it produces reliable psychological responses in listeners across cultures and across the lifespan. These findings suggest that human musicality is not an incidental outcome of cultural evolution, but instead forms a constituent part of our psychology, in a fashion comparable to other domains, like social cognition, number, and so on. Then, I will present ideas on the perceptual mechanisms that underlie music's communicative functions, focusing on the automatic hierarchical representation of pitch and rhythm, including new data from infant perception studies as well as citizen-science approaches developed in my lab. High-level music processing abilities appear to be universal, early-developing, and uniquely human, and can be considered specialized components of human audition.
Yune Sang Lee
(Seoul National University)
Shared Circuits: How Music and Language Meet in the Brain
Abstract: Human communication rides on timing, prediction, and rhythm, key principles that music and language share. In this talk, I synthesize findings from my research programs showing that these domains rely on overlapping neural circuits spanning the auditory cortex, the dorsal auditory–motor stream, basal ganglia, and cerebellum. First, using behavioral paradigms, I demonstrate that musical rhythm processing predicts individual differences in syntactic operations in language. Second, I present intervention studies in which rhythmic training, designed to strengthen auditory–motor coupling, supports language outcomes in populations with timing-related vulnerabilities (e.g., aphasia). Third, I describe mechanistic EEG work that uses 18 Hz (beta) auditory stimulation, delivered via binaural beats, to probe and potentially enhance language processing. Across these projects, a common theme emerges: temporal prediction and auditory–motor synchronization act as core neural mechanisms linking musical rhythm to language processing and rehabilitation. I close by outlining translational design principles (ecologically valid tasks and scalable digital delivery) that can move music-informed language interventions from lab to clinic and classroom. Together, the evidence argues that when the brain keeps the beat, it becomes better at analyzing language structure, offering a principled path from shared circuits to practical therapies.
Alan Langus
(University of Potsdam)
Do Speech and Music Share Rhythm? Unifying the Rhythms of Language and Music Through Mathematical Models
Abstract: Rhythm is often viewed as a key link between language and music. Many influential theories of language perception, acquisition, and neural processing have been motivated by the idea that speech, like music, is organized by recurring temporal patterns. This view has inspired attempts to understand language, music, and even animal vocalizations within a common rhythmic framework. Yet an important paradox remains. While speech can exhibit rhythmic regularity comparable to that of music when we sing, chant, or recite poetry, everyday conversation is highly variable and lacks the stable periodicity found in many musical rhythms. This makes rhythm difficult to compare across languages and musical styles. Here, we present a novel method for quantifying spontaneous sensorimotor synchronization to speech and music through listeners' pupil dynamics. We combine this approach with mathematical models of temporal variability to investigate whether rhythmic structure can emerge from temporal variability itself. To do so, we analyzed more than half a million utterances drawn from over 100 languages. Our findings show that statistical regularities in the timing of syllables and prominence can explain speech rhythm through the same neural mechanisms that give rise to the perception of rhythm in music. More broadly, they show how rhythm emerges from the temporally variable speech signal and how mathematical models of variability can reconcile speech and musical rhythms within a common theoretical framework.
Riccardo Muolo
(iTHEMS, RIKEN)
Time delay embeddings to characterize the timbre of musical instruments using Topological Data Analysis
Abstract: Timbre allows us to distinguish between sounds even when they share the same pitch and loudness, playing an important role in music, instrument recognition, and speech. Traditional approaches, such as frequency analysis or machine learning, often overlook subtle characteristics of sound. Topological Data Analysis (TDA) can capture complex patterns, but its application to timbre has been limited, partly because it is unclear how to represent sound effectively for TDA. In this study, we investigate how different time delay embeddings affect TDA results. Using both synthetic and real audio signals, we identify time delays that enhance the detection of harmonic structures. Our findings show that specific delays, related to fractions of the fundamental period, allow TDA to reveal key harmonic features and distinguish between integer and non-integer harmonics. The method is effective for synthetic and real musical instrument sounds and opens the way for future work, which could extend it to more complex sounds using higher-dimensional embeddings and additional persistence statistics.
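To make the role of the delay concrete, the sketch below builds a two-dimensional time delay embedding of a pure synthetic tone. It compares a delay of one quarter of the fundamental period with a delay of one full period. The sample counts, the function name, and the pure-sine test signal are assumptions for illustration; the abstract does not state the exact embedding parameters, and the persistent homology step is omitted here.

```python
import math

def delay_embed(signal, delay, dim=2):
    """Map a 1-D signal to points (x[n], x[n+delay], ..., x[n+(dim-1)*delay])."""
    span = (dim - 1) * delay
    return [tuple(signal[n + k * delay] for k in range(dim))
            for n in range(len(signal) - span)]

period = 100  # samples per fundamental period (assumed value)
tone = [math.sin(2 * math.pi * n / period) for n in range(4 * period)]

# Quarter-period delay: points are (sin, cos), so they trace the unit circle.
quarter = delay_embed(tone, delay=period // 4)
# Full-period delay: x[n + period] equals x[n], so the loop collapses onto a line.
full = delay_embed(tone, delay=period)

print(max(abs(x * x + y * y - 1.0) for x, y in quarter))
print(max(abs(x - y) for x, y in full))
```

A one-dimensional loop of the kind that persistent homology can detect is present in the first point cloud and absent in the second, which is one simple reason the choice of delay affects what TDA can recover.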
Jae-Hun Jung
(POSTECH)
Topological Optimization for Korean Music Generation and Entropy-Based Analysis
Abstract: Deep learning–based approaches to music generation typically require large-scale datasets for effective training, because they must learn the underlying patterns and implicit grammar embedded in musical data. However, in many cases, particularly for certain genres of Korean traditional music, the available datasets are limited and insufficient for standard data-driven approaches. To address this issue, we propose a framework based on structural learning via topological data analysis (TDA). In our work, we extract topological features that capture the intrinsic structure of musical sequences and feed these features into the learning process. This talk will present how such topological representations can be embedded into machine learning models for music generation. While stability theorems ensure the robustness of topological features under perturbations, they do not directly guarantee that minimizing losses on these features leads to an isometry between the original and generated music. Under certain conditions, however, topological constraints can enhance structural similarity between them. Furthermore, we use entropy-based analysis to quantify how well the generated music preserves the structural characteristics of the original compositions. We will illustrate these ideas through examples from Korean traditional music.
Hee-sun Kim
(Kookmin University)
Plenary Talk: Between Pattern and Expression: Gugak at the Intersection of Mathematics and Language
Abstract: In the context of this conference on mathematics, music, and language, this presentation aims to reconsider how music is constituted in relation to mathematical structure and linguistic expression. Music has been conceptualized, on the one hand, as a system of patterns, proportions, and regularities, and, on the other hand, as a medium of expressive communication that conveys meaning and affect. Mathematical approaches have emphasized the structural dimensions of music—such as repetition, ratio, and formal organization—whereas linguistic approaches have focused on processes of meaning-making and expression. What is at stake, however, is not the distinction itself, but how musical pattern and expression interact and are flexibly formed within performance. Furthermore, gugak is characterized by modes of transmission and performance that extend beyond standardized systems of notation. This suggests that certain aspects of musical practice cannot be fully accounted for solely through fixed mathematical patterns or formalized linguistic models, thereby requiring a more flexible analytical approach.
Dasaem Jeong
(Sogang University)
Symbolic Structure in Music AI: From Multimodal Translation to Modeling Gugak
Abstract: This talk presents recent studies on symbolic structure in music AI across a broad spectrum of musical representations. I begin with U-MuST, a unified multimodal translation framework that connects score images, symbolic music, MIDI, and performance audio through a shared token-based architecture, made possible by large-scale score video data collected from YouTube. I then introduce gugak-oriented research in two directions: regenerating Korean court music ensembles through direct Jeongganbo encoding, and learning discrete symbolic tokens from continuous pitch contours with autoencoders for new forms of musical analysis. The Jeongganbo work demonstrates how culturally specific notation can support the generative modeling of 15th-century court repertory. In contrast, the autoencoder-based learning of discrete tokens moves beyond pre-existing notation systems such as staff notation and explores how musically meaningful symbolic representations can be derived directly from data with minimal prior assumptions. Overall, the talk explores both how symbolic information supports music AI and how music AI can in turn produce new symbolic forms for representing musical knowledge.
Myung Ock Kim
(Konkuk University)
Human-AI Collaborative Composition in Sanjo: Exploring New Creative Possibilities for Traditional Music
Abstract: Sanjo is a prominent genre of Korean traditional music, recognized for its technical complexity as a solo instrumental form rooted in oral tradition. It is regarded as one of the most challenging genres both to perform and to compose. This study investigates the potential of Human-AI collaboration in the compositional process of Sanjo, moving beyond machine generation to a co-creative framework. It implements a collaborative workflow in which an algorithm suggests melodic variations based on the topological analysis of existing melodies, while the human composer refines the 'Sigimsae' and constructs the structural narrative using artistic imagination. The research focuses on the interplay between algorithmic suggestions and human artistic intuition. The presentation will further discuss the challenges and possibilities of this co-creative approach, providing insights into the sustainable evolution of traditional arts in the digital age.
Chaeyoung Lee
(Korea Institute, Harvard University)
Gugak Instruments in Digital and Electronic Worlds: Sampling, Electronic Music, and Technological Mediation
Abstract: This paper examines the technological mediation of gugak instruments through two related developments: the digitization of their sounds into samples and virtual instruments, and their incorporation into electronic music practices. At a moment when artificial intelligence increasingly dominates discussions of music technology, this paper turns instead to pre-AI research initiatives, institutional projects, and collaborative musical practices that brought gugak into dialogue with digital media, electronic sound, and computational environments. First, drawing on my prior research collaboration at the Center for Arts and Technologies at Seoul National University (CATSNU)—where I participated in the development of gugak mobile applications and virtual studio technology instruments (VSTi)—I analyze how the sounds, timbres, rhythms, and performance techniques of gugak instruments were recorded, sampled, systematized, and reconfigured as digital materials. This process presents distinctive challenges: flexible rhythm and melodic contour, microtonal inflections, timbral nuance, improvisatory ornamentation, and performer-specific interpretation often resist sampling, standardization, and quantification. I then discuss collaborative practices that bring gugak instruments into electronic sound environments, including amplification, sampling platforms, Max/MSP patches, and live-processing systems. In these contexts, gugak instruments are not only preserved or reproduced through technology but are also placed in new relationships with electronic sound, experimental composition, and collaborative performance. By placing these two developments side by side, this paper asks how technological mediation changes what gugak instruments are understood to be: cultural heritage objects, sources of sonic data, compositional materials, or collaborators in experimental sound practice. 
Grounded in ethnomusicological participant observation, interviews, archival research, and media analysis, I argue that gugak’s digital and electronic transformations are not simply technical processes of sound reproduction or preservation. Rather, they reshape ideas of tradition, creativity, authorship, and collaboration, revealing how technological mediation participates in broader processes of modernity, globalization, circulation, and musical experimentation.
Youngsun Kim
(Seoul National University)
The Architecture of Sanjo Listening: Reverberation and Relational Listening in Hanok, Concert Halls, and Sound Reinforcement Environments
Abstract: Sanjo is shaped through close interaction among performer, drummer, and audience. The rhythmic patterns of janggu or buk, the vocal interjections of the gosu, and the subtle timbral changes of solo instruments such as gayageum and geomungo do not function independently; together they organize listening within acoustic space. In traditional Hanok settings, dense early reflections help maintain the clarity of rhythmic patterns, chuimsae, and instrumental timbre. The short distance between performer and listener, wooden structures, paper doors, earthen surfaces, low ceilings, and small room volumes contribute to the tension and immediacy characteristic of Sanjo performance. Today, however, Korean traditional music is increasingly performed in concert halls designed for Western classical music and in sound reinforcement environments. Extended reverberation, larger spatial volumes, and amplification systems change the listening structure of Sanjo. Subtle rhythmic tension, the linguistic rhythm of chuimsae, and the directness of solo instrumental timbre are experienced differently than in traditional Hanok settings. This paper argues that reverberation in Sanjo is not merely a source of acoustic richness, but a decisive factor that reshapes relational tension among performer, gosu, and audience. It further suggests that listening in Sanjo may be understood less as immersive envelopment and more as a form of relational listening shaped by proximity, breathing, temporal response, and vocal interaction. The paper interprets this listening structure as an “Architecture of Listening” in which performer, gosu, audience, and acoustic space remain closely interconnected. By comparing Hanok, concert halls, and sound reinforcement environments, this study considers how changing acoustic conditions influence listening structures and musical perception in Korean traditional music.
Vincent Nwosu
(University of Calgary)
When Speech Meets Rhythm: What Igbo Children’s Songs Reveal About Language Structure
Abstract: In Igbo, adjacent vowels across word boundaries often coalesce in speech, sometimes appearing acoustically as a single vowel. Yet when these same sequences are sung in children’s songs, speakers consistently align them with more rhythmic space than a singleton vowel. This talk examines how musical text-setting reveals aspects of prosodic structure that are not always visible in speech alone. Using a well-known Igbo traditional children’s song, I analyze how compound names containing vowel sequences are mapped onto rhythmic positions in the melody. Listener judgments and patterns of musical alignment show that speakers prefer mappings that treat coalesced vowel sequences as occupying greater rhythmic weight than single vowels, even when vowel duration differences are minimal. The results suggest that musical rhythm reflects speakers’ underlying representations of timing structure and syllable weight. More broadly, the study demonstrates how musical practices can provide independent evidence for phonological structure in tonal languages.
Jonny Jungyun Kim
(Pusan National University)
When melody constrains phonology: Cognitive flexibility in Korean stop contrast production
Abstract: This study examines how F0-related sound change in Korean is dynamically reconfigured across speech and song. In younger speakers, the traditional phonetic contrast based on voice onset time (VOT) between aspirated stops /pʰ, tʰ, kʰ/ and lenis stops /p, t, k/ has been neutralized and instead shifted toward an F0-based cue-weighting system in phrase-initial positions. This innovative pattern is also socially meaningful, indexing youth-oriented urban identity. However, singing imposes a unique constraint: melodic pitch largely fixes F0 independently of segmental contrast. To investigate adaptation to this constraint, Korean speakers in their 20s produced read and sung sentences containing aspirated–lenis stops distributed across prosodic positions systematically aligned with musical structure. The results show a striking reversal of cue weighting. In read speech, younger speakers exhibited the expected VOT merger. In singing, however, they expanded VOT differences, effectively adopting an older cue structure to preserve phonemic contrast. These findings suggest that multiple phonetic grammars coexist within individual speakers and can be selectively recruited depending on contextual constraints. Specifically, speech production is not tied to a single stable representation of contrast, but emerges as a flexible adaptive system integrating linguistic structure, prosodic constraints, musical pitch organization, and socially indexed phonetic knowledge.
Eon-Suk Ko
(Chosun University)
Edge-Prominence in Korean Song across Time and Genres
Abstract: This study investigates whether the alignment between lyrics and melody in Korean music reflects systematic phonological mapping. Our previous analysis of children’s songs showed that strong metrical beats align predominantly with word boundaries (initial and final syllables), suggesting a demarcative function of musical prominence in a non-stress language. The present project extends this scope to a diachronic corpus of Korean songs spanning multiple historical periods. We focus on whether the historical loss of vowel length and subsequent changes in word-level prosody have shifted alignment principles over time, or whether patterns vary primarily by genre. To quantify the strength and distribution of these alignment patterns, we incorporate information-theoretic measures. By exploring the interaction between evolving linguistic structures and musical meter, we seek to determine the stability of edge-prominence across different eras and styles of Korean song.
David Temperley
(Eastman School of Music, University of Rochester)
Plenary Talk: Anticipatory Syncopation in Popular Music - closing talk
Abstract: It is well known that popular music in the 20th century (and the 21st) features a much higher degree of syncopation (conflict between accents and meter) than music of previous eras. It is less well known that syncopation in popular music is subject to strong constraints. In the vast majority of cases, syncopations are _anticipatory_: they are understood as belonging on the following strong beat. This is most evident in vocal music: treating weak-beat stressed syllables as anticipatory allows the usual alignment of meter and stress to be preserved. In this talk I will present a theoretical framework for the study of anticipatory syncopation, examine some of the specific forms it takes, and trace its evolution over the 20th century. While my main focus is on English-language popular music, I will also consider the role of anticipatory syncopation in Korean popular music.