Talks 

09:00 - 18:00

Overview

09:00 - 09:45 Registration

09:45 - 10:00 Opening and Welcome Remarks

10:00 - 11:00 Keynote: Roger Dannenberg (Remote, Chair: Juhan Nam)

11:00 - 12:20 Session #1: AI Accompaniment (Chair: Dasaem Jeong)

12:20 - 13:30 Lunch Break

13:30 - 15:30 Session #2: AI Performance Generation (Chair: Akira Maezawa)

15:30 - 16:00 Coffee Break

16:00 - 17:20 Session #3: Human Performance Understanding (Chair: Carlos Eduardo Cancino-Chacón)

17:20 - 18:00 Panel discussion (Moderator: Juhan Nam)

18:00 - 19:30 Dinner Break

Keynote Speaker

Roger Dannenberg

Emeritus Professor of Computer Science, Art & Music at Carnegie Mellon University 

"Computer Accompaniment and Beyond"

Abstract

From its beginnings around 40 years ago, Computer Accompaniment has inspired research in score following, music similarity, music alignment, collaborative performance, and other topics. I will describe my early work in computer accompaniment systems and some of its commercial and research offshoots. I will introduce Accomplice, a new system created especially for experimental music composers who want to coordinate electronics with live keyboard performances. I will also describe some of my personal efforts to incorporate AI techniques into interactive computer music compositions. Finally, I will speculate on some of the new opportunities for live performance enabled by new machine learning techniques and systems.

Biography

Roger B. Dannenberg is Emeritus Professor of Computer Science, Art & Music at Carnegie Mellon University, where he received a Ph.D. in Computer Science in 1982. He is internationally known for his research in the field of computer music. He is the co-creator of Audacity, an audio editor that has been downloaded hundreds of millions of times, and his patents for Computer Accompaniment were the basis for the SmartMusic system used by hundreds of thousands of music students. His current work includes live music performance with artificial computer musicians, automatic music composition, interactive media, and high-level languages for sound synthesis. Prof. Dannenberg is also a trumpet player and composer. He has performed in concert halls ranging from the historic Apollo Theater in Harlem to the modern Espace de Projection at IRCAM in Paris. Besides numerous compositions for musicians and interactive electronics, Dannenberg co-composed the opera La Mare dels Peixos with Jorge Sastre, and translated and produced it in English as The Mother of Fishes in Pittsburgh in 2020.

Speakers

Carlos Eduardo Cancino-Chacón

Assistant Professor at the Institute of Computational Perception at Johannes Kepler University Linz (JKU), Austria

"Towards Expressive Artificial Musical Co-performers: The ACCompanion Story"

Abstract

With advances in AI and machine learning, there is a rising interest in developing interactive interfaces that allow people (both musicians and non-musicians) to explore music performance. The ACCompanion is an expressive automatic accompaniment system that enables collaborative music performance between humans and computers. The system creates a human-like rendition of the accompaniment part of a piece in real time, adapting to the soloist's tempo, dynamics, and articulation choices. This talk will discuss the components of the ACCompanion, including real-time score following and expressive performance generation. We will also explore the concept of togetherness in musical human-computer interaction, highlighting the challenges and potential of artificial co-performers, which could serve both as a platform for testing hypotheses about the process of togetherness and as a useful tool for music education and for engaging non-musicians with music.
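
As an aside for readers new to reactive accompaniment, the minimal Python sketch below illustrates one generic way a system can follow a soloist's tempo once note onsets have been matched to score positions. It is an illustrative sketch under simplifying assumptions, not the ACCompanion's actual algorithm; the class, its method names, and the smoothing constant ALPHA are hypothetical.

    import time

    ALPHA = 0.5  # smoothing factor for tempo updates (assumed value)

    class TempoFollower:
        """Track the soloist's tempo from matched (score beat, onset time) pairs."""

        def __init__(self, initial_bpm: float = 100.0):
            self.sec_per_beat = 60.0 / initial_bpm
            self.last_beat = 0.0
            self.last_time = time.monotonic()

        def update(self, score_beat: float, onset_time: float) -> None:
            """Refresh the tempo estimate when a soloist note is matched to the score."""
            beat_delta = score_beat - self.last_beat
            time_delta = onset_time - self.last_time
            if beat_delta > 0 and time_delta > 0:
                observed = time_delta / beat_delta  # seconds per beat just observed
                # Exponential smoothing keeps the accompaniment from overreacting
                # to a single rushed or delayed note.
                self.sec_per_beat = (1 - ALPHA) * self.sec_per_beat + ALPHA * observed
            self.last_beat, self.last_time = score_beat, onset_time

        def predict_time(self, score_beat: float) -> float:
            """Predict when a future accompaniment event at score_beat should sound."""
            return self.last_time + (score_beat - self.last_beat) * self.sec_per_beat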

Biography

Carlos Cancino-Chacón is an Assistant Professor at the Institute of Computational Perception at Johannes Kepler University Linz (JKU), Austria. Before this role, he conducted research at the Austrian Research Institute for Artificial Intelligence and was a Guest Researcher at the RITMO Centre for Interdisciplinary Studies in Rhythm, Time, and Motion at the University of Oslo, Norway. His work revolves around developing machine learning models that understand the way humans perform and listen to music, with an emphasis on three topics: computational modeling of expressive performance, (real-time) human-computer interaction in musical contexts, and cognitively plausible machine listening. He holds a PhD in Computer Science from JKU, an MSc in Electrical and Audio Engineering from the Graz University of Technology, a Bachelor's degree in Physics from the National Autonomous University of Mexico, and a Bachelor's degree in Piano Performance from the National Conservatory of Music of Mexico.

Akira Maezawa

Researcher at Yamaha Corporation

"Design of AI Music Ensemble System for Reaching People of Various Skills"

Abstract

Playing a musical instrument in an ensemble is a rewarding experience, but it is difficult for beginning musicians to participate in one. To allow anyone to enjoy the experience of playing in a music ensemble, we have been developing interactive systems that play along with musicians of different skill levels. In this talk I will introduce our work on music ensemble systems and how it has been applied to installations and to systems supporting beginning to advanced musicians. I will briefly discuss the underlying techniques for multimodal interaction, musically inspired coordination, and generation of musical performance data. I will then discuss how different design choices address different target users, and conclude by providing an industry perspective on possible directions for future research in the research community.

Biography

Akira Maezawa is a researcher in music informatics who has worked on research and development of music information retrieval and interaction technologies using machine learning, which are now used in various music products such as music apps and electronic keyboard instruments. He has also developed numerous real-time music interaction systems, which have appeared at concert venues and events such as the Ars Electronica Festival's "AI x Music" concert, SXSW, CES, the Digital Content Expo, and the Tokyo Metropolitan Symphony Orchestra's "SALAD" Music Festival. He has received the Cannes Lions International Festival of Creativity's Entertainment Lions for Music (Silver), the Information Processing Society of Japan's Research and Engineering Award, and the Yamashita Research Award, among others.

Dasaem Jeong

Assistant Professor in the Department of Art & Technology at Sogang University, South Korea

"The Journey of VirtuosoNet, a DNN-based AI pianist"

Abstract

Expressive performance modeling addresses multiple complex challenges, such as interpreting the music score, capturing shared traits among human performers, and accommodating diverse interpretive possibilities. This talk introduces how a deep-learning-based expressive performance modeling system, VirtuosoNet, was proposed and implemented, covering its feature encoding scheme and dataset composition, neural network architectures, and the training procedure. The presentation aims to equip attendees with insights into how expressive performance modeling can be conceived as a deep learning task and the considerations it involves.

Biography

Dasaem Jeong is an Assistant Professor in the Department of Art & Technology at Sogang University, South Korea. He obtained his Ph.D. in culture technology from KAIST under the supervision of Juhan Nam. His research focuses on various music information retrieval tasks, including expressive performance modeling and symbolic music generation.

Silvan David Peter

Ph.D. candidate at the Institute of Computational Perception, Johannes Kepler University, Linz, Austria

"Uneasy Questions for Quantitative Evaluation of AI Models of Music Performance"

Abstract

Generative models of expressive piano performance (GMEPP) are usually assessed by comparing their predictions to a reference human performance: a generative algorithm is taken to be better than competing ones if it produces performances that are closer to the reference. However, human performers can (and do) interpret music in different ways, so there are many possible references, and quantitative closeness is not necessarily aligned with perceptual similarity; both observations call the whole process into question. In this talk, I address several issues of quantitative evaluation of GMEPP stemming from these observations and try to sketch avenues for improvement.
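
To make the evaluation practice under discussion concrete, the short Python sketch below scores a generated performance by its closeness to a single human reference over a note-wise expressive parameter (e.g., MIDI velocity). It is only an illustration with synthetic data, not the protocol of any particular paper; the function name and the numbers are hypothetical. Note how the ranking implied by the metrics changes depending on which equally valid reference is chosen.

    import numpy as np

    def closeness_to_reference(pred: np.ndarray, ref: np.ndarray) -> dict:
        """Compare a note-wise expressive parameter of a generated performance
        against a single human reference performance."""
        mse = float(np.mean((pred - ref) ** 2))
        pearson_r = float(np.corrcoef(pred, ref)[0, 1])
        return {"mse": mse, "pearson_r": pearson_r}

    # Two equally valid human interpretations of the same piece (synthetic data).
    rng = np.random.default_rng(0)
    reference_a = rng.normal(64, 10, size=200)               # pianist A's velocities
    reference_b = reference_a + rng.normal(0, 8, size=200)   # pianist B's velocities
    model_output = reference_b + rng.normal(0, 2, size=200)  # model close to pianist B

    # The same model looks mediocre or excellent depending on the chosen reference.
    print(closeness_to_reference(model_output, reference_a))
    print(closeness_to_reference(model_output, reference_b))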

Biography

Silvan David Peter is a University Assistant and PhD candidate at the Institute of Computational Perception, Johannes Kepler University, Linz, Austria. His research interests are the evaluation of and interaction with computational models of musical skills. He holds an M.Sc. degree in Mathematics from the Humboldt University of Berlin.

Li Su

Associate Research Fellow at the Institute of Information Science, Academia Sinica, Taiwan

Yu-Fen Huang

Post-doctoral Research Fellow at the Institute of Information Science, Academia Sinica, Taiwan

"Understanding and Generation of Audio-Visual Multimedia Content with Deep Learning"

Abstract

Current generative AI technology still faces significant technical hurdles in generating audiovisual content conditioned on themes, music, and storytelling. As a result, the animation industry still relies heavily on manual labor and expensive techniques such as human motion capture (MOCAP) in content production. In recent years, the rise of virtual musicians and VTubers in the Metaverse has increased the demand for automatic content generation in movies and character animation. Motivated by this, we would like to know whether current generative AI technology can solve a particular technical hurdle: generating animation of virtual musicians that fits the music reasonably well. In this talk, we will start from one of our research papers, “A Human-Computer Duet System for Music Performance” (Best Paper Award candidate at the ACM Multimedia Conference 2020), to provide a system-level view, and then introduce facial expression generation, fingering generation, body movement generation, visual storytelling, and other related research topics on music performance and AI.

Biography

Li Su received his Ph.D. from the Graduate Institute of Communication Engineering at National Taiwan University, Taiwan. Since 2017, he has been serving as a research fellow at the Institute of Information Science, Academia Sinica, Taiwan, where he was promoted to Associate Research Fellow in 2021. His research interests span artificial intelligence, multimedia technology, and musicology. He has been recognized with the Best Paper Award at the International Society for Music Information Retrieval (ISMIR) conference and a Best Paper Award nomination at the ACM International Conference on Multimedia (ACM MM). He was invited as a panelist for the 29th Golden Melody Awards International Music Festival Forum. His performance collaborations include "Rising of the New Sound" (National Concert Hall, 2017) and "Whispers of the Night" (Kaohsiung Wei-Wu-Ying Concert Hall, 2019).

Yu-Fen Huang is a post-doctoral research fellow at the Music and Culture Technology Laboratory, Institute of Information Science, Academia Sinica, Taiwan. Her research applies Music Information Retrieval (MIR) and AI techniques to explore the expressive elements in musical audio and body movement. Her research topics include: 1) the cross-modal mapping between musical sound and body movement, 2) audio analysis for piano and string performances using AI models, and 3) the expressive semantics in musical body movements. She endeavors to collaborate across and integrate methodologies from diverse disciplines, including systematic musicology, music technology, music psychology, biomechanics, and 3-D motion capture technology.

Laura Bishop

Researcher at the RITMO Centre, University of Oslo, Norway

"Coordination, Communication, and Togetherness in Classical Ensemble Playing"

Abstract

Ensemble musicians create shared rhythms through their sound and body motion. Strong rhythmic alignment can support a sense of social connection or "musical togetherness" between co-performers. In this talk, I will present a set of empirical studies that investigated body coordination and audio-visual communication between players in small classical ensembles (duos and quartets). These studies used a combination of methods, including motion capture, eye-tracking, pupillometry, and audio analysis, to test how musicians perform in conditions that are designed to encourage or discourage feelings of togetherness. Drawing on the results of these studies, I will discuss musical togetherness from a theoretical perspective, and propose a set of criteria that underlie togetherness experiences.

Biography

Laura Bishop is a Researcher at the RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion and the Department of Musicology at the University of Oslo, Norway. She completed her PhD in music psychology at the MARCS Institute, Western Sydney University, Australia, in 2013. Thereafter, she worked at the Austrian Research Institute for Artificial Intelligence (OFAI) in Vienna, Austria, first as a postdoc, then as PI of Austrian Science Fund projects on coordination and creativity in music ensemble playing. Her research investigates the social and cognitive processes involved in musical interaction using methods like motion capture, eye-tracking, and physiological measures. 

Juhan Nam

Associate Professor at the Graduate School of Culture Technology, KAIST, South Korea

"Human-AI Music Ensembles on Stage and Lessons Learned"

Abstract

Human-AI music ensembles require various analysis and generation capabilities for communication, synchronization, and expression in music performance. The KAIST AI Music Performance Team has been working on computational methods that enable such capabilities, including score following, cue detection, automatic music transcription, and expressive performance rendering, as well as on systems to execute these algorithms in concert settings. In this talk, we will present several case studies of concert stage performances in which the team has participated and discuss the lessons learned.

Biography

Juhan Nam is an Associate Professor in the Graduate School of Culture Technology at the Korea Advanced Institute of Science and Technology (KAIST) in South Korea. He leads the Music and Audio Computing Group at KAIST, working on various topics at the intersection of music, AI, digital signal processing, and human-computer interaction. In particular, his AI music performance team has developed an "AI pianist" system that can adapt to human performance and play musical pieces expressively. The AI Pianist system has been demonstrated in several concerts in collaboration with renowned artists such as Sumi Jo, Jasmin Choi, and Jonghwa Park.