The banner image was generated with SDXL, a powerful diffusion model, using the prompt: "Transforming Chaos into Harmony: Diffusion Models in Audio Signal Processing."
This tutorial will cover the theory and practice of diffusion models for music and sound. We will explain the methodology, explore its history, and demonstrate music- and sound-specific applications such as real-time generation and various other downstream tasks. By bridging techniques and models from computer vision to the music and sound domains, we aim to spark further research interest and democratize access to diffusion models in these fields.
The tutorial comprises four sections.
The first provides an overview of deep generative models and delves into the fundamentals of diffusion models.
The second section explores applications such as sound and music generation, as well as utilizing pre-trained models for music/sound editing and restoration.
In the third section, a hands-on demonstration will focus on training diffusion models and applying pre-trained models for music/sound restoration; a minimal training sketch is shown after this overview.
The final section outlines future research directions.
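As a small taste of the hands-on portion, below is a minimal sketch of a DDPM-style training step in PyTorch. This is an illustrative assumption rather than the tutorial's official code: `model(x_t, t)` is a hypothetical denoiser that takes a noisy input and a timestep and predicts the injected noise, and `x0` stands for a batch of clean audio features such as mel-spectrograms.

```python
# Minimal sketch of a DDPM-style training step in PyTorch.
# Assumptions (not the tutorial's official code): `model(x_t, t)` is a
# hypothetical denoiser predicting the injected noise, and `x0` is a batch
# of clean audio features such as mel-spectrograms.
import torch
import torch.nn.functional as F

T = 1000                                        # number of diffusion steps
betas = torch.linspace(1e-4, 2e-2, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of (1 - beta)

def diffusion_loss(model, x0):
    """Add noise at a random timestep and train the model to predict it back."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)           # random timestep per sample
    noise = torch.randn_like(x0)                               # Gaussian noise epsilon
    a_bar = alphas_bar.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise     # forward (noising) process
    return F.mse_loss(model(x_t, t), noise)                    # epsilon-prediction objective
```

Training then amounts to looping `diffusion_loss(model, x0).backward()` over batches with a standard optimizer; generation reverses the process by iteratively denoising from pure Gaussian noise.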
We anticipate that this tutorial, which emphasizes both the foundational principles and practical implementation of diffusion models, will stimulate interest in the music and sound signal processing community, shedding light on insights and applications of diffusion models drawn from computer vision methodologies.
Please refer to the ICASSP tutorial page: Accepted Tutorials – 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing.
🎹 Please feel free to contact Chieh-Hsin (Jesse) Lai at [chieh-hsin.lai@sony.com] with any questions or concerns.