The banner image was generated with SDXL, a powerful diffusion model, using the prompt: A cozy Italian café scene with a small porcelain cup of freshly brewed espresso on a saucer, placed on a rustic wooden table. In the background, warm sunlight filters through narrow cobblestone streets lined with colorful buildings and flower boxes. A barista stands behind a vintage espresso machine, and an Italian flag flutters nearby. The atmosphere is relaxed and authentic, capturing the charm of everyday life in Italy.
Chieh-Hsin (Jesse) Lai earned his Ph.D. in Mathematics from the University of Minnesota in 2021. He is currently a research scientist at Sony AI and a visiting assistant professor in the Department of Applied Mathematics at National Yang Ming Chiao Tung University, Taiwan. His expertise is in AI for science and deep generative models, especially diffusion models and their application to media content restoration.
He has organized and delivered tutorials on diffusion models at ISMIR 2024 and ICASSP 2025. He also organized an Expo workshop at NeurIPS 2023 and NeurIPS 2024 on "Media Content Restoration and Editing with Deep Generative Models", and a social event at ICLR 2024 on "Recent Advances on Diffusion and GAN".
For more information, please visit his Google Scholar profile and personal website. Please contact Chieh-Hsin (Jesse) Lai with any questions or concerns about the tutorial.
Bac Nguyen Cong earned his M.Sc. degree (summa cum laude) in computer science from Universidad Central de Las Villas in 2015, followed by a Ph.D. from Ghent University in 2019. He joined Sony in 2019, focusing his research on representation learning, vision-language models, and generative modeling. He has four years of hands-on industry experience in deep learning and machine learning, and his work spans application domains such as text-to-speech and voice conversion.
Masato Ishii is a Senior Research Scientist at Sony Research Inc. He received his Ph.D. from the University of Tokyo under the supervision of Professor Masashi Sugiyama. From 2010 to 2019, he worked as a researcher at NEC. Between 2017 and 2019, he served as a visiting researcher at RIKEN AIP. In 2019, he joined Sony Group Corporation and has been seconded to Sony Research Inc. since 2023. His research interests range from the fundamentals of machine learning to its applications in computer vision, with a current focus on audio-visual generation.
Takashi Shibuya received his Ph.D. degree in engineering from the University of Tsukuba in 2025. He is currently a staff research scientist at Sony AI. From 2018 to 2019, he was a visiting scholar at the Language Technologies Institute, Carnegie Mellon University. His research interests include generative AI, multimodal training, and audio signal processing.