Tutorial
Directly Parameterized Neural Network Construction for Generalization and Robustness
Abstract:
Interpretable deep learning methods for inverse problems offer a compelling alternative to traditional black-box architectures by leveraging classical optimization principles to guide network design. This tutorial will provide a comprehensive overview of unrolled optimization networks, focusing on convolutional dictionary learning models, nonlocal self-similarity priors, and interpretable primal-dual splitting methods for MRI reconstruction. We will explore how these architectures achieve competitive performance in denoising, demosaicing, and compressed sensing while maintaining interpretability, parameter efficiency, and robust generalization. Special attention will be given to noise-adaptive parameterization, group sparsity priors, and circulant-sparse attention mechanisms that enable effective self-supervised learning and improved generalization. Through theoretical insights and empirical results, this tutorial will demonstrate how interpretable deep learning can bridge the gap between classical inverse problem formulations and modern neural network approaches.
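To make the unrolling idea concrete, the sketch below shows a minimal unrolled convolutional sparse-coding denoiser in PyTorch: a few ISTA iterations with learned convolutional dictionaries and noise-adaptive soft thresholds. It is an illustrative toy under assumed settings, not the speakers' actual architecture; the names UnrolledCDL and soft_threshold, the iteration count, and the filter sizes are assumptions chosen only for this example.

    # Illustrative sketch of an unrolled convolutional-dictionary-learning denoiser
    # (ISTA iterations with learned convolutional filters); not the tutorial's model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def soft_threshold(x, lam):
        # Proximal operator of the l1 norm: shrink magnitudes toward zero by lam.
        return torch.sign(x) * F.relu(x.abs() - lam)

    class UnrolledCDL(nn.Module):
        def __init__(self, n_iters=10, n_filters=32, kernel_size=7):
            super().__init__()
            pad = kernel_size // 2
            # Analysis/synthesis convolution pair per unrolled iteration (untied weights).
            self.analysis = nn.ModuleList(
                nn.Conv2d(1, n_filters, kernel_size, padding=pad, bias=False)
                for _ in range(n_iters))
            self.synthesis = nn.ModuleList(
                nn.Conv2d(n_filters, 1, kernel_size, padding=pad, bias=False)
                for _ in range(n_iters))
            # Learned per-filter thresholds, scaled by the noise level (noise adaptivity).
            self.thresholds = nn.Parameter(torch.full((n_iters, n_filters), 1e-2))
            # Final synthesis dictionary mapping sparse codes back to the image.
            self.dictionary = nn.Conv2d(n_filters, 1, kernel_size, padding=pad, bias=False)

        def forward(self, y, sigma=1.0):
            z = None
            for A, B, tau in zip(self.analysis, self.synthesis, self.thresholds):
                lam = sigma * tau.view(1, -1, 1, 1)      # noise-adaptive threshold
                if z is None:
                    z = soft_threshold(A(y), lam)        # first ISTA step from z = 0
                else:
                    r = y - B(z)                         # residual in the image domain
                    z = soft_threshold(z + A(r), lam)    # gradient step + shrinkage
            return self.dictionary(z)                    # synthesize the denoised image

    # Usage: denoise a (batch, channel, H, W) image at an assumed noise level.
    model = UnrolledCDL()
    y = torch.randn(1, 1, 64, 64)
    x_hat = model(y, sigma=25 / 255)

Because every layer is an optimization step, the learned filters and thresholds retain the meaning of a dictionary and sparsity penalty, which is what gives these networks their interpretability and parameter efficiency.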
[Speaker] Amirhossein Khalilian-Gourtani (New York University, USA)
[Biography] Amirhossein Khalilian-Gourtani is currently a postdoctoral research fellow at the New York University Grossman School of Medicine, Department of Neurology. He received his Ph.D. in Electrical Engineering from the New York University Tandon School of Engineering in 2022. His main research interests include signal processing, machine learning, convex and non-smooth optimization, medical signal processing, and numerical analysis. He was the recipient of the Postdoctoral Merit Award from the Society for the Neurobiology of Language, the Ernst Weber PhD Fellowship Award, and the Myron M. Rosenthal M.Sc. Award from New York University. He is a reviewer for more than ten IEEE journals and conferences related to computer vision.
[Speaker] Nikola Janjušević (New York University, USA)
[Biography] Nikola Janjušević (Member, IEEE) is currently a postdoctoral research fellow at the New York University Grossman School of Medicine, Department of Radiology. He received his Ph.D. in Electrical Engineering from the New York University Tandon School of Engineering in 2024. His research interests lie at the intersection of imaging inverse problems, interpretable deep learning, non-smooth and convex optimization, and medical imaging. He has published journal papers on interpretable noise-adaptive deep learning architectures and compressed sensing optical coherence tomography. Previously, he was a research scientist at Apple Video Engineering and Samsung Research America.
Exploring and Standardizing Deep Learning-Based Video Compression Technologies
Abstract:
Deep learning-based video compression has been a very active research area and has seen compression efficiency improve significantly within just a few years. Standards development organizations have taken notice of this fast-developing technology trend and have been investigating and exploring deep learning-based video compression technologies for standardization. Some of this investigation has reached the standardization stage. For example, the Joint Video Experts Team (JVET) recently finalized new editions of the Versatile Supplemental Enhancement Information specification (H.274/VSEI) with the addition of new neural network-based post-filter SEI messages and generative face video SEI messages. Other technologies, such as neural network-based video coding, are being actively explored and could be considered for standardization in the future. JVET and the Moving Picture Experts Group (MPEG) are also standardizing normative and non-normative deep learning-based video compression and processing technologies that improve compression efficiency for video coding for machines applications, where the ultimate “consumers” of the encoded video are learning-based machine vision tasks such as video analysis and video understanding. This tutorial will present an overview of the deep learning-based video coding activities in JVET and MPEG, including the timelines associated with the various standardization and exploration activities.
[Speaker] Iole Moccagatta (Intel Corporation, USA)
[Biography] Dr. Iole Moccagatta is a Senior Principal Engineer at Intel working on hardware multimedia IPs. Prior to Intel, she held the position of Senior Video Architect at NVIDIA and that of Science Director at IMEC, Belgium. She has been a very active member of MPEG, ITU-T, and JPEG, where she has represented US interests and companies and made many technical contributions, a number of which have been included in MPEG and JPEG standards. She is currently Co-chair of the MPEG/ITU-T Joint Video Experts Team (JVET) Ad-Hoc Group on H.266/VVC Conformance and Co-editor of the H.266/VVC Conformance Testing document. Dr. Moccagatta has also been an active participant in the Alliance for Open Media (AOM) AV1 Codec WG, where she has co-authored two adopted proposals. She currently represents Intel on the AOM Board. Dr. Moccagatta is serving as an IEEE Signal Processing Society (SPS) Member-at-Large (2024-2026) and served as IEEE SPS Regional Director-at-Large for Regions 1-6 (2021-2022). She is the author or co-author of more than 30 publications, 2 book chapters, and more than 15 talks and tutorials in the field of image and video coding, and she holds more than 10 patents in the same fields. She received a Diploma in Electronic Engineering from the University of Pavia, Italy, and a Ph.D. from the Swiss Federal Institute of Technology in Lausanne, Switzerland.
[Speaker] Yan Ye (Alibaba, USA)
[Biography] Yan Ye is currently the Head of the Video Technology Lab of Alibaba’s Damo Academy in Sunnyvale, California. Prior to Alibaba, she held various technical and management positions at InterDigital, Dolby Laboratories, and Qualcomm.
Throughout her career, Dr. Ye has been actively involved in developing international video coding and video streaming standards in the Joint Video Experts Team (JVET) of the ITU-T SG16/Q.6 Video Coding Experts Group (VCEG) and the ISO/IEC JTC 1/SC 29 Moving Picture Experts Group (MPEG). She is currently an Associate Rapporteur of ITU-T SG21/Q.6, the Group Chair of the INCITS/MPEG task group, and the focus area chair on 2D video quality assessment of ISO/IEC SC 29/AG 5 MPEG Visual Quality Assessment (VQA). She also chairs the Ad Hoc Groups on Generative Face Video Compression in JVET and on Compressed Video for the study of Quality Metrics in MPEG VQA. She has made many technical contributions to well-known video coding and streaming standards such as H.264/AVC, H.265/HEVC, H.266/VVC, MPEG DASH, and MPEG OMAF. She is an Editor of the VVC test model, the 360Lib algorithm description, and the HEVC standard version 2 and version 3. She is a prolific inventor with hundreds of granted U.S. patents and patent applications and a co-author of more than 70 conference and journal papers. Dr. Ye was a Distinguished Industrial Speaker of the IEEE Signal Processing Society (2022-2024). She was a guest editor of the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) special section on “the joint Call for Proposals on video compression with capability beyond HEVC” and the TCSVT special section on “Versatile Video Coding”. She co-chairs the technical program committee of the IEEE Data Compression Conference (DCC) and the conference subcommittee of the IEEE Visual Signal Processing and Communication Technical Committee (VSPC-TC). Dr. Ye is devoted to multimedia standards development, hardware and software video codec implementations, and deep learning-based video research. Her research interests include advanced video coding, processing, and streaming algorithms, real-time and immersive video communications, AR/VR/MR, and deep learning-based video coding, processing, and quality assessment algorithms. Dr. Ye received her Ph.D. degree from the University of California, San Diego and her B.S. and M.S. degrees from the University of Science and Technology of China.