Learning Beyond Deep Learning (LB-DL) for Multimedia Processing
While deep learning (DL) has made outstanding contributions to AI/ML, several inherent shortcomings have also been identified: DL models are mathematically intractable, vulnerable to adversarial attacks, and computationally intensive, owing to their demand for large volumes of data and their use of backpropagation for end-to-end network optimization. More fundamentally, the dominance of DL has undermined fundamental research in AI/ML at academic institutions (the function of universities has been reduced to merely testing new DL models marketed by the IT giants).
Developing new machine learning paradigms beyond deep learning is highly desirable, especially for multimedia processing, where researchers frequently face large amounts of multimodal data in audio, visual, and other formats. Yet progress in this direction is still slow and sparse despite certain advancements in recent years. The objective of this panel is to attract researchers with common interests and generate momentum for future breakthroughs. Experts in the field will present their discoveries featuring one or more characteristics of the new learning paradigm: interpretability, smaller model sizes, lower computational complexity, and high performance.
Conveners and Panelists:
Ling Guan, Toronto Metropolitan University/Ryerson University
C.-C. Jay Kuo, University of Southern California
Panelists:
Jianquan Liu, NEC Corporation, Japan
Shan Liu, Tencent Media Lab, USA
Paisarn Muneesawang, Mahidol University, Thailand
Convener and Panelist: Ling Guan (Toronto Metropolitan University/Ryerson University)
Title: Multi-view Analysis: Back to the Classic with a NN Twist
Abstract: Despite the tremendous contributions of deep learning (DL) to a broad range of research and practical domains (such as image classification, video computing, and natural language processing), optimizing the quality of the features extracted by DL algorithms has consistently been a challenge, especially when working with multi-view/multimodal data. Though DL models have been hotly pursued to address this issue, no clear breakthroughs have been witnessed. Departing from mainstream ML/DL methods, this presentation reports a breakthrough solution to this challenging problem. In particular, the solution calls upon discriminative multiple correlation (DMC) analysis, which integrates statistical machine learning and a compact neural network. Statistics collected from numerous multi-view analysis and recognition tasks clearly show that the DMC model not only delivers impressive (sometimes unprecedented) accuracy, but also requires much shorter processing times and fewer computing resources than contemporary DL-based fusion models. Particularly worth noting is its performance on large-scale databases such as TinyImageNet and NTU 120, which demonstrates that the model has broken the performance standstill on these databases that had stood for several years.
Bio: Ling Guan is a professor emeritus in Electrical and Computer Engineering at Toronto Metropolitan University/Ryerson University, Toronto, Canada, and a Tier I Canada Research Chair in Multimedia and Computer Technology. He held visiting positions at British Telecom (1994), Tokyo Institute of Technology (1999), Princeton University (2000), National ICT Australia (2007), Nanyang Technological University (2007), Hong Kong Polytechnic University (2008-09) and Microsoft Research Asia (2002, 2009, 2017). Dr. Guan has published extensively in multimedia processing and communications, human-centered computing, machine learning, adaptive image and signal processing, and, more recently, new learning models for intelligent multimedia computing. He is a Fellow of the IEEE, an Elected Member of the Canadian Academy of Engineering, and an IEEE Circuits and Systems Society Distinguished Lecturer. Dr. Guan has been honored with numerous awards, including the 2014 IEEE Canada C.C. Gotlieb Medal for Achievement in Computer Science and Engineering and the 2005 IEEE Transactions on Circuits and Systems Best Paper Award.
Convener and Panelist: C.-C. Jay Kuo (University of Southern California)
Title: Modern AI, Data Fitting, and Green Learning
Abstract: Modern AI is built upon a data-driven approach, where AI problems are solved by deep neural networks (e.g., CNNs, ResNets, and Transformers). Do neural networks possess human-like intelligence? To answer this, I will relate “modern AI” to “heavily supervised learning” (or weak AI) and “neural networks” to “data-fitting machines,” respectively. This view provides deeper insights into the working principle of neural networks, and we can clearly understand what they can and cannot do. They are fundamentally different from human brains. The next question is “whether neural networks provide a unique data-fitting machinery for huge input-output data pairs.” If not, what is the alternative? Is it a better one? I have researched this topic since 2014, developed alternative data-fitting machinery, and coined the term “green learning (GL)” for this emerging field. It is “green” since it demands low power consumption in training and inference. GL has many attractive characteristics, such as small model sizes, fewer training samples, mathematical transparency, and ease of incremental learning. The new GL developments will be introduced.
Bio: Dr. C.-C. Jay Kuo is the Ming Hsieh Chair Professor in Electrical and Computer Engineering-Systems, a Distinguished Professor of Electrical and Computer Engineering and Computer Science, and Director of the Multimedia Communication Laboratory (MCL) at the University of Southern California (USC). He received the B.S. degree from the National Taiwan University, Taipei, in 1980 and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, in 1985 and 1987, respectively, all in Electrical Engineering. From October 1987 to December 1988, he was a computational and applied mathematics research assistant professor in the Department of Mathematics at the University of California, Los Angeles. He has been with USC since January 1989. Dr. Kuo is a Fellow of AAAS, ACM, IEEE, NAI, and SPIE. He is also an Academician of Academia Sinica in Taiwan. Dr. Kuo received several awards for his outstanding research contributions, including the 2010 Electronic Imaging Scientist of the Year Award, the 2010-11 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies, the 2011 Pan Wen-Yuan Outstanding Research Award, the 2019 IEEE Computer Society Edward J. McCluskey Technical Achievement Award, the 2019 IEEE Signal Processing Society Claude Shannon-Harry Nyquist Technical Achievement Award, the 2020 IEEE TCMC Impact Award, the 72nd annual Technology and Engineering Emmy Award (2020), and the 2021 IEEE Circuits and Systems Society Charles A. Desoer Technical Achievement Award.
Panelist: Jianquan Liu (NEC Corporation)
Title: Powering the Future: An Industry View on Green AI for Green Transformation (GX)
Abstract: The rapid advancement of AI, particularly power-intensive generative models and deep learning, presents a significant and growing challenge to energy consumption and computational resources. This escalating demand has prompted urgent attention from both industry and governments, fostering a critical dialogue on sustainability within our economic and social systems and the pursuit of a low-carbon society. To ensure a sustainable trajectory, industry and governmental bodies are spearheading investments in Green AI technologies as a cornerstone for achieving Green Transformation (GX). Current research and development efforts are concentrated on areas such as efficient AI model design, energy-conscious hardware, optimized training methodologies, and the integration of renewable energy sources. This talk will offer an industry perspective, highlighting key initiatives in Japan as a case study to stimulate discussion on the critical 'what, why, and how' of advancing Green AI for a sustainable future.
Bio: Dr. Jianquan Liu is currently a Director and Senior Principal Researcher at NEC Corporation, working on multimedia data processing. He is also a Visiting Professor at Nagoya University and an Adjunct Professor at Hosei University, Japan. Prior to NEC, he was with Tencent Inc. from 2005 to 2006. He has published 80+ papers at major international/domestic conferences and journals and filed 100+ PCT patents. He has also successfully transformed these technological contributions into commercial products in the industry. For his industry contributions, Dr. Liu has received the DBSJ Young Researcher’s Achievement and Contribution Award 2024, the IEICE Achievement Award 2023, the Niwa & Takayanagi Achievement Award 2023, the 69th Electrical Science and Engineering Promotion Award, and the Minister of Education, Culture, Sports, Science and Technology (MEXT) Award in 2021, the KANTO Invention and Innovation Award 2021, the IPSJ Research and Engineering Award 2020 and the IPSJ Industrial Achievement Award 2018. Dr. Liu is serving or has served as a Member-at-Large of the IEEE SPS Industry Board (2025-2026); the Industry Co-chair of IEEE ICIP 2023, 2025 and ACM MM 2023, 2024, 2025; the General Co-chair of IEEE MIPR 2021; and the PC Co-chair of IEEE IRI 2022, ICME 2020, AIVR 2019, BigMM 2019, ISM 2018, ICSC 2018, etc. He is a senior member of ACM, IEEE, and IPSJ; a member of IEICE, APSIPA, and DBSJ; and is or has been an associate editor of IEEE TMM (2021-2024), ACM TOMM (2022-), EURASIP JIVP (2023-), IEEE MultiMedia Magazine (2019-2022), ITE Transactions on Media Technology and Applications (2021-), APSIPA Transactions on Signal and Information Processing (2022-), and the Journal of Information Processing (2017-2021). Dr. Liu received the M.E. and Ph.D. degrees from the University of Tsukuba, Japan.
Panelist: Shan Liu (Tencent Media Lab, USA)
Title: Learning-based Video Coding for Human and Machines
Abstract: Over the last decade, deep learning has demonstrated superior capability in solving computer vision and image processing problems. Witnessing such success, researchers and engineers are motivated to investigate learning-based technologies for image and video compression. Some researchers started exploring the use of neural networks for image compression as early as the 1990s. However, the limited computing power of that time restricted the training and solving of complex models, so the results did not seem very promising. In recent years, with the availability of greater and cheaper computing power and huge amounts of training data, learning-based video and image coding has regained considerable research interest, and encouraging progress and evidence have been demonstrated. At the same time, more and more video content is now consumed by machines in a broad range of applications such as video surveillance, healthcare monitoring, transportation, and smart cities. Machine vision differs from human vision because of its different requirements and use cases. Hence, there has been great demand in recent years for video coding technologies and solutions that differ from those we have used for many years to meet human vision expectations. In this talk, recent technology and standard development on learning-based video compression for both humans and machines will be discussed.
Bio: Shan Liu is an internationally renowned technology leader and multimedia expert with breadth and depth. She holds more than 600 granted US patents and has published more than 100 peer reviewed papers and one book. She has won many awards and recognitions including the ISO&IEC Excellence Award, Technology Lumiere Award, two-time Best AE Awards from IEEE TCSVT, Outstanding Alumni Award from USC-SIPI, and “50 Women in Tech” by Forbes China. She is a Fellow of IEEE, IET and AAIA. She is a Distinguished Industry Leader of APSIPA. She has been a long-time contributor to international standardization and served as Project Editor of a few international standards such as the ISO/IEC/ITU-T H.266/VVC standard. She has chaired many standard ad-hoc and working groups. She has also served as General Chair as well as session chair of many international conferences. She has been a Vice Chair of IEEE Data Compression Standards Committee since 2019 and currently serves as Editor-in-Chief of IEEE TCSVT. She is a member of the steering committee (a.k.a. board) of AOMedia and several other committees and boards.
Dr. Liu received the B.Eng. degree in electronic engineering from Tsinghua University and the M.S. and Ph.D. degrees in electrical engineering from the University of Southern California. She is currently a Distinguished Scientist and General Manager at Tencent, where she leads global R&D teams to develop and deploy multimedia technologies serving hundreds of millions of users worldwide. Prior to joining Tencent in 2017, she held senior technical and management positions at several companies, including Global 500 companies and startups. Her research interests include audio-visual, volumetric, immersive and emerging media data compression, intelligence, transport and systems.
Panelist: Paisarn Muneesawang (Mahidol University, Thailand)
Title: Learning Beyond Deep Learning: Toward Physical AI with Reduced Complexity
Abstract: The dominance of deep learning in artificial intelligence has led to tremendous progress across vision, medical imaging, and robotics. However, this progress often comes at the cost of high computational demands, large memory footprints, and complex training procedures, challenges that hinder deployment in low-resource or embedded environments. This talk explores a shift from traditional deep learning paradigms to a new frontier: Physical AI. By optimizing learning complexity through knowledge distillation, neural architecture search, and early exit strategies, our recent work has enabled compact, efficient models suitable for real-time tasks in distributed IoT and healthcare settings. Beyond this, the talk draws inspiration from biological intelligence, showcasing how physical structures, embodied intelligence, and adaptive neural control in robotic systems (inspired by dung beetles, crawling animals, and compliant locomotion) can offload computational demands to physical mechanisms. This synthesis of compact neural learning and bioinspired physical design introduces a promising path for the future of AI, where intelligence is not solely in silicon but emerges from the synergy of body, control, and environment.
Bio: Dr. Muneesawang is a distinguished Professor of Computer Engineering at Mahidol University, specializing in computer science, multimedia engineering, and information technology. He earned his B.Eng. in Electrical Engineering from Mahanakorn University of Technology (1996), M.Eng.Sc. from the University of New South Wales (1999), and Ph.D. in Engineering from the University of Sydney (2003), focusing on image and video content retrieval. Previously, he served as a Professor at Naresuan University (2016–2023), where he held leadership roles including Director of the Digital Innovation Research and Smart Grid Technology College, Dean of the Graduate School, and Vice President for Administrative Affairs. He has also been a visiting professor at Ryerson University, Canada, and Nanyang Technological University, Singapore.
Dr. Muneesawang is an IEEE Senior Member with significant contributions to digital twins, AI, and multimedia signal processing, evidenced by publications such as Visual Inspection Technology in the Hard Disc Drive Industry (Wiley-ISTE, 2015) and Multimedia Database Retrieval (Springer, 2014). He has received Gold and Silver Medals at international invention exhibitions and served as a consultant for Triple T Broadband and Fiberone Limited, as well as a Steering Committee member for Digital Health Platforms with Thailand’s Ministry of Public Health. His dedication to mentoring scholars and advancing technology solidifies his stature as a leading figure in his field.