Title: The Rise of Deep Learning (DL) Compilers: Progress and Future Directions
Tech Lead and Manager, NVIDIA
Bio
Trent Lo serves as a Tech Lead and Manager at NVIDIA's Santa Clara headquarters, where he has driven the development of Deep Learning (DL) compiler technologies. Since joining the company in 2018, his work has advanced key compiler frameworks, specifically TensorRT and XLA, which are fundamental tools in optimizing DL model performance. His notable technical contributions include pioneering Dynamic Shape Support in the TensorRT compiler and developing Horizontal Kernel Fusion in XLA, innovations that have enhanced the flexibility and efficiency of DL model training/deployment. Prior to NVIDIA, Trent led the Heterogeneous Compute Compiler team at MediaTek in Hsinchu. He holds a PhD from National Tsing Hua University, Taiwan.
Abstract
The evolution of Deep Learning (DL) compilers has been remarkable over the past decade. CUDA programming dominated the early landscape of AI acceleration, providing direct GPU optimization capabilities but requiring significant low-level programming expertise. The introduction of graph-level compilers such as TensorRT and XLA shifted the landscape by providing automated optimization and abstracting complex hardware details, making GPU performance much more accessible. Later, the emergence of tile-level compilers like OpenAI Triton represented a middle-ground approach, offering developers more granular control compared to graph-level compilers. In this talk, we will explore these developments and look ahead to new challenges and opportunities, as DL compilers continue to shape the future of training and inference optimization.
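To make the contrast concrete, here is a minimal tile-level kernel in the style of OpenAI Triton (a generic sketch, not material from the talk, assuming the open-source `triton` and `torch` Python packages and a CUDA-capable GPU): the programmer controls the tile size, offsets, and masking, while the compiler handles the low-level GPU details that hand-written CUDA would expose.

```python
# Minimal Triton vector-add kernel: the programmer works at the tile level
# (choosing BLOCK_SIZE, computing offsets and masks), while the compiler
# generates the low-level GPU code that raw CUDA would require by hand.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program instance per tile
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the final partial tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# Usage: x = torch.rand(4096, device="cuda"); y = torch.rand(4096, device="cuda"); add(x, y)
```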
Title: Property-Based Analysis and Optimization for Complex Cyber-Physical Real-Time Systems
TU Dortmund University, Germany
Bio
Jian-Jia Chen is a Professor in the Department of Informatics at TU Dortmund University, Germany. He was a Junior Professor in the Department of Informatics at the Karlsruhe Institute of Technology (KIT), Germany, from May 2010 to March 2014. He received his Ph.D. from the Department of Computer Science and Information Engineering at National Taiwan University, Taiwan, in 2006, and his B.S. from the Department of Chemistry at National Taiwan University in 2001. Between January 2008 and April 2010, he was a postdoctoral researcher at ETH Zurich, Switzerland. His research interests include real-time systems, embedded systems, energy-efficient scheduling, power-aware designs, temperature-aware scheduling, and distributed computing. He received a European Research Council (ERC) Consolidator Grant in 2019, has received more than 10 Best Paper and Outstanding Paper Awards, and has served on the technical program committees of many international conferences.
Abstract
Cyber-physical real-time systems are information processing systems that require both functional and timing correctness and must interact properly with the physical world. Since time progresses naturally in the physical world, safe bounds on deterministic or probabilistic timing properties are required. It is essential to construct timing analyses for complex cyber-physical real-time systems from *formal properties*. In this talk, I will use a few examples to demonstrate the need for such property-based modular timing analysis. Furthermore, I will illustrate how to translate, correctly and precisely, the behaviour of different applications in cyber-physical systems into mathematical properties that can then be used for property-based modular designs.
Title 1: The Real Success Story in AI Industry from Andes (10 min)
Title 2: AutoIREE: Automatic Schedule Generation for AI Models on RISC-V Vector Architectures (20 min)
Title 3: Supporting Sparse Inference in XNNPACK with RISC-V Vector Extension (30 min)
Moxa & MediaTek software engineer; Andes Technical Marketing
Title: The Real Success Story in AI Industry from Andes
Abstract
This talk presents Andes' unique and inspiring journey of AI innovation, highlighting the AI system architecture and RISC-V Vector/Custom Extensions that are reshaping the emerging AI landscape.
Andes Senior Engineer
Title: AutoIREE: Automatic Schedule Generation for AI Models on RISC-V Vector Architectures
Abstract
This talk introduces AutoIREE, a fully automated tuning system for AI models based on IREE. The system has three core components. The Model Partitioner partitions computational graphs into subgraphs based on the operators' tensor sizes while preserving their parameters. The Trial Assigner assigns trials among subgraphs based on their computational complexity. The Schedule Generator and Tuner generates loop schedules for subgraphs and collects performance metrics of schedules running on target hardware. By dynamically exploring tiling configurations that optimize hardware resources (e.g., cache and vector units), we iteratively refine schedules for better performance. Experimental results show that our method significantly improves model performance within IREE’s optimization pipeline.
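For readers unfamiliar with this style of autotuning, the sketch below is a rough, hypothetical illustration of the flow described above, not AutoIREE's actual implementation; the subgraph records, FLOP-based complexity estimate, tiling candidates, and `benchmark` callback are placeholder assumptions. It allocates the trial budget per subgraph in proportion to estimated complexity and keeps the best-measured tiling configuration for each subgraph.

```python
import itertools
import random

def assign_trials(subgraphs, total_trials):
    """Split the trial budget across subgraphs in proportion to their
    estimated computational complexity (placeholder: a FLOP count)."""
    total_flops = sum(sg["flops"] for sg in subgraphs)
    return {sg["name"]: max(1, round(total_trials * sg["flops"] / total_flops))
            for sg in subgraphs}

def candidate_tilings(shape, tile_sizes=(4, 8, 16, 32, 64)):
    """Enumerate tiling configurations: per loop dimension, every tile size
    that fits, falling back to the full extent if none does."""
    return itertools.product(*[[t for t in tile_sizes if t <= d] or [d] for d in shape])

def tune(subgraphs, benchmark, total_trials=200):
    """benchmark(subgraph, tiling) -> measured latency on the target hardware."""
    budget = assign_trials(subgraphs, total_trials)
    best = {}
    for sg in subgraphs:
        tilings = list(candidate_tilings(sg["shape"]))
        random.shuffle(tilings)                       # random search within the budget
        for tiling in tilings[:budget[sg["name"]]]:
            latency = benchmark(sg, tiling)
            if sg["name"] not in best or latency < best[sg["name"]][1]:
                best[sg["name"]] = (tiling, latency)  # keep the fastest schedule so far
    return best
```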
Andes RD-CA Manager
Title: Supporting Sparse Inference in XNNPACK with RISC-V Vector Extension
Abstract
Leveraging sparsity in neural network weights can significantly enhance efficiency when deploying models on mobile and edge devices. However, the RISC-V ecosystem lacks a complete solution for sparse inference. This talk introduces the challenges of enabling sparse inference with the RISC-V Vector Extension (RVV) and presents preliminary experimental findings on the existing gaps. While the implementation and optimization efforts are still in progress, we aim for this work to benefit both the RISC-V community and the XNNPACK project.
Title: Memory-efficient Model Compilation for Edge AI Inference
National Yang Ming Chiao Tung University
Bio
Tsung Tai Yeh is an associate professor of computer science at National Yang Ming Chiao Tung University, Taiwan. He obtained his Ph.D. from the School of Electrical and Computer Engineering at Purdue University, USA. His research spans computer architecture, computer systems, and programming languages. He received the Lynn Fellowship at Purdue University and worked at AMD Research. His compiler research was nominated for the Best Paper Award at the PPoPP conference and has been published in multiple top-ranking conference proceedings (ISCA, ASPLOS, HPCA, PPoPP, NeurIPS).
Abstract
Deep neural networks (DNNs) have been widely applied across many application domains. However, running inference on large DNN models in data-center servers consumes substantial energy. Resource-constrained edge devices therefore often offload model inference to remote data-center servers to accelerate its execution, but such offloading also increases security risks because of untrustworthy network connections. To mitigate these problems, Edge AI aims to fit DNN models onto resource-constrained edge devices and to achieve Green AI computing by lowering the energy consumption of inference through small DNN models. Unlike desktop computers and servers, edge devices often operate under tight memory budgets to reduce hardware cost and energy consumption, which raises significant challenges when deploying DNN models on them. This talk will introduce our recent research on memory-efficient model compilation for edge AI inference, which reduces memory usage and improves the data reuse rate of the DNN model, accelerating inference while lowering energy consumption on edge devices. Finally, the talk will discuss future work and open challenges for edge AI systems and hardware.
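As one concrete illustration of a technique in this space (an illustrative sketch only, not the compiler presented in the talk; the tensor-lifetime representation and greedy policy are assumptions made for the example), the planner below reuses buffers for intermediate tensors whose live ranges do not overlap, a common way to reduce peak activation memory on edge devices.

```python
def plan_buffers(tensors):
    """tensors: list of (name, size_bytes, first_use, last_use) over a
    topologically ordered operator schedule. Greedily assign each tensor to
    an existing buffer whose previous occupant is already dead and that is
    large enough; otherwise allocate a new buffer."""
    buffers = []      # each: {"size": bytes, "free_at": first step it is free again}
    assignment = {}   # tensor name -> buffer index
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        for i, buf in enumerate(buffers):
            if buf["free_at"] <= first and buf["size"] >= size:
                assignment[name] = i          # reuse a dead buffer
                buf["free_at"] = last + 1
                break
        else:
            buffers.append({"size": size, "free_at": last + 1})
            assignment[name] = len(buffers) - 1
    peak_bytes = sum(b["size"] for b in buffers)  # total memory the plan needs
    return assignment, peak_bytes
```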
Title: In-Memory Computing with Flash Memory
Macronix International Co., Ltd. (旺宏電子)
Bio
Hsiang-Pang Li (李祥邦) is the Director of the Advanced System Laboratory at Macronix International, where he focuses on the research and development of advanced non-volatile memory (NVM) applications and intelligent storage systems. After earning his B.S. and M.S. degrees in Electrical Engineering from Chung Yuan Christian University, he joined Macronix in 1998 and has since accumulated more than twenty years of research and product-development experience. The teams he leads have delivered outstanding results in 3D NAND/NOR Flash, storage systems, and AI architectures, and actively explore applications including Computing-in-Memory, In-Memory Search, vector similarity search, AI acceleration, and high-reliability storage systems. His research team has published repeatedly at top international conferences such as IEDM, MICRO, DAC, ISSCC, ISCA, ICCAD, and IMW; their recent work centers on computation and acceleration integrated with 3D NAND/NOR, proposing next-generation memory solutions and solutions for intelligent edge computing.
Abstract
In today's era of rapid artificial intelligence (AI) development, "bigger models are smarter models" has become the prevailing consensus. However, as model sizes keep growing, the required storage, computation, time, and energy grow dramatically with them. In conventional computer architectures, the frequent data transfers between compute units and memory form a bottleneck and are one of the key factors limiting overall performance. Solid-state drives (SSDs) built from Flash memory are already widely used to improve storage efficiency and have become an important foundation for optimizing modern computer systems. Looking ahead, Flash memory and SSDs will not only continue to improve data-access efficiency but may also take over part of the computation currently performed by CPUs or GPUs, participating directly in large-scale data processing and playing an even more critical role in AI systems.
Title: The Edge AI Compiler Journey at MediaTek
Senior Technical Manager, MediaTek
Bio
Dr. BY Shen is a Senior Technical Manager in the Computing and Artificial Intelligence Technology Group at MediaTek, with over a decade of industry experience in compiler design and optimization. Since 2017, he has led the development of the NeuroPilot AI compiler infrastructure, which is widely used across MediaTek products to enable Edge AI capabilities. Dr. Shen holds a Ph.D. in Computer Science from National Chiao Tung University, Taiwan.
Abstract
In the rapidly evolving field of artificial intelligence, the efficient and optimized compilation of AI models is crucial for achieving high performance on edge devices. This talk will guide you through the development journey of the MediaTek AI Compiler, a key component of the NeuroPilot SDK, designed to enhance the deployment of AI applications across MediaTek's diverse product lines. We will delve into the architecture and features of the MediaTek AI Compiler, highlighting its capabilities and addressing the challenges in optimizing neural network models for various AI tasks.
Title: Building a Scalable AI/ML Software Stack for RISC-V: From PyTorch to Deployment on SiFive Intelligence XM Platforms
Principal Engineer, SiFive Taiwan
Bio
Hong-Rong Hsu has led and actively contributed to the Open Source Software team at SiFive since 2019 and currently serves as an AI/ML Team Manager. In this role, he both oversees and contributes hands-on to the development of AI/ML software, and he manages end-to-end deployment of AI/ML models. Prior to joining SiFive, Hong-Rong gained valuable experience at MediaTek (2010–2018) and Bitmain (2018–2019). His expertise spans AI/ML, system software, compilers, and RISC-V, with a strong focus on driving innovation and contributing to technical advancements in these areas.
Abstract
This talk presents the SiFive AI/ML Software Stack for RISC-V, designed to enable efficient end-to-end deployment of AI models. By leveraging the IREE compiler infrastructure, the stack supports model lowering and hardware-aware optimization targeting the SiFive X390 cores and the on-chip AI matrix engine in the new XM series platform. We will walk through real-world deployment examples to highlight the flexibility, performance, and compiler-driven design of the stack for executing modern AI workloads on open RISC-V hardware.
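As a small, illustrative example of compiler-driven deployment in this spirit (not SiFive's stack itself; the MLIR snippet follows IREE's public Python API samples, and the RISC-V target flags are assumptions whose exact names vary across IREE releases), the following compiles a tiny module for a riscv64 CPU target with the vector extension enabled.

```python
# A minimal sketch: compiling an MLIR module for a RISC-V CPU target with
# IREE's Python API. Flag names below are assumptions and differ across IREE
# versions; consult `iree-compile --help` for the release in use.
import iree.compiler as ireec

MLIR_MODULE = """
func.func @simple_mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.mulf %a, %b : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""

vmfb = ireec.compile_str(
    MLIR_MODULE,
    target_backends=["llvm-cpu"],   # CPU code-generation path
    extra_args=[
        "--iree-llvmcpu-target-triple=riscv64-unknown-linux-gnu",
        "--iree-llvmcpu-target-cpu-features=+m,+a,+f,+d,+c,+v",  # enable RVV
    ],
)

# The resulting flatbuffer can be loaded and executed by the IREE runtime.
with open("simple_mul_riscv64.vmfb", "wb") as f:
    f.write(vmfb)
```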