15 minutes for each oral presentation
(~13 mins for presentation and 2 mins for Q&A)
Topic
The Rise of Deep Learning (DL) Compilers: Progress and Future Directions
Dr. Trent Lo
Tech Lead and Manager, NVIDIA
Abstract
The evolution of Deep Learning (DL) compilers has been remarkable over the past decade. CUDA programming dominated the early landscape of AI acceleration, providing direct GPU optimization capabilities but requiring significant low-level programming expertise. The introduction of graph-level compilers such as TensorRT and XLA shifted the landscape by providing automated optimization and abstracting complex hardware details, making GPU performance much more accessible. Later, the emergence of tile-level compilers like OpenAI Triton offered a middle ground, giving developers finer-grained control than graph-level compilers while demanding far less low-level expertise than raw CUDA. In this talk, we will explore these developments and look ahead to new challenges and opportunities as DL compilers continue to shape the future of training and inference optimization.
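To make the tile-level programming model concrete, below is a minimal Triton vector-add kernel in the style of Triton's own tutorials (an illustration added for this program, not material from the talk). The developer writes block-level loads, computes, and stores in Python, and the compiler handles the mapping onto GPU threads and memory:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance processes one tile of BLOCK_SIZE elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements              # guard the ragged last tile
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)           # one program per tile
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

The developer still reasons about tiles and masking, which a graph-level compiler would hide entirely, yet never touches thread indices or shared memory as raw CUDA would require.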
Topic
Property-Based Analysis and Optimization for Complex Cyber-Physical Real-Time Systems
Professor Jian-Jia Chen
Technical University of Dortmund (TU Dortmund), Germany
Abstract
Cyber-physical real-time systems are information processing systems that require both functional and timing correctness and must interact properly with the physical world. Since time naturally progresses in the physical world, safe bounds on deterministic or probabilistic timing properties are required. It is essential to construct timing analyses for complex cyber-physical real-time systems from *formal properties*. In this talk, I will use a few examples to demonstrate the need for such property-based modular timing analysis. Furthermore, I will illustrate how to construct correct and precise translations of system behaviour for different applications in cyber-physical systems into proper mathematical properties that can then be used for property-based modular designs.
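To make "timing property" concrete, one classic deterministic example is the worst-case response-time bound for fixed-priority scheduling (Joseph & Pandya). The fixed-point iteration below is a standard textbook illustration, not necessarily one of the talk's own examples:

    import math

    def response_time(C, T, i, deadline):
        # Worst-case response time of task i under fixed-priority scheduling:
        #   R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j
        # Tasks 0..i-1 are assumed to have higher priority than task i;
        # C[j] is a worst-case execution time and T[j] a period.
        R = C[i]
        while True:
            R_next = C[i] + sum(math.ceil(R / T[j]) * C[j] for j in range(i))
            if R_next == R:
                return R        # converged: a safe upper bound on the response time
            if R_next > deadline:
                return None     # bound exceeds the deadline: task unschedulable
            R = R_next

A bound such as R_i is exactly the kind of formal property that a modular analysis can compose across components.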
Topic 1: The Real Success Story in AI Industry from Andes
Mr. Simon TC Wang (王庭昭), Technical Marketing, Andes Technology; formerly SW engineer at Moxa and MediaTek
Abstract
This talk presents Andes' unique and inspiring journey of AI innovation, highlighting the AI system architecture and RISC-V Vector/Custom Extensions that are reshaping the emerging AI landscape.
Topic 2: AutoIREE: Automatic Schedule Generation for AI Models on RISC-V Vector Architectures
Dr. Yuan-Ming Chang (張元銘), Andes Senior Engineer
Abstract
This talk introduces AutoIREE, a fully automated tuning system for AI models based on IREE. The system has three core components. The Model Partitioner partitions computational graphs into subgraphs based on the operators' tensor sizes while preserving their parameters. The Trial Assigner assigns trials among subgraphs based on their computational complexity. The Schedule Generator and Tuner generates loop schedules for subgraphs and collects performance metrics of schedules running on target hardware. By dynamically exploring tiling configurations that optimize hardware resources (e.g., cache and vector units), we iteratively refine schedules for better performance. Experimental results show that our method significantly improves model performance within IREE’s optimization pipeline.
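The following is a minimal, self-contained sketch of the three-component flow described above (Model Partitioner, Trial Assigner, Schedule Generator and Tuner). All names and the random-search policy are illustrative assumptions for this program, not the actual AutoIREE implementation:

    import random
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Subgraph:
        name: str
        flops: int                      # proxy for computational complexity

    def assign_trials(subgraphs, total_trials):
        # Trial Assigner: split the tuning budget proportionally to complexity.
        total = sum(sg.flops for sg in subgraphs)
        return {sg: max(1, total_trials * sg.flops // total) for sg in subgraphs}

    def propose_schedule(rng):
        # Schedule Generator: pick tile sizes aimed at cache / vector units.
        return {"tile_m": rng.choice([8, 16, 32, 64]),
                "tile_n": rng.choice([8, 16, 32, 64])}

    def measure(sg, sched, rng):
        # Stand-in for compiling the subgraph with IREE and timing it on
        # the target hardware.
        return sg.flops / (sched["tile_m"] * sched["tile_n"]) * rng.uniform(1.0, 1.1)

    def autotune(subgraphs, total_trials=100, seed=0):
        rng = random.Random(seed)
        best = {}
        for sg, budget in assign_trials(subgraphs, total_trials).items():
            for _ in range(budget):
                sched = propose_schedule(rng)
                cost = measure(sg, sched, rng)
                if sg not in best or cost < best[sg][0]:
                    best[sg] = (cost, sched)
        return best                     # best (cost, schedule) per subgraph

In the real system the measured cost comes from runs on the target hardware, and the search over tiling configurations is refined iteratively rather than purely random.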
Topic 3: Supporting Sparse Inference in XNNPACK with RISC-V Vector Extension
Mr. Quey-Liang Kao (高魁良), Andes RD-CA Manager
Abstract
Leveraging sparsity in neural network weights can significantly enhance efficiency when deploying models on mobile and edge devices. However, the RISC-V ecosystem still lacks a complete solution for sparse inference. This talk introduces the challenges of enabling sparse inference with the RISC-V Vector Extension (RVV) and presents preliminary experimental findings on the existing gaps. While the implementation and optimization efforts are still in progress, this work aims to advance both the RISC-V community and the XNNPACK project.
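For readers unfamiliar with sparse inference, a small NumPy sketch of a sparse matrix-vector product in CSR form shows where both the savings and the difficulty come from; the inner gather x[col_idx[...]] is precisely the memory access pattern that RVV indexed loads would have to vectorize in an XNNPACK-style kernel (this sketch is illustrative, not XNNPACK code):

    import numpy as np

    def csr_matvec(values, col_idx, row_ptr, x):
        # Only the stored nonzero weights are read and multiplied, which is
        # the efficiency win of sparsity on mobile and edge devices.
        y = np.zeros(len(row_ptr) - 1, dtype=x.dtype)
        for row in range(len(y)):
            start, end = row_ptr[row], row_ptr[row + 1]
            # values[start:end] are this row's nonzeros;
            # x[col_idx[start:end]] is the irregular gather RVV must handle.
            y[row] = values[start:end] @ x[col_idx[start:end]]
        return y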
Topic
Memory-efficient Model Compilation for Edge AI Inference
Professor Tsung Tai Yeh (葉宗泰)
National Yang Ming Chiao Tung University
Abstract
Deep neural networks (DNNs) have been widely applied across many application domains. However, running inference on large DNN models on data-center servers consumes substantial energy. Resource-constrained edge devices often offload model inference to remote data-center servers to accelerate its execution. Such offloading also increases security threats because of untrustworthy network connections. To mitigate these problems, Edge AI aims to squeeze DNN models onto resource-constrained edge devices and achieve Green AI computing by lowering the energy consumption of model inference through small DNN models. Unlike desktop computers and servers, edge devices typically limit their memory capacity to reduce hardware cost and energy consumption. This limitation raises significant challenges when deploying DNN models on edge devices. Consequently, this talk will introduce our recent research on memory-efficient model compilation for edge AI inference, which reduces memory usage and improves the data-reuse rate of the DNN model to accelerate inference while lowering energy consumption on edge devices. Finally, this talk will discuss future work and open challenges for edge AI systems and hardware.
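One well-known technique in this space is liveness-based tensor memory planning: intermediate tensors whose lifetimes do not overlap can share the same buffer. The greedy sketch below illustrates the general idea only; it is not the speaker's specific method:

    def plan_memory(tensors):
        # tensors: list of (name, size_bytes, first_use, last_use), with uses
        # given as step indices in the operator schedule.
        tensors = sorted(tensors, key=lambda t: t[2])      # by first use
        pools = []                                         # (size, last use of occupant)
        assignment = {}
        for name, size, first, last in tensors:
            for i, (psize, busy_until) in enumerate(pools):
                # Reuse the first buffer that is big enough and already dead.
                if busy_until < first and psize >= size:
                    pools[i] = (psize, last)
                    assignment[name] = i
                    break
            else:
                pools.append((size, last))                 # no fit: new buffer
                assignment[name] = len(pools) - 1
        return assignment, sum(size for size, _ in pools)  # plan + peak bytes

Reducing the number of simultaneously live buffers in this way directly lowers the peak memory the compiled model needs at inference time.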
Topic
In-Memory Computing with Flash Memory
Director Hsiang-Pang Li (李祥邦)
Macronix International (旺宏電子)
Abstract
In today's era of rapid artificial intelligence (AI) development, "the larger the model, the smarter it is" has become the general consensus. However, as models grow ever larger, the required storage space, computation steps, time, and energy consumption increase dramatically as well. In the traditional computer architecture, the frequent data transfers between compute units and memory form a bottleneck and have become one of the key factors limiting overall computing performance. Solid-state drives (SSDs) built from Flash memory are already widely used to improve storage efficiency and have become an important foundation for optimizing the performance of modern computer systems. Looking ahead, Flash memory and SSDs will not only continue to improve data-access efficiency but may also take over part of the computation currently performed by CPUs or GPUs, directly participating in massive data processing and playing an even more critical role in AI systems.
Topic
The Edge AI Compiler Journey at MediaTek
Dr. Bor-Yeh Shen (沈柏曄)
Senior Technical Manager, MediaTek
Abstract
In the rapidly evolving field of artificial intelligence, the efficient and optimized compilation of AI models is crucial for achieving high performance on edge devices. This talk will guide you through the development journey of the MediaTek AI Compiler, a key component of the NeuroPilot SDK, designed to enhance the deployment of AI applications across MediaTek's diverse product lines. We will delve into the architecture and features of the MediaTek AI Compiler, highlighting its capabilities and addressing the challenges in optimizing neural network models for various AI tasks.
Topic
Building a Scalable AI/ML Software Stack for RISC-V: From PyTorch to Deployment on SiFive Intelligence XM Platforms
Mr. Hong-Rong Hsu (許宏榮)
Principal Engineer, SiFive Taiwan
Abstract
This talk presents the SiFive AI/ML Software Stack for RISC-V, designed to enable efficient end-to-end deployment of AI models. By leveraging the IREE compiler infrastructure, the stack supports model lowering and hardware-aware optimization targeting the SiFive X390 cores and the on-chip AI matrix engine in the new XM series platform. We will walk through real-world deployment examples to highlight the flexibility, performance, and compiler-driven design of the stack for executing modern AI workloads on open RISC-V hardware.
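To make the flow concrete, here is a hedged sketch of compiling a small MLIR function for a RISC-V target with the IREE Python bindings (the iree-compiler pip package; the function shown is IREE's canonical element-wise multiply sample). The target triple and the +v feature enabling RVV are illustrative assumptions; the exact backends and flags used for the X390 cores and the XM matrix engine come from SiFive's stack and are not shown here:

    import iree.compiler as ireec

    MLIR = """
    func.func @simple_mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
      %0 = arith.mulf %a, %b : tensor<4xf32>
      return %0 : tensor<4xf32>
    }
    """

    # Compile to an IREE module for a riscv64 CPU with the vector extension.
    vmfb = ireec.compile_str(
        MLIR,
        target_backends=["llvm-cpu"],
        extra_args=[
            "--iree-llvmcpu-target-triple=riscv64-unknown-linux-gnu",
            "--iree-llvmcpu-target-cpu-features=+m,+a,+f,+d,+c,+v",  # enable RVV
        ],
    )
    with open("simple_mul.vmfb", "wb") as f:
        f.write(vmfb)

The resulting .vmfb module can then be loaded and executed on the device with the IREE runtime, which is the deployment step the talk's examples walk through.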