15 minutes for each oral presentation
(~13 mins for presentation and 2 mins for Q&A)
Topic
The Rise of Deep Learning (DL) Compilers: Progress and Future Directions
Dr. Trent Lo
Tech Lead and Manager, NVIDIA
Abstract
The evolution of Deep Learning (DL) compilers has been remarkable over the past decade. CUDA programming dominated the early landscape of AI acceleration, providing direct GPU optimization capabilities but requiring significant low-level programming expertise. The introduction of graph-level compilers such as TensorRT and XLA shifted the landscape by providing automated optimization and abstracting complex hardware details, making GPU performance much more accessible. Later, the emergence of tile-level compilers like OpenAI Triton offered a middle ground, giving developers finer-grained control than graph-level compilers while demanding far less low-level expertise than raw CUDA. In this talk, we will explore these developments and look ahead to new challenges and opportunities as DL compilers continue to shape the future of training and inference optimization.
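To make the tile-level programming model concrete, below is a minimal Triton vector-add kernel in the style of Triton's own tutorials (an illustration added for this program, not material from the talk). The developer writes block-level loads, computes, and stores in Python, and the compiler handles the mapping onto GPU threads and memory:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance processes one tile of BLOCK_SIZE elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements              # guard the ragged last tile
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)           # one program per tile
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

The developer still reasons about tiles and masking, which a graph-level compiler would hide entirely, yet never touches thread indices or shared memory as raw CUDA would require.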
Topic
Property-Based Analysis and Optimization for Complex Cyber-Physical Real-Time Systems
Professor Jian-Jia Chen
Technical University of Dortmund (TU Dortmund), Germany
Abstract
Cyber-physical real-time systems are information processing systems that require both functional and timing correctness and must interact properly with the physical world. Since time naturally progresses in the physical world, safe bounds on deterministic or probabilistic timing properties are required. It is essential to construct timing analyses for complex cyber-physical real-time systems from *formal properties*. In this talk, I will use a few examples to demonstrate the need for such property-based modular timing analysis. Furthermore, I will illustrate how to construct correct and precise translations of system behaviour for different applications in cyber-physical systems into proper mathematical properties that can then be used for property-based modular designs.
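To make "timing property" concrete, one classic deterministic example is the worst-case response-time bound for fixed-priority scheduling (Joseph & Pandya). The fixed-point iteration below is a standard textbook illustration, not necessarily one of the talk's own examples:

    import math

    def response_time(C, T, i, deadline):
        # Worst-case response time of task i under fixed-priority scheduling:
        #   R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j
        # Tasks 0..i-1 are assumed to have higher priority than task i;
        # C[j] is a worst-case execution time and T[j] a period.
        R = C[i]
        while True:
            R_next = C[i] + sum(math.ceil(R / T[j]) * C[j] for j in range(i))
            if R_next == R:
                return R        # converged: a safe upper bound on the response time
            if R_next > deadline:
                return None     # bound exceeds the deadline: task unschedulable
            R = R_next

A bound such as R_i is exactly the kind of formal property that a modular analysis can compose across components.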
Topic 1: The Real Success Story in AI Industry from Andes
Mr. Simon TC Wang (王庭昭), Technical Marketing, Andes Technology; formerly SW engineer at Moxa and MediaTek
Abstract
This talk presents Andes' unique and inspiring journey of AI innovation, highlighting the AI system architecture and RISC-V Vector/Custom Extensions that are reshaping the emerging AI landscape.
Topic 2: AutoIREE: Automatic Schedule Generation for AI Models on RISC-V Vector Architectures
Dr. Yuan-Ming Chang (張元銘), Andes Senior Engineer
Abstract
This talk introduces AutoIREE, a fully automated tuning system for AI models based on IREE. The system has three core components. The Model Partitioner partitions computational graphs into subgraphs based on the operators' tensor sizes while preserving their parameters. The Trial Assigner assigns trials among subgraphs based on their computational complexity. The Schedule Generator and Tuner generates loop schedules for subgraphs and collects performance metrics of schedules running on target hardware. By dynamically exploring tiling configurations that optimize hardware resources (e.g., cache and vector units), we iteratively refine schedules for better performance. Experimental results show that our method significantly improves model performance within IREE’s optimization pipeline.
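The following is a minimal, self-contained sketch of the three-component flow described above (Model Partitioner, Trial Assigner, Schedule Generator and Tuner). All names and the random-search policy are illustrative assumptions for this program, not the actual AutoIREE implementation:

    import random
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Subgraph:
        name: str
        flops: int                      # proxy for computational complexity

    def assign_trials(subgraphs, total_trials):
        # Trial Assigner: split the tuning budget proportionally to complexity.
        total = sum(sg.flops for sg in subgraphs)
        return {sg: max(1, total_trials * sg.flops // total) for sg in subgraphs}

    def propose_schedule(rng):
        # Schedule Generator: pick tile sizes aimed at cache / vector units.
        return {"tile_m": rng.choice([8, 16, 32, 64]),
                "tile_n": rng.choice([8, 16, 32, 64])}

    def measure(sg, sched, rng):
        # Stand-in for compiling the subgraph with IREE and timing it on
        # the target hardware.
        return sg.flops / (sched["tile_m"] * sched["tile_n"]) * rng.uniform(1.0, 1.1)

    def autotune(subgraphs, total_trials=100, seed=0):
        rng = random.Random(seed)
        best = {}
        for sg, budget in assign_trials(subgraphs, total_trials).items():
            for _ in range(budget):
                sched = propose_schedule(rng)
                cost = measure(sg, sched, rng)
                if sg not in best or cost < best[sg][0]:
                    best[sg] = (cost, sched)
        return best                     # best (cost, schedule) per subgraph

In the real system the measured cost comes from runs on the target hardware, and the search over tiling configurations is refined iteratively rather than purely random.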
Topic 3: Supporting Sparse Inference in XNNPACK with RISC-V Vector Extension
Mr. Quey-Liang Kao (高魁良), Andes RD-CA Manager
Abstract
Leveraging sparsity in neural network weights can significantly enhance efficiency when deploying models on mobile and edge devices. However, the RISC-V ecosystem still lacks a complete solution for sparse inference. This talk introduces the challenges of enabling sparse inference with the RISC-V Vector Extension (RVV) and presents preliminary experimental findings on the existing gaps. While the implementation and optimization efforts are still in progress, this work aims to advance both the RISC-V community and the XNNPACK project.
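For readers unfamiliar with sparse inference, a small NumPy sketch of a sparse matrix-vector product in CSR form shows where both the savings and the difficulty come from; the inner gather x[col_idx[...]] is precisely the memory access pattern that RVV indexed loads would have to vectorize in an XNNPACK-style kernel (this sketch is illustrative, not XNNPACK code):

    import numpy as np

    def csr_matvec(values, col_idx, row_ptr, x):
        # Only the stored nonzero weights are read and multiplied, which is
        # the efficiency win of sparsity on mobile and edge devices.
        y = np.zeros(len(row_ptr) - 1, dtype=x.dtype)
        for row in range(len(y)):
            start, end = row_ptr[row], row_ptr[row + 1]
            # values[start:end] are this row's nonzeros;
            # x[col_idx[start:end]] is the irregular gather RVV must handle.
            y[row] = values[start:end] @ x[col_idx[start:end]]
        return y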
Topic
Memory-efficient Model Compilation for Edge AI Inference
Professor Tsung Tai Yeh (葉宗泰)
National Yang Ming Chiao Tung University
Abstract
Deep neural networks (DNNs) have been widely applied across many application domains. However, running inference on large DNN models on data-center servers consumes substantial energy. Resource-constrained edge devices often offload model inference to remote data-center servers to accelerate its execution. Such offloading also increases security threats because of untrustworthy network connections. To mitigate these problems, Edge AI aims to squeeze DNN models onto resource-constrained edge devices and achieve Green AI computing by lowering the energy consumption of model inference through small DNN models. Unlike desktop computers and servers, edge devices typically limit their memory capacity to reduce hardware cost and energy consumption. This limitation raises significant challenges when deploying DNN models on edge devices. Consequently, this talk will introduce our recent research on memory-efficient model compilation for edge AI inference, which reduces memory usage and improves the data-reuse rate of the DNN model to accelerate inference while lowering energy consumption on edge devices. Finally, this talk will discuss future work and open challenges for edge AI systems and hardware.
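One well-known technique in this space is liveness-based tensor memory planning: intermediate tensors whose lifetimes do not overlap can share the same buffer. The greedy sketch below illustrates the general idea only; it is not the speaker's specific method:

    def plan_memory(tensors):
        # tensors: list of (name, size_bytes, first_use, last_use), with uses
        # given as step indices in the operator schedule.
        tensors = sorted(tensors, key=lambda t: t[2])      # by first use
        pools = []                                         # (size, last use of occupant)
        assignment = {}
        for name, size, first, last in tensors:
            for i, (psize, busy_until) in enumerate(pools):
                # Reuse the first buffer that is big enough and already dead.
                if busy_until < first and psize >= size:
                    pools[i] = (psize, last)
                    assignment[name] = i
                    break
            else:
                pools.append((size, last))                 # no fit: new buffer
                assignment[name] = len(pools) - 1
        return assignment, sum(size for size, _ in pools)  # plan + peak bytes

Reducing the number of simultaneously live buffers in this way directly lowers the peak memory the compiled model needs at inference time.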
Topic
In-Memory Computing with Flash Memory
Director Hsiang-Pang Li (李祥邦)
Macronix International (旺宏電子)
Abstract
In today's era of rapid artificial intelligence (AI) development, "the larger the model, the smarter it is" has become the general consensus. However, as models grow ever larger, the required storage space, computation steps, time, and energy consumption increase dramatically as well. In the traditional computer architecture, the frequent data transfers between compute units and memory form a bottleneck and have become one of the key factors limiting overall computing performance. Solid-state drives (SSDs) built from Flash memory are already widely used to improve storage efficiency and have become an important foundation for optimizing the performance of modern computer systems. Looking ahead, Flash memory and SSDs will not only continue to improve data-access efficiency but may also take over part of the computation currently performed by CPUs or GPUs, directly participating in massive data processing and playing an even more critical role in AI systems.
Topic
The Edge AI Compiler Journey at MediaTek
Dr. Bor-Yeh Shen (沈柏曄)
Senior Technical Manager, MediaTek
Abstract
In the rapidly evolving field of artificial intelligence, the efficient and optimized compilation of AI models is crucial for achieving high performance on edge devices. This talk will guide you through the development journey of the MediaTek AI Compiler, a key component of the NeuroPilot SDK, designed to enhance the deployment of AI applications across MediaTek's diverse product lines. We will delve into the architecture and features of the MediaTek AI Compiler, highlighting its capabilities and addressing the challenges in optimizing neural network models for various AI tasks.
Topic
Building a Scalable AI/ML Software Stack for RISC-V: From PyTorch to Deployment on SiFive Intelligence XM Platforms
Mr. Hong-Rong Hsu (許宏榮)
Principal Engineer, SiFive Taiwan
Abstract
This talk presents the SiFive AI/ML Software Stack for RISC-V, designed to enable efficient end-to-end deployment of AI models. By leveraging the IREE compiler infrastructure, the stack supports model lowering and hardware-aware optimization targeting the SiFive X390 cores and the on-chip AI matrix engine in the new XM series platform. We will walk through real-world deployment examples to highlight the flexibility, performance, and compiler-driven design of the stack for executing modern AI workloads on open RISC-V hardware.
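To make the flow concrete, here is a hedged sketch of compiling a small MLIR function for a RISC-V target with the IREE Python bindings (the iree-compiler pip package; the function shown is IREE's canonical element-wise multiply sample). The target triple and the +v feature enabling RVV are illustrative assumptions; the exact backends and flags used for the X390 cores and the XM matrix engine come from SiFive's stack and are not shown here:

    import iree.compiler as ireec

    MLIR = """
    func.func @simple_mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
      %0 = arith.mulf %a, %b : tensor<4xf32>
      return %0 : tensor<4xf32>
    }
    """

    # Compile to an IREE module for a riscv64 CPU with the vector extension.
    vmfb = ireec.compile_str(
        MLIR,
        target_backends=["llvm-cpu"],
        extra_args=[
            "--iree-llvmcpu-target-triple=riscv64-unknown-linux-gnu",
            "--iree-llvmcpu-target-cpu-features=+m,+a,+f,+d,+c,+v",  # enable RVV
        ],
    )
    with open("simple_mul.vmfb", "wb") as f:
        f.write(vmfb)

The resulting .vmfb module can then be loaded and executed on the device with the IREE runtime, which is the deployment step the talk's examples walk through.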