Title: The Rise of Deep Learning (DL) Compilers: Progress and Future Directions
Tech Lead and Manager, NVIDIA
Bio
Trent Lo serves as a Tech Lead and Manager at NVIDIA's Santa Clara headquarters, where he has driven the development of Deep Learning (DL) compiler technologies. Since joining the company in 2018, his work has advanced key compiler frameworks, specifically TensorRT and XLA, which are fundamental tools in optimizing DL model performance. His notable technical contributions include pioneering Dynamic Shape Support in the TensorRT compiler and developing Horizontal Kernel Fusion in XLA, innovations that have enhanced the flexibility and efficiency of DL model training/deployment. Prior to NVIDIA, Trent led the Heterogeneous Compute Compiler team at MediaTek in Hsinchu. He holds a PhD from National Tsing Hua University, Taiwan.
Abstract
The evolution of Deep Learning (DL) compilers has been remarkable over the past decade. CUDA programming dominated the early landscape of AI acceleration, providing direct GPU optimization capabilities but requiring significant low-level programming expertise. The introduction of graph-level compilers such as TensorRT and XLA shifted the landscape by providing automated optimization and abstracting complex hardware details, making GPU performance much more accessible. Later, the emergence of tile-level compilers like OpenAI Triton represented a middle-ground approach, offering developers more granular control compared to graph-level compilers. In this talk, we will explore these developments and look ahead to new challenges and opportunities, as DL compilers continue to shape the future of training and inference optimization.
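To make the contrast concrete, here is a minimal tile-level kernel in the style of OpenAI Triton (a generic sketch, not material from the talk, assuming the open-source `triton` and `torch` Python packages and a CUDA-capable GPU): the programmer controls the tile size, offsets, and masking, while the compiler handles the low-level GPU details that hand-written CUDA would expose.

```python
# Minimal Triton vector-add kernel: the programmer works at the tile level
# (choosing BLOCK_SIZE, computing offsets and masks), while the compiler
# generates the low-level GPU code that raw CUDA would require by hand.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program instance per tile
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the final partial tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# Usage: x = torch.rand(4096, device="cuda"); y = torch.rand(4096, device="cuda"); add(x, y)
```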
Title: Property-Based Analysis and Optimization for Complex Cyber-Physical Real-Time Systems
TU Dortmund University, Germany
Bio
Jian-Jia Chen is a Professor in the Department of Informatics at TU Dortmund University, Germany. He was a Junior Professor in the Department of Informatics at the Karlsruhe Institute of Technology (KIT), Germany, from May 2010 to March 2014. He received his Ph.D. from the Department of Computer Science and Information Engineering at National Taiwan University, Taiwan, in 2006, and his B.S. from the Department of Chemistry at National Taiwan University in 2001. Between January 2008 and April 2010, he was a postdoctoral researcher at ETH Zurich, Switzerland. His research interests include real-time systems, embedded systems, energy-efficient scheduling, power-aware designs, temperature-aware scheduling, and distributed computing. He received a European Research Council (ERC) Consolidator Grant in 2019, has received more than 10 Best Paper and Outstanding Paper Awards, and has served on the technical program committees of many international conferences.
Abstract
Cyber-physical real-time systems are information processing systems that require both functional and timing correctness and must interact properly with the physical world. Since time progresses naturally in the physical world, safe bounds on deterministic or probabilistic timing properties are required. It is essential to construct timing analyses for complex cyber-physical real-time systems from *formal properties*. In this talk, I will use a few examples to demonstrate the need for such property-based modular timing analysis. Furthermore, I will illustrate how to translate, correctly and precisely, the behaviour of different applications in cyber-physical systems into mathematical properties that can then be used for property-based modular designs.
Title 1: The Real Success Story in AI Industry from Andes (10 min)
Title 2: AutoIREE: Automatic Schedule Generation for AI Models on RISC-V Vector Architectures (20 min)
Title 3: Supporting Sparse Inference in XNNPACK with RISC-V Vector Extension (30 min)
Moxa & MediaTek software engineer; Andes Technical Marketing
Title: The Real Success Story in AI Industry from Andes
Abstract
This talk presents Andes' unique and inspiring journey of AI innovation, highlighting the AI system architecture and RISC-V Vector/Custom Extensions that are reshaping the emerging AI landscape.
Andes Senior Engineer
Title: AutoIREE: Automatic Schedule Generation for AI Models on RISC-V Vector Architectures
Abstract
This talk introduces AutoIREE, a fully automated tuning system for AI models based on IREE. The system has three core components. The Model Partitioner partitions computational graphs into subgraphs based on the operators' tensor sizes while preserving their parameters. The Trial Assigner assigns trials among subgraphs based on their computational complexity. The Schedule Generator and Tuner generates loop schedules for subgraphs and collects performance metrics of schedules running on target hardware. By dynamically exploring tiling configurations that optimize hardware resources (e.g., cache and vector units), we iteratively refine schedules for better performance. Experimental results show that our method significantly improves model performance within IREE’s optimization pipeline.
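For readers unfamiliar with this style of autotuning, the sketch below is a rough, hypothetical illustration of the flow described above, not AutoIREE's actual implementation; the subgraph records, FLOP-based complexity estimate, tiling candidates, and `benchmark` callback are placeholder assumptions. It allocates the trial budget per subgraph in proportion to estimated complexity and keeps the best-measured tiling configuration for each subgraph.

```python
import itertools
import random

def assign_trials(subgraphs, total_trials):
    """Split the trial budget across subgraphs in proportion to their
    estimated computational complexity (placeholder: a FLOP count)."""
    total_flops = sum(sg["flops"] for sg in subgraphs)
    return {sg["name"]: max(1, round(total_trials * sg["flops"] / total_flops))
            for sg in subgraphs}

def candidate_tilings(shape, tile_sizes=(4, 8, 16, 32, 64)):
    """Enumerate tiling configurations: per loop dimension, every tile size
    that fits, falling back to the full extent if none does."""
    return itertools.product(*[[t for t in tile_sizes if t <= d] or [d] for d in shape])

def tune(subgraphs, benchmark, total_trials=200):
    """benchmark(subgraph, tiling) -> measured latency on the target hardware."""
    budget = assign_trials(subgraphs, total_trials)
    best = {}
    for sg in subgraphs:
        tilings = list(candidate_tilings(sg["shape"]))
        random.shuffle(tilings)                       # random search within the budget
        for tiling in tilings[:budget[sg["name"]]]:
            latency = benchmark(sg, tiling)
            if sg["name"] not in best or latency < best[sg["name"]][1]:
                best[sg["name"]] = (tiling, latency)  # keep the fastest schedule so far
    return best
```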
Andes RD-CA Manager
Title: Supporting Sparse Inference in XNNPACK with RISC-V Vector Extension
Abstract
Leveraging sparsity in neural network weights can significantly enhance efficiency when deploying models on mobile and edge devices. However, the RISC-V ecosystem lacks a complete solution for sparse inference. This talk introduces the challenges of enabling sparse inference with the RISC-V Vector Extension (RVV) and presents preliminary experimental findings on the existing gaps. While the implementation and optimization efforts are still in progress, we aim for this work to benefit both the RISC-V community and the XNNPACK project.
Title: Memory-efficient Model Compilation for Edge AI Inference
National Yang Ming Chiao Tung University
Bio
Tsung Tai Yeh is an associate professor of computer science at National Yang Ming Chiao Tung University, Taiwan. He obtained his Ph.D. from the School of Electrical and Computer Engineering at Purdue University, USA. His research spans computer architecture, computer systems, and programming languages. He received the Lynn Fellowship at Purdue University and worked at AMD Research. His compiler research was nominated for the Best Paper Award at the PPoPP conference and has been published in multiple top-ranking conference proceedings (ISCA, ASPLOS, HPCA, PPoPP, NeurIPS).
Abstract
Deep neural networks (DNNs) have been widely applied across many application domains. However, running inference on large DNN models in data-center servers consumes substantial energy. Resource-constrained edge devices therefore often offload model inference to remote data-center servers to accelerate its execution, but such offloading also increases security risks because of untrustworthy network connections. To mitigate these problems, Edge AI aims to fit DNN models onto resource-constrained edge devices and to achieve Green AI computing by lowering the energy consumption of inference through small DNN models. Unlike desktop computers and servers, edge devices often operate under tight memory budgets to reduce hardware cost and energy consumption, which raises significant challenges when deploying DNN models on them. This talk will introduce our recent research on memory-efficient model compilation for edge AI inference, which reduces memory usage and improves the data reuse rate of the DNN model, accelerating inference while lowering energy consumption on edge devices. Finally, the talk will discuss future work and open challenges for edge AI systems and hardware.
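As one concrete illustration of a technique in this space (an illustrative sketch only, not the compiler presented in the talk; the tensor-lifetime representation and greedy policy are assumptions made for the example), the planner below reuses buffers for intermediate tensors whose live ranges do not overlap, a common way to reduce peak activation memory on edge devices.

```python
def plan_buffers(tensors):
    """tensors: list of (name, size_bytes, first_use, last_use) over a
    topologically ordered operator schedule. Greedily assign each tensor to
    an existing buffer whose previous occupant is already dead and that is
    large enough; otherwise allocate a new buffer."""
    buffers = []      # each: {"size": bytes, "free_at": first step it is free again}
    assignment = {}   # tensor name -> buffer index
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        for i, buf in enumerate(buffers):
            if buf["free_at"] <= first and buf["size"] >= size:
                assignment[name] = i          # reuse a dead buffer
                buf["free_at"] = last + 1
                break
        else:
            buffers.append({"size": size, "free_at": last + 1})
            assignment[name] = len(buffers) - 1
    peak_bytes = sum(b["size"] for b in buffers)  # total memory the plan needs
    return assignment, peak_bytes
```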
Title: In-Memory Computing with Flash Memory
Macronix International Co., Ltd. (旺宏電子)
Bio
Hsiang-Pang Li (李祥邦) is the Director of the Advanced System Laboratory at Macronix International, where he focuses on the research and development of advanced non-volatile memory (NVM) applications and intelligent storage systems. After earning his B.S. and M.S. degrees in Electrical Engineering from Chung Yuan Christian University, he joined Macronix in 1998 and has since accumulated more than twenty years of research and product-development experience. The teams he leads have delivered outstanding results in 3D NAND/NOR Flash, storage systems, and AI architectures, and actively explore applications including Computing-in-Memory, In-Memory Search, vector similarity search, AI acceleration, and high-reliability storage systems. His research team has published repeatedly at top international conferences such as IEDM, MICRO, DAC, ISSCC, ISCA, ICCAD, and IMW; their recent work centers on computation and acceleration integrated with 3D NAND/NOR, proposing next-generation memory solutions and solutions for intelligent edge computing.
Abstract
In today's era of rapid artificial intelligence (AI) development, "bigger models are smarter models" has become the prevailing consensus. However, as model sizes keep growing, the required storage, computation, time, and energy grow dramatically with them. In conventional computer architectures, the frequent data transfers between compute units and memory form a bottleneck and are one of the key factors limiting overall performance. Solid-state drives (SSDs) built from Flash memory are already widely used to improve storage efficiency and have become an important foundation for optimizing modern computer systems. Looking ahead, Flash memory and SSDs will not only continue to improve data-access efficiency but may also take over part of the computation currently performed by CPUs or GPUs, participating directly in large-scale data processing and playing an even more critical role in AI systems.
Title: The Edge AI Compiler Journey at MediaTek
Senior Technical Manager, MediaTek
Bio
Dr. BY Shen is a Senior Technical Manager in the Computing and Artificial Intelligence Technology Group at MediaTek, with over a decade of industry experience in compiler design and optimization. Since 2017, he has led the development of the NeuroPilot AI compiler infrastructure, which is widely used across MediaTek products to enable Edge AI capabilities. Dr. Shen holds a Ph.D. in Computer Science from National Chiao Tung University, Taiwan.
Abstract
In the rapidly evolving field of artificial intelligence, the efficient and optimized compilation of AI models is crucial for achieving high performance on edge devices. This talk will guide you through the development journey of the MediaTek AI Compiler, a key component of the NeuroPilot SDK, designed to enhance the deployment of AI applications across MediaTek's diverse product lines. We will delve into the architecture and features of the MediaTek AI Compiler, highlighting its capabilities and addressing the challenges in optimizing neural network models for various AI tasks.
Title: Building a Scalable AI/ML Software Stack for RISC-V: From PyTorch to Deployment on SiFive Intelligence XM Platforms
Principal Engineer, SiFive Taiwan
Bio
Hong-Rong Hsu has led and actively contributed to the Open Source Software team at SiFive since 2019 and currently serves as an AI/ML Team Manager. In this role, he both oversees and contributes hands-on to the development of AI/ML software, and he manages end-to-end deployment of AI/ML models. Prior to joining SiFive, Hong-Rong gained valuable experience at MediaTek (2010–2018) and Bitmain (2018–2019). His expertise spans AI/ML, system software, compilers, and RISC-V, with a strong focus on driving innovation and contributing to technical advancements in these areas.
Abstract
This talk presents the SiFive AI/ML Software Stack for RISC-V, designed to enable efficient end-to-end deployment of AI models. By leveraging the IREE compiler infrastructure, the stack supports model lowering and hardware-aware optimization targeting the SiFive X390 cores and the on-chip AI matrix engine in the new XM series platform. We will walk through real-world deployment examples to highlight the flexibility, performance, and compiler-driven design of the stack for executing modern AI workloads on open RISC-V hardware.
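As a small, illustrative example of compiler-driven deployment in this spirit (not SiFive's stack itself; the MLIR snippet follows IREE's public Python API samples, and the RISC-V target flags are assumptions whose exact names vary across IREE releases), the following compiles a tiny module for a riscv64 CPU target with the vector extension enabled.

```python
# A minimal sketch: compiling an MLIR module for a RISC-V CPU target with
# IREE's Python API. Flag names below are assumptions and differ across IREE
# versions; consult `iree-compile --help` for the release in use.
import iree.compiler as ireec

MLIR_MODULE = """
func.func @simple_mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.mulf %a, %b : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""

vmfb = ireec.compile_str(
    MLIR_MODULE,
    target_backends=["llvm-cpu"],   # CPU code-generation path
    extra_args=[
        "--iree-llvmcpu-target-triple=riscv64-unknown-linux-gnu",
        "--iree-llvmcpu-target-cpu-features=+m,+a,+f,+d,+c,+v",  # enable RVV
    ],
)

# The resulting flatbuffer can be loaded and executed by the IREE runtime.
with open("simple_mul_riscv64.vmfb", "wb") as f:
    f.write(vmfb)
```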