Distinguished Engineer, NVIDIA
Assistant Professor, Carnegie Mellon University
Tianqi Chen is currently an Assistant Professor in the Machine Learning Department and the Computer Science Department at Carnegie Mellon University, and a Distinguished Engineer at NVIDIA. He received his PhD from the Paul G. Allen School of Computer Science & Engineering at the University of Washington. He has created several widely adopted machine learning systems, including XGBoost, TVM, and MLC-LLM.
Building ML Systems Foundations in the Age of AI
We are living in an exciting era for AI, in which machine learning systems and infrastructure are crucial for training and deploying efficient AI models. The modern machine learning systems landscape is rich with diverse components, including compilers, libraries, DSLs, frameworks, and coding agents. In this talk, I will explore how we can build a common foundation that enables interoperability across these components. We will also discuss our experience bringing foundation models to both edge and cloud through machine learning compilation. Finally, we will touch on how to build a virtuous cycle in which AI is used in the ML systems production flow.
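As a hedged illustration of what "bringing a model across edge and cloud through machine learning compilation" can look like in practice, here is a minimal sketch using TVM's Relay Python API; the model file, input name, and shapes are assumptions, and this is not necessarily the exact stack covered in the talk.

```python
# Sketch: compiling one model for both a cloud GPU and an edge CPU with TVM.
# Assumes a local "model.onnx" with a single input named "input"; adjust the
# name and shape for a real model.
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

# The same imported module is lowered for two very different targets.
targets = {
    "cloud-gpu": "cuda",                            # server-class GPU
    "edge-cpu": "llvm -mtriple=aarch64-linux-gnu",  # cross-compile for ARM
}
for name, target in targets.items():
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    # Cross targets need a matching toolchain, e.g.
    # lib.export_library(path, cc="aarch64-linux-gnu-gcc").
    lib.export_library(f"model-{name}.so")
```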
Chair for Compiler Construction, TU Dresden
As Chair for Compiler Construction at TU Dresden, Jeronimo Castrillon works at the intersection of programming languages, compilers, and computer architecture. His group develops tools and abstractions that make complex, heterogeneous hardware accessible to developers—bridging the gap between high-level software design and efficient hardware execution.
Compilers for In-Memory Computing Systems
Fueled by exciting advances in materials and devices, in-memory computing architectures now represent a promising avenue for advancing computing systems. Numerous manual designs have already demonstrated orders-of-magnitude improvements in compute efficiency over classical von Neumann architectures across different application domains. In this talk, we discuss automation flows for programming in-memory architectures and exploring their parameter space. We report on current efforts to build an extensible framework around the MLIR compiler infrastructure that abstracts away individual technologies to foster reuse. Concretely, we present optimizing flows for in-memory accelerators based on crossbars, content-addressable memories, and bulk bitwise logic operations. We believe this kind of automation is key to navigating the heterogeneous landscape of in-memory accelerators more quickly and to bringing the benefits of emerging architectures to a broader range of applications.
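To make the crossbar case concrete, below is a minimal sketch of the mapping problem such compiler flows automate: tiling a matrix-vector product onto fixed-size crossbar arrays. The 128x128 tile size and the NumPy stand-in for the analog multiply are illustrative assumptions, not a real device API.

```python
# Sketch: tiling a matrix-vector product onto fixed-size analog crossbars.
import numpy as np

TILE = 128  # rows = wordlines, cols = bitlines of one hypothetical crossbar

def crossbar_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    rows, cols = W.shape
    y = np.zeros(rows)
    for r in range(0, rows, TILE):
        for c in range(0, cols, TILE):
            # "Program" one crossbar with a weight tile, then "execute" it:
            # the analog array computes the partial dot products in place.
            tile = W[r:r+TILE, c:c+TILE]
            y[r:r+TILE] += tile @ x[c:c+TILE]  # stands in for the analog MVM
    return y

W, x = np.random.randn(300, 500), np.random.randn(500)
assert np.allclose(crossbar_matvec(W, x), W @ x)
```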
Freelance Software Developer & Mojo Champion
Maxim Zaks is a freelance software developer and Mojo Champion contributing to the Mojo standard library and core ecosystem. He authors language enhancement proposals, explores compiler and performance optimizations, and actively supports the Mojo developer community. Maxim regularly speaks at technical meetups and conferences, sharing insights on language design, systems programming, and high-performance computing with Mojo.
Solving the Multi-Platform Problem with Mojo
AI workloads push programming languages to their limits: developers need low-level control for performance, high-level ergonomics for productivity, and seamless portability across heterogeneous platforms. Mojo is a new systems programming language designed to address these challenges by combining Python interoperability with modern compilation techniques. In this talk, we’ll dive into how Mojo enables multi-platform targeting through conditional compilation, and how MLIR ops can be embedded directly into libraries to unlock performance-critical paths. I’ll illustrate these ideas with concrete examples from the Mojo standard library and the MAX open-source codebase, showing how Mojo helps unify the fragmented AI software stack.
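For contrast, the sketch below shows the runtime-dispatch pattern a Python library typically uses for this problem; Mojo's @parameter conditionals resolve the same choice at compile time, so only the selected specialization exists in the shipped binary. The kernel names here are hypothetical placeholders.

```python
# Sketch: runtime platform dispatch in Python, shown as a contrast to
# compile-time conditional compilation. Every branch ships in the artifact
# and is selected at run time.
import platform

def dot_generic(a, b):
    return sum(x * y for x, y in zip(a, b))

def dot_arm_neon(a, b):
    # placeholder for a NEON-accelerated kernel on ARM
    return dot_generic(a, b)

def dot_x86_avx(a, b):
    # placeholder for an AVX-accelerated kernel on x86-64
    return dot_generic(a, b)

_DISPATCH = {"arm64": dot_arm_neon, "aarch64": dot_arm_neon, "x86_64": dot_x86_avx}
dot = _DISPATCH.get(platform.machine(), dot_generic)

print(dot([1.0, 2.0], [3.0, 4.0]))
```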
Professor of Computer Architecture, TU Wien
He is a Full Professor of Computer Architecture at the Institute of Computer Engineering, TU Wien Informatics. Before joining TU Wien, he led a research group at the Chair of Electronic Design Automation at TU Munich.
His research focuses on Electronic System Level (ESL) design, RISC-V domain-specific architectures, tinyML and embedded ML compiler toolchains, as well as functional safety and hardware security. He is a Senior Member of IEEE and an active contributor to the RISC-V community.
Graph-Level Tiling, Operator Patching, and Fusing for Distributed, Memory-Optimized, and Fault-Tolerant TinyML Deployment
A new generation of AI-enhanced microcontrollers now delivers performance in the hundreds of GOPS, but their inherently low-cost, low-power design still limits on-chip SRAM and ROM to just a few megabytes. As a result, memory capacity remains a central challenge when deploying TinyML models onto these devices. Several techniques—such as pruning and compression—have been introduced to reduce peak memory consumption, and operator tiling and fusion have proven particularly effective for generating memory-aware buffer layouts and execution schedules.
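A minimal sketch of why tiling plus fusion reduces peak activation memory; the tensor and tile sizes are illustrative assumptions.

```python
# Sketch: two elementwise ops over a large tensor, unfused vs. fused+tiled.
import numpy as np

x = np.random.randn(1_000_000).astype(np.float32)  # 4 MB input

# Unfused: the full intermediate tensor materializes (extra 4 MB of peak memory).
t = np.maximum(x, 0.0)       # op 1: ReLU over the whole tensor
y_unfused = t * 0.5          # op 2: scale over the whole tensor

# Fused + tiled: only one small tile buffer is live at any time.
TILE = 4096                  # 16 KB working buffer instead of 4 MB
y = np.empty_like(x)
for i in range(0, x.size, TILE):
    tile = np.maximum(x[i:i+TILE], 0.0)  # op 1 on a tile...
    y[i:i+TILE] = tile * 0.5             # ...feeds op 2 before the next tile

assert np.allclose(y, y_unfused)
```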
Beyond single-device optimization, tiling and fusing can also be leveraged to aggregate memory across multiple microcontrollers in distributed inference settings. Furthermore, operator patching using checksum-based methods enables modification of the dataflow graph for fault-tolerant execution.
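As one classical instance of checksum-based checking, the sketch below applies algorithm-based fault tolerance (ABFT) to a matmul operator; it illustrates the general idea only and is not the specific patching scheme presented in the talk.

```python
# Sketch: ABFT for matmul. A fault during the multiply shows up as a
# mismatch between the computed checksums and the result's actual sums.
import numpy as np

def checked_matmul(A: np.ndarray, B: np.ndarray, tol: float = 1e-6):
    # Extend A with a column-sum row and B with a row-sum column.
    Ac = np.vstack([A, A.sum(axis=0)])
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])
    C = Ac @ Br  # one extended multiply yields the result plus checksums

    result = C[:-1, :-1]
    row_ok = np.allclose(C[-1, :-1], result.sum(axis=0), atol=tol)
    col_ok = np.allclose(C[:-1, -1], result.sum(axis=1), atol=tol)
    return result, row_ok and col_ok

A, B = np.random.randn(64, 32), np.random.randn(32, 16)
C, ok = checked_matmul(A, B)
assert ok and np.allclose(C, A @ B)
```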
In this talk, we present an overview of graph-level tiling and fusion techniques and their roles in distributed, memory-optimized, and fault-tolerant TinyML deployment. We also introduce an ONNX-based library that integrates these methods directly at the dataflow-graph level, simplifying their adoption in practical toolchains.
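To illustrate what integration at the dataflow-graph level can look like, here is a hedged sketch using the standard onnx Python helpers: a tiny graph is patched with a redundant shadow operator whose difference from the original is exposed as an extra output for a monitor to test against zero. This is a simplified stand-in for the library's actual transformations.

```python
# Sketch: graph-level operator patching on an ONNX model.
import onnx
from onnx import TensorProto, helper

# A minimal graph: Y = Relu(MatMul(X, W)) with W as an initializer.
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 4])
W = helper.make_tensor("W", TensorProto.FLOAT, [4, 4], [0.1] * 16)
nodes = [
    helper.make_node("MatMul", ["X", "W"], ["T"]),
    helper.make_node("Relu", ["T"], ["Y"]),
]
graph = helper.make_graph(nodes, "tiny", [X], [Y], initializer=[W])
model = helper.make_model(graph)

# Patch: re-execute the MatMul on a shadow path and expose the difference
# as an additional graph output.
dup = helper.make_node("MatMul", ["X", "W"], ["T_dup"])
diff = helper.make_node("Sub", ["T", "T_dup"], ["T_err"])
model.graph.node.extend([dup, diff])
err = helper.make_tensor_value_info("T_err", TensorProto.FLOAT, [1, 4])
model.graph.output.append(err)
onnx.checker.check_model(model)
```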
CTO, Roofline.ai
As CTO of Roofline.ai, Maximilian Bartel drives innovation in AI performance engineering. His work brings together deep insights from compilers, hardware, and AI to help developers understand and optimize the efficiency of modern machine learning workloads.
Professor, INSA Hauts-de-France and CNRS
Prof. Smail Niar, INSA Hauts-de-France/Université Polytechnique Hauts-de-France (UPHF) & CNRS, received his PhD in Computer Engineering from the University of Lille (France) in 1990. Since then, he has been a professor at UPHF and INSA Hauts-de-France. He is a member of the computer science department at the "Laboratory of Automation, Mechanical and Computer Engineering", a joint research unit between CNRS and UPHF/INSA. His research interests include AI/ML-based embedded systems, autonomous transportation systems, HPC, and edge computing.
Hardware-Aware AI: Bridging Model Design, Compilers, and Edge Deployment
Deploying deep learning (DL) on resource-constrained edge devices requires hardware-aware and highly efficient solutions. This is particularly challenging due to the high computational and memory cost of the standard convolution layers in modern Convolutional Neural Networks (CNNs). In my talk, I will present state-of-the-art approaches for efficient DL deployment using Hardware-Aware Neural Architecture Search (HW-NAS), extended with compiler-integrated, convolution-level co-optimization. The talk will focus on three complementary strategies (the first is sketched in code after the list):
1. Surrogate Models and ML4ML for Fast Exploration
2. Model Compression and Dynamic NAS
3. Compiler-Integrated Convolution Search (CONAS)
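The promised sketch of strategy 1: surrogate-assisted exploration, where hypothetical learned predictors for latency and accuracy stand in for trained ML4ML models, so candidates can be screened without training or deploying them.

```python
# Sketch: surrogate-assisted HW-NAS over a toy search space. Both predictor
# functions are made-up stand-ins for learned ML4ML surrogates.
import random

SEARCH_SPACE = {
    "depth": [8, 12, 16],
    "width": [16, 32, 64],
    "kernel": [3, 5, 7],
}

def surrogate_latency_ms(cfg):  # assumption: a trained latency predictor
    return 0.02 * cfg["depth"] * cfg["width"] * cfg["kernel"] ** 2 / 9

def surrogate_accuracy(cfg):    # assumption: a trained accuracy predictor
    return 0.60 + 0.01 * cfg["depth"] + 0.002 * cfg["width"]

def search(budget_ms: float, n_samples: int = 1000, seed: int = 0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        if surrogate_latency_ms(cfg) > budget_ms:
            continue  # prune hardware-infeasible candidates without training
        score = surrogate_accuracy(cfg)
        if best is None or score > best[0]:
            best = (score, cfg)
    return best

print(search(budget_ms=5.0))
```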
Member of Technical Staff, Fractile
Eclipse Aidge