ML Compilation Infrastructure Workshop
Reception and breakfast from 8:45am to 10am
10am
Hugo Pompougnac
10am to 10:40 am
Saday Sadayappan (U. of Utah)
A fundamental challenge in many aspects of compiler-based performance optimization is the development of effective performance models. The space of alternative transformed code versions is often explosively large, and determining which of them would perform best is very challenging because accurate analytical performance models are difficult to develop. There has therefore been interest in using machine learning for performance modeling in optimizing compilers. This talk will discuss some current directions being pursued.
10:40am to 11am
Kunwar Grover (AMD)
Transformer-based models have received unprecedented attention lately. One of their main bottlenecks is the attention layer, which can sometimes dominate the entire execution time of these models. A number of recent research papers have focused on different algorithms to efficiently generate code for different attention-layer variants. The aim of this talk is to show how most of these proposed algorithms can be understood as kernel fusions or known matrix-multiplication optimizations. The talk will derive a base implementation of FlashAttention and use existing matrix-multiplication optimizations such as split-k as a model to derive different algorithms for attention variants. The talk will also cover how the MLIR-based compiler IREE generates code for these kernels.
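The kernel-fusion-plus-online-softmax idea at the heart of FlashAttention can be sketched in a few lines of NumPy. This is an illustrative toy, not IREE's actual code generation; the function names, tiling scheme, and block size are assumptions for exposition only:

```python
import numpy as np

def tiled_attention(Q, K, V, block=32):
    """Attention computed one K/V block at a time with an online softmax:
    the core FlashAttention idea, as a toy sketch (not a real kernel)."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)   # running row-wise max of the logits
    l = np.zeros(n)           # running softmax normalizer
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale                  # partial logits for this tile
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)               # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

def reference_attention(Q, K, V):
    """Naive attention that materializes the full n-by-n matrix."""
    S = (Q @ K.T) / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    return (P / P.sum(axis=1, keepdims=True)) @ V
```

The point of the tiling is that the full n-by-n score matrix never exists in memory; only one `(n, block)` tile is live at a time, which is what makes the fused kernel fit in fast on-chip memory.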
Coffee break
11:20am to noon
Jorn Tuyls (AMD)
This workshop talk presents code generation for the NPU in Ryzen AI laptops and focuses mainly on the data tiling and data packing aspects of it. It shows how data tiling maps to the NPU's cores and caches and how it can describe granular data movement patterns through the NPU's data streams.
Noon to 12:40
Sylvain Noiry (INRIA CORSE/Kalray)
There are many reasons in favor of generating code from MLIR to standard programming languages like C. In the context of offloading, it can be necessary to handle the host and the accelerator code in different ways, with different compilers. Accelerator-specific compilers are often out of date, or not even based on llvm-project.
The use case will be the Kalray MPPA accelerator, which comes with a low-level offloading library that can be exposed in MLIR. The kernels are compiled using a GCC-based compiler and can rely on hardware-specific micro-kernels or builtins to fully exploit the machine. This presentation will focus on the major changes needed to the MLIR-to-C code generator already present in MLIR-AIE in order to target the Kalray accelerator. These include differentiating the host-related and offloaded parts from a single MLIR input, generating code for hardware-specific operations, and a discussion of the trade-off between transparency and complexity.
12:40 to 1pm
Sasha Lopoukhine (U. of Cambridge)
Lunch and informal discussions from 1pm to 2:30pm
2:30pm to 3:10pm
Olivier Bichler (CEA)
Aidge is a generic, multi-paradigm tool for compute graph manipulation, quantization, mapping, scheduling, and code generation. Its primary targets are embedded systems with specialized hardware accelerators, especially dataflow or restricted-instruction-set architectures. It is highly interoperable thanks to built-in ONNX import/export and a direct PyTorch interface, and its modularity allows any of its features to be used standalone or in conjunction with other tools along the deployment path of a model to the embedded system. In this presentation, we will go over some of the main differentiating features of the framework and the contexts in which they may shine!
3:10pm to 3:50pm
Christophe Guillon (INRIA CORSE)
Automatic tuning of compiler optimizations spans various levels of abstraction and multiple fields of computer science. Programming the underlying software components demands not only expertise in each area but also a deep understanding of compiler internals. As a result, fully leveraging these compilers can be challenging, often giving the impression of dealing with a black box.
In this talk, we present several ideas aimed at enhancing the ergonomics and modularity of these compilers in the areas of search-space definition, statistical model construction, and transformation-language definition, including its integration with the code generator.
The talk will conclude with a demonstration within the context of the Aidge framework.
3:50pm to 4:10pm
Mathieu Fehr (U. of Edinburgh)
MLIR is designed to be modular and extensible, allowing for the definition of custom IRs. However, MLIR is primarily focused on syntax and does not provide a way to formally define the semantics of operations. This makes it difficult to reason about the correctness of transformations and analyses, and is a barrier to the development of formal verification tools for MLIR-based compilers. In this talk, we will introduce a set of semantics dialects, based on SMT-LIB, which allow the semantics of MLIR dialects to be defined as a compiler transformation. We will show how these semantics dialects can give semantics to core MLIR dialects such as arith, comb, and memref, and how this new abstraction can be used to build formal verification tooling such as a translation validation tool, a peephole rewrite verifier and synthesizer, and a dataflow analysis verifier.
Coffee break
4:30pm to 5:10pm
Dumitru Potop-Butucaru (INRIA KAIROS)
ML programming today still involves three different levels using different formalisms and practices: that of layers, that of models, and that of the driver logic controlling the inference, training, or reinforcement learning process. Layers are typically linear algebra kernels, subject to classical HPC development methods. Layers and parameters are then assembled into models using increasingly complex control that, in current practice, involves dataflow graphs, stateful behaviors, and conditional control. This is done using ML-specific formalisms such as JAX or PyTorch. Drivers are typically programmed in Python, and sometimes manually re-implemented in C/C++ for efficiency and/or embedding into larger applications. They take advantage of complex code transformations of the model, such as automatic differentiation, the synthesis of the parameter update code, or just-in-time compilation.
This split into three levels using different formalisms (and the use of little-formalized Python code) poses semantic problems, as one model will often behave differently on different architectures and under a different driver. Perhaps more importantly, it limits automation and productivity. Full automation today covers the compilation of layers and models after transformations such as automatic differentiation and parameter update synthesis, as well as the automatic differentiation of stateless functions. However, the parameter update code, which is subject to subtle choices and works on a monolithic state representation, and the code interacting with the sample database or the I/Os, must be hand-written.
In this presentation we will show that (dataflow) reactive languages provide a more natural programming paradigm encompassing the layer, model, and driver levels, as well as the I/O processing and possibly the low-level scheduling of operations.
Reactive control primitives can be included as a dialect inside MLIR and then used jointly with tensor functions to allow modular, hierarchical, and stateful specification covering all three levels: layer, model, and driver. The specification can then be seamlessly compiled into efficient code. Seen as a high-level specification language, the same reactive primitives support modular automatic differentiation and fully automatic synthesis of the parameter update code without the need to expose parameters at top level.
We would like to explore how these concepts can be further extended, leveraging generic MLIR support for AD, code generation, and framework integration such as Enzyme-JAX.
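For context, the conventional hand-written "driver" level that the abstract contrasts with can be reduced to a minimal sketch: a toy linear model with an explicit, manually coded SGD parameter-update step written around the model rather than derived from it. All names and hyperparameters here are illustrative assumptions:

```python
import numpy as np

def train(samples, targets, lr=0.1, steps=200):
    """Hand-written training driver for a one-layer linear model.
    The update line is exactly the 'parameter update code' that reactive
    or compiler-based approaches aim to synthesize automatically."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(samples.shape[1])
    for _ in range(steps):
        pred = samples @ w                              # model: one linear layer
        grad = 2 * samples.T @ (pred - targets) / len(targets)  # manual gradient
        w -= lr * grad                                  # manual SGD update
    return w
```

In real frameworks the gradient line is produced by automatic differentiation, but the loop structure, the update rule, and the I/O around it remain hand-written, which is the gap the talk addresses.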
5:10pm to 6:10pm
One minute per participant
Dinner
MLIR Winter School, Day 1
9am to 10am
Mehdi Amini
Coffee break
10:30am to noon
Mathieu Fehr & Sasha Lopoukhine
This session will include a hands-on tutorial to learn how to interact with xDSL to optimize and compile programs. While this tutorial will be focused on xDSL, the concepts and tools presented will be applicable to MLIR as well. We will showcase a few MLIR IR files, and show how to interpret them using the xdsl-opt tool. We will then present how to build a pipeline of passes to optimize these files and to generate low-level code.
Lunch and informal discussions from noon to 2 pm
2pm to 3:30pm
Mathieu Fehr & Sasha Lopoukhine
This session will be a hands-on tutorial on how to write new dialects and passes in xDSL. In particular, we will present how a pass composed of simple peephole rewrites (local optimizations) can be written in xDSL. We will task the participants with extending a high-level dialect with a new operation, and to then extend an optimization and transformation to a low-level dialect to support that new operation. While these tasks will be done in xDSL for simplicity, the concepts will be applicable to MLIR as well.
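As a taste of the idea (not the actual xDSL API, whose patterns operate on typed operation objects through a `PatternRewriter`), a peephole pass can be modeled as local rewrites driven to a fixpoint. The operation encoding and pattern names below are invented for illustration:

```python
# Toy IR: each operation is (opname, result_name, operands), where an
# operand is either an int literal or the string name of another result.

def fold_const_add(op):
    """addi(c1, c2) -> const(c1 + c2) when both operands are literals."""
    name, result, operands = op
    if name == "addi" and all(isinstance(o, int) for o in operands):
        return ("const", result, [operands[0] + operands[1]])
    return None

def add_zero_to_copy(op):
    """addi(x, 0) -> copy(x) for a non-literal x."""
    name, result, operands = op
    if name == "addi" and isinstance(operands[0], str) and operands[1] == 0:
        return ("copy", result, [operands[0]])
    return None

def run_peephole(ops, patterns):
    """Greedy driver: keep applying patterns until nothing changes."""
    changed = True
    while changed:
        changed = False
        for i, op in enumerate(ops):
            for pattern in patterns:
                new_op = pattern(op)
                if new_op is not None and new_op != op:
                    ops[i] = op = new_op
                    changed = True
    return ops
```

Each pattern only inspects a single operation, which is what makes peephole rewrites local, composable, and easy to add one at a time, the same property the xDSL and MLIR pattern infrastructures exploit.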
Coffee break
4pm to 5pm
Dinner at 7 pm
MLIR Winter School, Day 2
9am to 10am
Coffee break
10:30am to noon
Mehdi Amini
Lunch and informal discussions from noon to 2 pm
2pm to 3:30pm
William Moses
Automatic differentiation (AD) is key to training neural networks, Bayesian inference, and scientific computing. Applying these techniques traditionally requires rewriting code in a specific machine learning framework or manually providing derivatives. This talk presents Enzyme, a high-performance automatic differentiation compiler plugin for the LLVM and MLIR compiler frameworks. Enzyme differentiates programs in any language whose compiler targets LLVM/MLIR, including C/C++, Fortran, Julia, Rust, Swift, JAX, etc., thereby providing native AD capabilities in these languages with state-of-the-art performance. Unlike traditional tools, Enzyme performs AD on optimized IR. On a combined machine-learning and scientific computing benchmark suite, AD on optimized IR achieves a geometric mean speedup of 4.2x over AD on IR before optimization.
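The reverse-mode idea itself fits in a few lines of Python: record local partial derivatives while computing, then sweep the graph backwards to accumulate adjoints. This toy tape only illustrates the textbook concept; Enzyme operates on LLVM/MLIR IR, not on Python objects:

```python
class Var:
    """Scalar carrying its value plus tape links for reverse-mode AD."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(out):
    """Topologically sort the graph, then propagate adjoints in reverse."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad
```

For f(x, y) = x*y + x, the backward sweep yields df/dx = y + 1 and df/dy = x, each computed with one multiply-accumulate per graph edge, which is why reverse mode scales to millions of parameters.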
This talk will also include work that makes Enzyme the first fully automatic reverse-mode AD tool to generate gradients of existing GPU kernels as well as the benefits of operating within high-level structured representations, like MLIR.
Coffee break
4pm to 5:30pm
Marius Brehler
Dinner at 7 pm
MLIR Winter School, Day 3
9am to 10:30 am
Alex Zinenko
Native high-level code generation support in MLIR is largely based on the idea of structured code generation, which is often mistaken for being synonymous with the linear algebra (Linalg) dialect. Instead, the structured code generation approach evolved hand-in-hand with the progressive lowering philosophy of MLIR and permeates most of its dialects involved in code generation. This talk attempts to demystify structured code generation in MLIR by introducing the relevant concepts bottom-up, from individual arithmetic operations on scalars, to single instruction multiple data (SIMD) operations on vectors, to manipulations of multi-dimensional tensors. Using small examples and illustrations, it demonstrates that this approach boils down to a handful of concepts largely present in modern hardware, albeit with slightly different terminology. It does not require a deep understanding of MLIR or any specific dialect.
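The bottom-up progression the talk describes, from scalar arithmetic through SIMD vectors to multi-dimensional tensors, can be mimicked on a single computation (AXPY) in plain NumPy. This is only an analogy for the abstraction levels, not MLIR code:

```python
import numpy as np

def axpy_scalar(a, x, y):
    """Scalar level: one arithmetic operation per element."""
    out = np.empty_like(y)
    for i in range(len(y)):
        out[i] = a * x[i] + y[i]
    return out

def axpy_simd(a, x, y, width=4):
    """Vector level: process fixed-width chunks, mirroring SIMD lanes."""
    out = np.empty_like(y)
    for i in range(0, len(y), width):
        out[i:i + width] = a * x[i:i + width] + y[i:i + width]
    return out

def axpy_tensor(a, x, y):
    """Tensor level: one structured operation over whole operands."""
    return a * x + y
```

All three compute the same result; structured code generation is essentially the compiler's ability to move between these levels, tiling the tensor form down into the vector and scalar forms that hardware executes.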
Coffee break
11am to 12:30
Alex Zinenko
MLIR, like the rest of LLVM, is primarily written in C++. However, the C++ API is known to be complex and unstable. Moreover, both quick prototyping and deep integration with client frameworks call for the use of different languages to work with MLIR, most often Python for its simplicity and C for its ubiquity. This talk will present the MLIR C API and demonstrate how it is used to construct Python bindings. Attendees of this talk will learn how to expose custom dialects in both C and Python, as well as how to leverage the C API to interact with MLIR from different languages.
Lunch and informal discussions from 12:30 to 2 pm
2pm to 3:30pm
Alex Zinenko
MLIR features support for declaratively specifying and controlling compiler transformations via the transform dialect. It allows one to request compiler transformations using compiler IR itself, which can be embedded into the original IR that is being transformed (similarly to pragmas) or supplied separately (similarly to scheduling languages). This talk presents the concepts of the MLIR transform dialect and related infrastructure. It will be accompanied by a practical demonstration of three use scenarios. After following the task, the attendees will be able to apply the transform dialect in their work and extend it when necessary. Basic familiarity with MLIR is a prerequisite.
Coffee break
4pm to 5:30 pm
Matthias Springer
Pattern-based IR rewriting through the greedy pattern rewriter and the dialect conversion framework is widely used and one of the core mechanisms of MLIR. This session is a hands-on introduction to the pattern API and the pattern drivers, along with some best practices that programmers can follow when designing pattern-based rewrites. Topics that will be covered include: the rewrite pattern API, the greedy pattern rewrite driver, the walk pattern driver, the conversion pattern API, the type converter API, dialect conversion, 1:N conversions, declarative pattern definition with PDL, the canonicalizer pass, transform dialect integration, and debugging strategies for pattern-based rewrites.
Dinner at 7 pm
MLIR Winter School, Day 4
9am to 10:30 am
Mathieu Fehr
After learning the day before how to define operations, rewrites, and passes in xDSL, we will now present how to transfer this knowledge to MLIR. We will present how an MLIR dialect is structured in C++ and TableGen (a metaprogramming tool used by MLIR), and how to define new operations, attributes, and types. We will also present how to write new passes in C++, and how to write peephole rewrites using the MLIR pattern rewrite infrastructure.
Coffee break
11am to 12:30
Lorenzo Chelini
The MLIR ecosystem has been rapidly evolving, offering powerful abstractions for building domain-specific compilers and optimizing intermediate representations. However, it currently lacks a robust C and C++ frontend. While Clang provides excellent support for targeting LLVM IR, targeting MLIR directly from C and C++ opens up new opportunities for innovation. This talk will introduce Polygeist and demonstrate how it bridges the gap between C or C++ and MLIR, enabling better integration with higher-level abstractions, preserving high-level semantics such as structured control flow and parallelism (e.g., OpenMP/GPU), and supporting the lowering or raising of constructs to user-defined custom operations. Attendees will gain valuable insights into how to use Polygeist and learn about ongoing research directions in this area.
Lunch and informal discussions from 12:30 to 2 pm
2pm to 3:30pm
Sasha Lopoukhine
MLIR's design makes it easy to extend existing compiler pipelines with custom transformations and abstractions. Most existing MLIR-based compilers lower their code via LLVM, benefiting from its extensive compiler infrastructure. However, LLVM's backends are tuned for good general-purpose performance and may not be suitable for scenarios where precise control and extensibility are desired. This workshop covers an alternative flow, leveraging assembly dialects in MLIR to output assembly for linear algebra micro-kernels, with a mix of standard ISAs as well as custom extensions.
Coffee break
4pm to 5:30pm
Kunwar Grover
5:30pm
Fabrice Rastello