ML Compilation Infrastructure Workshop
Reception and breakfast from 8:45am to 10am
10am
Hugo Pompougnac
10am to 10:40 am
Saday Sadayappan (U. of Utah)
A fundamental challenge in many aspects of compiler-based performance optimization is the development of effective performance models. The space of alternative transformed code versions is often explosively large, and determining which of them would perform best is very challenging because accurate analytical performance models are difficult to develop. There has therefore been interest in using machine learning for performance modeling in optimizing compilers. This talk will discuss some current directions being pursued.
10:40am to 11am
Kunwar Grover (AMD)
Transformer-based models have received unprecedented attention lately. One of their main bottlenecks is the attention layer, which can sometimes dominate the entire execution time of these models. A number of recent research papers have focused on different algorithms to efficiently generate code for different attention-layer variants. The aim of this talk is to show how most of these proposed algorithms can be understood as kernel fusions or known matrix-multiplication optimizations. The talk will derive a base implementation of FlashAttention and use existing matrix-multiplication optimizations such as split-k as a model to derive different algorithms for attention variants. The talk will also cover how the MLIR-based compiler IREE generates code for these kernels.
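The kernel-fusion-plus-online-softmax idea at the heart of FlashAttention can be sketched in a few lines of NumPy. This is an illustrative toy, not IREE's actual code generation; the function names, tiling scheme, and block size are assumptions for exposition only:

```python
import numpy as np

def tiled_attention(Q, K, V, block=32):
    """Attention computed one K/V block at a time with an online softmax:
    the core FlashAttention idea, as a toy sketch (not a real kernel)."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)   # running row-wise max of the logits
    l = np.zeros(n)           # running softmax normalizer
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale                  # partial logits for this tile
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)               # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

def reference_attention(Q, K, V):
    """Naive attention that materializes the full n-by-n matrix."""
    S = (Q @ K.T) / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    return (P / P.sum(axis=1, keepdims=True)) @ V
```

The point of the tiling is that the full n-by-n score matrix never exists in memory; only one `(n, block)` tile is live at a time, which is what makes the fused kernel fit in fast on-chip memory.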
Coffee break
11:20am to noon
Jorn Tuyls (AMD)
This workshop talk presents code generation for the NPU in Ryzen AI laptops and focuses mainly on the data tiling and data packing aspects of it. It shows how data tiling maps to the NPU's cores and caches and how it can describe granular data movement patterns through the NPU's data streams.
Noon to 12:40
Sylvain Noiry (INRIA CORSE/Kalray)
There are many reasons in favor of generating code from MLIR to standard programming languages like C. In the context of offloading, it can be necessary to handle the host and the accelerator code in different ways, with different compilers. Accelerator-specific compilers are often out of date, or not even based on llvm-project.
The use case will be the Kalray MPPA accelerator, which comes with a low-level offloading library that can be exposed in MLIR. The kernels are compiled using a GCC-based compiler and can rely on hardware-specific micro-kernels or builtins to fully exploit the machine. This presentation will focus on the major changes needed to the MLIR-to-C code generator already present in MLIR-AIE in order to target the Kalray accelerator. These include differentiating the host-related and offloaded parts from a single MLIR input, generating code for hardware-specific operations, and a discussion of the trade-off between transparency and complexity.
12:40 to 1pm
Sasha Lopoukhine (U. of Cambridge)
Lunch and informal discussions from 1pm to 2:30pm
2:30pm to 3:10pm
Olivier Bichler (CEA)
Aidge is a generic, multi-paradigm tool for compute graph manipulation, quantization, mapping, scheduling, and code generation. Its primary targets are embedded systems with specialized hardware accelerators, especially dataflow or restricted-instruction-set architectures. It is highly interoperable thanks to built-in ONNX import/export and a direct PyTorch interface, and its modularity allows any of its features to be used standalone or in conjunction with other tools along the deployment path of a model to the embedded system. In this presentation, we will go over some of the main differentiating features of the framework and the contexts in which they may shine!
3:10pm to 3:50pm
Christophe Guillon (INRIA CORSE)
Automatic tuning of compiler optimizations spans various levels of abstraction and multiple fields of computer science. Programming the underlying software components demands not only expertise in each area but also a deep understanding of compiler internals. As a result, fully leveraging these compilers can be challenging, often giving the impression of dealing with a black box.
In this talk, we present several ideas aimed at enhancing the ergonomics and modularity of these compilers in the areas of search-space definition, statistical model construction, and transformation-language definition, including its integration with the code generator.
The talk will conclude with a demonstration within the context of the Aidge framework.
3:50pm to 4:10pm
Mathieu Fehr (U. of Edinburgh)
MLIR is designed to be modular and extensible, allowing for the definition of custom IRs. However, MLIR is primarily focused on syntax and does not provide a way to formally define the semantics of operations. This makes it difficult to reason about the correctness of transformations and analyses, and is a barrier to the development of formal verification tools for MLIR-based compilers. In this talk, we will introduce a set of semantics dialects, based on SMT-LIB, which allow the semantics of MLIR dialects to be defined as a compiler transformation. We will show how these semantics dialects can give semantics to core MLIR dialects such as arith, comb, and memref, and how this new abstraction can be used to build formal verification tooling such as a translation validation tool, a peephole rewrite verifier and synthesizer, and a dataflow analysis verifier.
Coffee break
4:30pm to 5:10pm
Dumitru Potop-Butucaru (INRIA KAIROS)
ML programming today still involves three different levels using different formalisms and practices: that of layers, that of models, and that of the driver logic controlling the inference, training, or reinforcement learning process. Layers are typically linear algebra kernels, subject to classical HPC development methods. Layers and parameters are then assembled into models using increasingly complex control that, in current practice, involves dataflow graphs, stateful behaviors, and conditional control. This is done using ML-specific formalisms such as JAX or PyTorch. Drivers are typically programmed in Python, and sometimes manually re-implemented in C/C++ for efficiency and/or embedding into larger applications. They take advantage of complex code transformations of the model, such as automatic differentiation, the synthesis of the parameter update code, or just-in-time compilation.
This split into three levels using different formalisms (and the use of little-formalized Python code) poses semantic problems, as one model will often behave differently on different architectures and under a different driver. Perhaps more importantly, it limits automation and productivity. Full automation today covers the compilation of layers and models after transformations such as automatic differentiation and parameter update synthesis, as well as the automatic differentiation of stateless functions. However, the parameter update code, which is subject to subtle choices and works on a monolithic state representation, and the code interacting with the sample database or the I/Os, must be hand-written.
In this presentation we will show that (dataflow) reactive languages provide a more natural programming paradigm encompassing the layer, model, and driver levels, as well as the I/O processing and possibly the low-level scheduling of operations.
Reactive control primitives can be included as a dialect inside MLIR and then used jointly with tensor functions to allow modular, hierarchical, and stateful specification covering all three levels: layer, model, and driver. The specification can then be seamlessly compiled into efficient code. Seen as a high-level specification language, the same reactive primitives support modular automatic differentiation and fully automatic synthesis of the parameter update code without the need to expose parameters at top level.
We would like to explore how these concepts can be further extended, leveraging generic MLIR support for AD, code generation, and framework integration such as Enzyme-JAX.
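For context, the conventional hand-written "driver" level that the abstract contrasts with can be reduced to a minimal sketch: a toy linear model with an explicit, manually coded SGD parameter-update step written around the model rather than derived from it. All names and hyperparameters here are illustrative assumptions:

```python
import numpy as np

def train(samples, targets, lr=0.1, steps=200):
    """Hand-written training driver for a one-layer linear model.
    The update line is exactly the 'parameter update code' that reactive
    or compiler-based approaches aim to synthesize automatically."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(samples.shape[1])
    for _ in range(steps):
        pred = samples @ w                              # model: one linear layer
        grad = 2 * samples.T @ (pred - targets) / len(targets)  # manual gradient
        w -= lr * grad                                  # manual SGD update
    return w
```

In real frameworks the gradient line is produced by automatic differentiation, but the loop structure, the update rule, and the I/O around it remain hand-written, which is the gap the talk addresses.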
5:10pm to 6:10pm
One minute per participant
Dinner
MLIR Winter School, Day 1
9am to 10am
Mehdi Amini
Coffee break
10:30am to noon
Mathieu Fehr & Sasha Lopoukhine
This session will include a hands-on tutorial to learn how to interact with xDSL to optimize and compile programs. While this tutorial will be focused on xDSL, the concepts and tools presented will be applicable to MLIR as well. We will showcase a few MLIR IR files, and show how to interpret them using the xdsl-opt tool. We will then present how to build a pipeline of passes to optimize these files and to generate low-level code.
Lunch and informal discussions from noon to 2 pm
2pm to 3:30pm
Mathieu Fehr & Sasha Lopoukhine
This session will be a hands-on tutorial on how to write new dialects and passes in xDSL. In particular, we will present how a pass composed of simple peephole rewrites (local optimizations) can be written in xDSL. We will task the participants with extending a high-level dialect with a new operation, and to then extend an optimization and transformation to a low-level dialect to support that new operation. While these tasks will be done in xDSL for simplicity, the concepts will be applicable to MLIR as well.
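As a taste of the idea (not the actual xDSL API, whose patterns operate on typed operation objects through a `PatternRewriter`), a peephole pass can be modeled as local rewrites driven to a fixpoint. The operation encoding and pattern names below are invented for illustration:

```python
# Toy IR: each operation is (opname, result_name, operands), where an
# operand is either an int literal or the string name of another result.

def fold_const_add(op):
    """addi(c1, c2) -> const(c1 + c2) when both operands are literals."""
    name, result, operands = op
    if name == "addi" and all(isinstance(o, int) for o in operands):
        return ("const", result, [operands[0] + operands[1]])
    return None

def add_zero_to_copy(op):
    """addi(x, 0) -> copy(x) for a non-literal x."""
    name, result, operands = op
    if name == "addi" and isinstance(operands[0], str) and operands[1] == 0:
        return ("copy", result, [operands[0]])
    return None

def run_peephole(ops, patterns):
    """Greedy driver: keep applying patterns until nothing changes."""
    changed = True
    while changed:
        changed = False
        for i, op in enumerate(ops):
            for pattern in patterns:
                new_op = pattern(op)
                if new_op is not None and new_op != op:
                    ops[i] = op = new_op
                    changed = True
    return ops
```

Each pattern only inspects a single operation, which is what makes peephole rewrites local, composable, and easy to add one at a time, the same property the xDSL and MLIR pattern infrastructures exploit.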
Coffee break
4pm to 5pm
Dinner at 7 pm
MLIR Winter School, Day 2
9am to 10am
Coffee break
10:30am to noon
Mehdi Amini
Lunch and informal discussions from noon to 2 pm
2pm to 3:30pm
William Moses
Automatic differentiation (AD) is key to training neural networks, Bayesian inference, and scientific computing. Applying these techniques traditionally requires rewriting code in a specific machine learning framework or manually providing derivatives. This talk presents Enzyme, a high-performance automatic differentiation compiler plugin for the LLVM and MLIR compiler frameworks. Enzyme differentiates programs in any language whose compiler targets LLVM/MLIR, including C/C++, Fortran, Julia, Rust, Swift, JAX, etc., thereby providing native AD capabilities in these languages with state-of-the-art performance. Unlike traditional tools, Enzyme performs AD on optimized IR. On a combined machine-learning and scientific computing benchmark suite, AD on optimized IR achieves a geometric mean speedup of 4.2x over AD on IR before optimization.
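The reverse-mode idea itself fits in a few lines of Python: record local partial derivatives while computing, then sweep the graph backwards to accumulate adjoints. This toy tape only illustrates the textbook concept; Enzyme operates on LLVM/MLIR IR, not on Python objects:

```python
class Var:
    """Scalar carrying its value plus tape links for reverse-mode AD."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(out):
    """Topologically sort the graph, then propagate adjoints in reverse."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad
```

For f(x, y) = x*y + x, the backward sweep yields df/dx = y + 1 and df/dy = x, each computed with one multiply-accumulate per graph edge, which is why reverse mode scales to millions of parameters.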
This talk will also include work that makes Enzyme the first fully automatic reverse-mode AD tool to generate gradients of existing GPU kernels as well as the benefits of operating within high-level structured representations, like MLIR.
Coffee break
4pm to 5:30pm
Marius Brehler
Dinner at 7 pm
MLIR Winter School, Day 3
9am to 10:30 am
Alex Zinenko
Native high-level code generation support in MLIR is largely based on the idea of structured code generation, which is often mistaken for being synonymous with the linear algebra (Linalg) dialect. Instead, the structured code generation approach evolved hand-in-hand with the progressive lowering philosophy of MLIR and permeates most of its dialects involved in code generation. This talk attempts to demystify structured code generation in MLIR by introducing the relevant concepts bottom-up, from individual arithmetic operations on scalars, to single instruction multiple data (SIMD) operations on vectors, to manipulations of multi-dimensional tensors. Using small examples and illustrations, it demonstrates that this approach boils down to a handful of concepts largely present in modern hardware, albeit with slightly different terminology. It does not require a deep understanding of MLIR or any specific dialect.
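The bottom-up progression the talk describes, from scalar arithmetic through SIMD vectors to multi-dimensional tensors, can be mimicked on a single computation (AXPY) in plain NumPy. This is only an analogy for the abstraction levels, not MLIR code:

```python
import numpy as np

def axpy_scalar(a, x, y):
    """Scalar level: one arithmetic operation per element."""
    out = np.empty_like(y)
    for i in range(len(y)):
        out[i] = a * x[i] + y[i]
    return out

def axpy_simd(a, x, y, width=4):
    """Vector level: process fixed-width chunks, mirroring SIMD lanes."""
    out = np.empty_like(y)
    for i in range(0, len(y), width):
        out[i:i + width] = a * x[i:i + width] + y[i:i + width]
    return out

def axpy_tensor(a, x, y):
    """Tensor level: one structured operation over whole operands."""
    return a * x + y
```

All three compute the same result; structured code generation is essentially the compiler's ability to move between these levels, tiling the tensor form down into the vector and scalar forms that hardware executes.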
Coffee break
11am to 12:30
Alex Zinenko
MLIR, like the rest of LLVM, is primarily written in C++. However, the C++ API is known to be complex and unstable. Moreover, both quick prototyping and deep integration with client frameworks call for the use of different languages to work with MLIR, most often Python for its simplicity and C for its ubiquity. This talk will present the MLIR C API and demonstrate how it is used to construct Python bindings. Attendees of this talk will learn how to expose custom dialects in both C and Python, as well as how to leverage the C API to interact with MLIR from different languages.
Lunch and informal discussions from 12:30 to 2 pm
2pm to 3:30pm
Alex Zinenko
MLIR features support for declaratively specifying and controlling compiler transformations via the transform dialect. It allows one to request compiler transformations using compiler IR itself, which can be embedded into the original IR that is being transformed (similarly to pragmas) or supplied separately (similarly to scheduling languages). This talk presents the concepts of the MLIR transform dialect and related infrastructure. It will be accompanied by a practical demonstration of three use scenarios. After following the task, the attendees will be able to apply the transform dialect in their work and extend it when necessary. Basic familiarity with MLIR is a prerequisite.
Coffee break
4pm to 5:30 pm
Matthias Springer
Pattern-based IR rewriting through the greedy pattern rewriter and the dialect conversion framework is widely used and one of the core mechanisms of MLIR. This session is a hands-on introduction to the pattern API and the pattern drivers, along with some best practices that programmers can follow when designing pattern-based rewrites. Topics that will be covered include: the rewrite pattern API, the greedy pattern rewrite driver, the walk pattern driver, the conversion pattern API, the type converter API, dialect conversion, 1:N conversions, declarative pattern definition with PDL, the canonicalizer pass, transform dialect integration, and debugging strategies for pattern-based rewrites.
Dinner at 7 pm
MLIR Winter School, Day 4
9am to 10:30 am
Mathieu Fehr
After learning the day before how to define operations, rewrites, and passes in xDSL, we will now present how to transfer this knowledge to MLIR. We will present how an MLIR dialect is structured in C++ and TableGen (a metaprogramming tool used by MLIR), and how to define new operations, attributes, and types. We will also present how to write new passes in C++, and how to write peephole rewrites using the MLIR pattern rewrite infrastructure.
Coffee break
11am to 12:30
Lorenzo Chelini
The MLIR ecosystem has been rapidly evolving, offering powerful abstractions for building domain-specific compilers and optimizing intermediate representations. However, it currently lacks a robust C and C++ frontend. While Clang provides excellent support for targeting LLVM IR, targeting MLIR directly from C and C++ opens up new opportunities for innovation. This talk will introduce Polygeist and demonstrate how it bridges the gap between C or C++ and MLIR, enabling better integration with higher-level abstractions, preserving high-level semantics such as structured control flow and parallelism (e.g., OpenMP/GPU), and supporting the lowering or raising of constructs to user-defined custom operations. Attendees will gain valuable insights into how to use Polygeist and learn about ongoing research directions in this area.
Lunch and informal discussions from 12:30 to 2 pm
2pm to 3:30pm
Sasha Lopoukhine
MLIR's design makes it easy to extend existing compiler pipelines with custom transformations and abstractions. Most existing MLIR-based compilers lower their code via LLVM, benefiting from its extensive compiler infrastructure. However, LLVM's backends are tuned for good general-purpose performance and may not be suitable for scenarios where precise control and extensibility are desired. This workshop covers an alternative flow, leveraging assembly dialects in MLIR to output assembly for linear algebra micro-kernels, with a mix of standard ISAs as well as custom extensions.
Coffee break
4pm to 5:30pm
Kunwar Grover
5:30pm
Fabrice Rastello