CXL is an emerging technology that proposes a way to interconnect CPUs and devices through coherent memory. It is already available in recent CPUs, and the first devices are about to hit the market. Unfortunately, much of the available material presents CXL either at a very high level (classifying devices as memory providers, memory cachers, or both) or at a very low level, describing packet formats and the message exchanges that the protocol entails. Neither view shows researchers wanting to enter the field the technology's true potential, or how to use it to unlock a potentially new way to build memory-intensive systems.
In this tutorial, we attempt to fill this gap. We first approach CXL from an architectural point of view and start by discussing the new platform scenarios it opens. We then revisit cache coherence technology from a different angle: we explore a new way of reasoning about the coherence machinery as a means to move data across the cache/memory hierarchy.
This approach introduces a new perspective on data structures and how they can span this hierarchy, and it makes reasoning about aspects such as concurrency, fault tolerance, and correctness much easier.
Only after attendees understand the new possibilities that CXL opens do we dive into more detail on its underlying mechanisms. We do so through live, interactive observations of how coherence traffic flows between a CPU and real CXL devices, with the help of a protocol analyzer that captures the invalidation message traffic between them.
This topic is so rich and so relevant to performance that we will present it through exercises and labs, providing ample opportunity for discussion.
We close the tutorial by presenting the development opportunities available today for researchers interested in getting into this field and discussing several open research directions.