CXL SIG


This group includes external (industry) participants.

[12/13/2022] Notes and slides from our Industry Panel on Memory Disaggregation held Nov 16, 2022 are now online.

CXL SIG Google Drive folder


Feb 6 (Monday) 1 pm

Continuing from the previous session, Yiwei will discuss the rooflines from one of the SC22 presentations.

Then, we will discuss "Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications" by Sim et al., which appeared in IEEE Computer Architecture Letters, Vol. 22, No. 1 (2023). The paper asks how best to combine near-data processing with memory interleaving: it architects a simple load balancer behind low-bandwidth CXL links to get both high data-processing bandwidth and good performance/watt, using k-nearest-neighbor (kNN) search as the representative memory-intensive workload.
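To make the discussion concrete, here is a loose sketch (not the paper's implementation; all names are ours) of the idea of interleaving data across several CXL memory expanders, each with a small near-data-processing unit that computes a partial kNN result locally, so that only top-k candidates cross the low-bandwidth CXL link:

```python
import heapq

def interleave(rows, num_devices):
    """Round-robin interleaving of data rows across simulated devices."""
    shards = [[] for _ in range(num_devices)]
    for i, row in enumerate(rows):
        shards[i % num_devices].append(row)
    return shards

def ndp_partial_knn(shard, query, k):
    """Runs 'inside' a device: return the k closest rows of its shard,
    so only k candidates (not the whole shard) cross the CXL link."""
    dist = lambda row: sum((a - b) ** 2 for a, b in zip(row, query))
    return heapq.nsmallest(k, shard, key=dist)

def knn(rows, query, k, num_devices=4):
    """Host side: merge the per-device partial results into a final top-k."""
    dist = lambda row: sum((a - b) ** 2 for a, b in zip(row, query))
    shards = interleave(rows, num_devices)
    candidates = [r for s in shards for r in ndp_partial_knn(s, query, k)]
    return heapq.nsmallest(k, candidates, key=dist)
```

The point of the sketch is the bandwidth asymmetry: each device scans its full shard at local-memory bandwidth, while the host only merges num_devices * k candidates.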

Jan 30 (Monday) 1 pm

Yiwei will continue the discussion of the CXL booth at Supercomputing 2022.

Jan 23 (Monday) 1 pm

The agenda is a quick update on CXL news and a quick roundtable to hear suggestions for talks people want to present and papers they want discussed this quarter (the remaining 7 meetings).

Jan 16 (Monday) 1 pm

Graduate student Yiwei Yang (advised by Andrew Quinn) will discuss the design of his CXL memory simulator and his learnings from Supercomputing 2022.

Jan 10 (Tuesday) 1 pm: Antonio Barbalace talk (Zoom link above)

TITLE

Rethinking Systems Software for Emerging Data Center Hardware     

ABSTRACT

Today’s data center hardware is increasingly heterogeneous, including several special-purpose and reconfigurable accelerators that sit alongside the central processing unit (CPU). Emerging platforms also include heterogeneous memory: directly attached, NUMA, and attached over a peripheral bus. Furthermore, processing units (CPUs and/or accelerators) are popping up in storage devices, network cards, and along the memory hierarchy (near-data-processing architectures), introducing hardware topologies that did not exist before.

Existing, traditional systems software has been designed and developed under the assumption that a single computer hosts a single CPU complex with directly attached memory, possibly NUMA. There is therefore one operating system per computer, and software is compiled to run on one specific CPU complex. On emerging platforms this no longer holds: every different processing unit requires its own operating system and applications, which are not compatible with each other, making a single platform look like a distributed system even when the CPU complexes are tightly coupled. This makes programming hard and hinders a whole class of performance optimizations. This talk therefore argues that new systems software is needed to better support emerging non-traditional hardware topologies, and introduces new operating system and compiler designs that enable easier programming and full exploitation of system performance.

BIO

Antonio Barbalace is a Senior Lecturer (Associate Professor) at the School of Informatics of the University of Edinburgh, Scotland. Before that, he was an Assistant Professor in the Computer Science Department at Stevens Institute of Technology, New Jersey. Prior to that, he was a Principal Research Scientist and Manager at the Huawei German Research Center in Munich, Germany, and earlier a Research Assistant Professor, and before that a Postdoc, in the ECE Department at Virginia Tech. He earned a PhD in Industrial Engineering from the University of Padova, Italy, and an MS and BS in Computer Engineering from the same university.

Antonio Barbalace’s research interests include all aspects of systems software, embracing hypervisors, operating systems, runtime libraries, and compilers/linkers for emerging highly parallel and heterogeneous computer architectures, including near-data-processing platforms and new-generation interconnects with coherent shared memory. His research asks how to architect or re-architect the entire software stack to ease programmability and portability, and to enable improved performance, energy efficiency, determinism, fault tolerance, and security. His work has appeared at top systems venues including EuroSys, ASPLOS, VEE, ICDCS, Middleware, EMSOFT, HotOS, HotPower, and OLS.

WEBSITE

http://www.barbalace.it/antonio/



CXL SIG celebrates Daniel's graduation. Congratulations, Dr. Bittman!

Rare talent. Dissertation Award level work. Foundational. Bold. Superlatives just flew in the closed-door session of the committee. I have watched Daniel take an idea and commit to it. He has been an inspiration to his fellow grad students, and he has done justice to the often ignored "Ph." part of the Ph.D. degree.

Daniel's contribution to the rapidly evolving world of memory was recognized at the prestigious USENIX ATC in 2020 with a Best Presentation award. But I have seen the importance of his work in action as I work closely with major SaaS analytics vendors, major semiconductor memory suppliers, and the world's leading virtualization researchers.

As "memoryness" spreads beyond RDMA, in space through disaggregation and in time through persistent memory, the need to rescue translation contexts from the process abstraction has become paramount. Daniel has done that by placing a foreign object table in every memory object, doing for memory what S3 did for storage. We at Elephance, where Daniel is a cofounder, believe deeply in this idea and are committed to bringing it to the world of CXL, working closely with our customers and partners in industry and government and with our growing network of academic and research collaborators.

Pankaj Mehra, President and CEO
Elephance Memory, Inc.
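The foreign-object-table idea mentioned above can be sketched very loosely as follows (a toy model with hypothetical names, not Daniel's actual design): each memory object carries its own table mapping small indices to global object IDs, so a cross-object "pointer" is an (index, offset) pair that stays valid regardless of where either object happens to be mapped.

```python
class MemObject:
    """Toy stand-in for a memory object with a per-object foreign object table."""
    def __init__(self, oid):
        self.oid = oid    # globally unique object ID
        self.fot = []     # foreign object table: index -> target object ID
        self.data = {}    # offset -> value, standing in for raw bytes

    def add_fot_entry(self, target_oid):
        """Register a target object; the returned index is what pointers store."""
        self.fot.append(target_oid)
        return len(self.fot) - 1

def deref(store, src, fot_index, offset):
    """Resolve a cross-object pointer: look up the global ID in the source
    object's FOT, then read at the offset inside the target object."""
    target = store[src.fot[fot_index]]
    return target.data[offset]
```

Because the pointer names the target through the source object's own table rather than through a process's virtual address space, the reference survives remapping, sharing, and persistence, which is the analogy to S3-style global naming for storage.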

Current and Upcoming Topics

Learnings from Existing Disaggregated Memory Systems

Workloads

Microarchitecture Co-evolution with CXL

Linux HMM scope, co-evolution with CXL, and limitations

On Sep 27, 2022, Yiwei gave an update detailing where jemalloc and libnuma hook into SMDK. We need to run this by our Samsung collaborators for accuracy and to better understand their roadmap (link to Yiwei's SMDK deep-dive presentation)

Past Topics

Why CXL? Will it make sense despite the added latency and cost in the memory-access path relative to DDR DRAM?
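A back-of-envelope framing of that question is an average-memory-access-time calculation for a two-tier system. The latency figures below are illustrative assumptions only (roughly 100 ns for local DDR DRAM and 250 ns for CXL-attached memory), not measurements:

```python
def amat(local_ns, cxl_ns, frac_cxl):
    """Average access latency when frac_cxl of accesses land in the CXL tier
    and the rest hit directly attached DRAM."""
    return (1 - frac_cxl) * local_ns + frac_cxl * cxl_ns

# Assumed latencies: 100 ns local DRAM, 250 ns CXL tier.
# With 20% of accesses in the CXL tier, average latency rises by ~30%:
print(amat(100, 250, 0.20))  # 130.0
```

The economic argument is that this latency penalty may be acceptable if the CXL tier's capacity is cheaper per byte and only cold data is placed there, which is why placement and tiering policies keep coming up in the discussion.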

Document WIP