Emerging HPC and data centre applications, in domains such as artificial intelligence (AI) based on large language models (LLMs) and transformer architectures, data analytics, scientific computing, and enterprise computing, are experiencing rapid growth in both the volume of data to be processed and in algorithmic complexity. To meet these growing demands, industry and academia are increasingly exploring heterogeneous architectures beyond traditional CPUs and GPUs.
This includes FPGAs, specialized accelerators (xPUs), and an emerging class of architectures based on near-memory computing (NMC) and in-memory computing (IMC) concepts, as well as other non-conventional compute paradigms. Many of these innovations are finding their way into commercial products, particularly for inference workloads where energy efficiency is paramount.
The main goal of this workshop is to better understand current and future challenges in achieving resource and energy efficiency for LLMs and AI-centric workloads. We aim to foster discussion on hardware and software co-design across diverse application domains - from data centres to the edge and HPC - while facilitating collaboration between academia and private companies. The workshop will include technical presentations that develop a complete view of the ecosystem, from software to hardware, on which the next generation of HPC and data centre systems can be built.
The explosive growth of AI workloads, especially LLMs, is driving a shift toward compute architectures that optimize for energy and memory efficiency rather than raw FLOPS alone.
Advances in chiplet-based design, 2.5D/3D packaging, and memory-centric compute models open new frontiers in architectural specialization.
Inference at scale, particularly in edge and low-power settings, motivates exploration of IMC/NMC technologies (e.g., SRAM-based compute arrays, HBM with integrated logic, compute-in-flash).
There is a pressing need to align hardware innovations with the evolving software stacks that support AI, scientific computing, and data-intensive applications.
This workshop will offer a forum for discussing the advancements and challenges in resource- and energy-efficient compute architectures for LLMs, transformers, and related workloads. It aims to:
Explore how different architectural paradigms - FPGAs, xPUs, near- and in-memory computing, and other emerging models - contribute to efficiency gains.
Address both hardware and software challenges, including programming models, toolchains, and compiler support for these architectures.
Examine opportunities to specialize hardware/software solutions by application domain (e.g., edge AI, data centres, HPC) or vertical markets (e.g., automotive, personalized medicine, industrial AI).
Reduce complexity barriers that hinder the wider adoption of unconventional architectures.
The topics of interest for this workshop include, but are not limited to, the following:
Energy- and resource-efficient architectures for LLMs, transformers, and AI inference/training workloads.
Advances in near-memory and in-memory computing (digital and analog, across memory technologies such as SRAM, DRAM, RRAM, and PCM).
FPGA/xPU and other reconfigurable or specialized accelerators for AI and HPC.
Architectural co-design for performance optimization and energy reduction.
Hardware-software co-design: programming models, toolchains, compiler flows.
Case studies on domain specialization (edge AI, data centre, HPC, or specific verticals like healthcare, automotive).
System-level design for composable, heterogeneous infrastructures (including chiplets, 3D integration, and disaggregated compute).
Evaluation methods for energy efficiency, memory bottlenecks, and scalability in AI workloads.
TBA
Holger Froening (U. Heidelberg, Germany) - froening(at)uni-heidelberg.de
Teresa Cervero (BSC, Spain) - teresa.cervero(at)bsc.es
Dirk Pleiter (U. Groningen, Netherlands) - d.h.pleiter(at)rug.nl
Min Li (Huawei Research Europe) - minli2(at)huawei.com