Emerging HPC and data center applications, in domains such as artificial intelligence (AI), data analytics, scientific computing, and enterprise computing, are experiencing rapid growth in both the amount of data to be processed and algorithmic complexity. As a promising solution, industry and academia are moving toward increasingly heterogeneous architectures that deploy various accelerators across many different parts of the hardware infrastructure. This creates an opportunity for flexible accelerators to deliver significantly better performance per watt than other solutions.
One of the main goals of this workshop is to better understand present and future challenges for FPGA/xPU accelerator devices, and at the same time to create awareness of, and engagement with, initiatives led by professionals from academia and industry. This will enable collaborative activities between academia and industry, provide a venue for ongoing international research projects, and allow their evolution to be followed over time as knowledge is shared.
The workshop will include technical presentations to develop a complete view of the ecosystem, from software to hardware, and build on top of it the next generation of HPC and data center systems.
More heterogeneity is required, furthering the current strong focus on DPUs, GPUs (mainstream), and application-specific accelerators such as TPUs.
Proven chiplets/2.5D/3D packaging technologies open the door to more flexible heterogeneous integration into mainstream SoCs and super-chips.
Algorithmic innovations are happening at a fast pace such that it becomes challenging to balance specialization versus generality as well as fixed versus reconfigurable hardware designs.
The purpose of this workshop is to provide an overview of the advancements and challenges in the HPC and data center domains when using FPGA/xPU accelerators, taking into account considerations and points of view from both academia and industry. As part of this, and because the inherent complexity of managing these devices significantly reduces their accessibility compared to other kinds of accelerators, this workshop aims to address the needs of relevant stakeholders and ecosystems in order to reverse, or at least alleviate, this situation.
This workshop offers a forum for researchers and developers to discuss how the different pieces of the ecosystem, including applications, programming models, and toolchains among others, impact the spread and popularization of FPGA/xPU accelerators as solutions for traditional and emerging HPC and data center applications.
The topics of interest for this workshop include, but are not limited to, the following:
Advances and future trends and challenges in FPGA/xPU accelerators for HPC and data center systems
Novel methods, tools, and programming models to enhance FPGA/xPU accelerator functionalities across HPC and data center systems
FPGA/xPU accelerator architecture design flow for HPC and data center
Programming environments for reconfigurable systems that increase productivity
Reconfigurable computing applications and/or efficient algorithms for reconfigurable hardware
Other topics related to FPGA/xPU accelerators, reconfigurable computing and HPC
10:00-10:30 Workshop opening
Session 1: Smart NIC, FPGAs and Disaggregated Architectures
10:30-11:00 Antonio Peña (BSC): ODOS: Democratizing SmartNICs for HPC with OpenMP Offloading
Abstract: SmartNICs have yet to take off for HPC use. We have developed ODOS to enable seamless OpenMP offloading to NVIDIA BlueField DPUs. Furthermore, ODOS features MPI integration, which enables seamless offloading of MPI calls to the device. In this talk we will explore the benefits of SmartNICs for OpenMP offloading and how ODOS enables these features.
Short bio: Dr. Antonio J. Peña is a Leading Researcher at the Barcelona Supercomputing Center (BSC), where he leads the "Accelerators and Communications for HPC" group. He also holds an appointment as Teaching and Research Staff at Universitat Politècnica de Catalunya. Dr. Peña is a Ramon y Cajal Fellow, a former Marie Sklodowska-Curie Individual Fellow, and a former Juan de la Cierva Fellow. He is a recipient of the 2023 Betancourt y Molina Award from the Spanish Royal Academy of Engineering and a 2017 IEEE TCHPC Award for Excellence for Early Career Researchers in High Performance Computing, and is a Senior Member of ACM and IEEE. He is involved in the organization and steering committees of several conferences and workshops such as SC and IEEE Cluster, and has served on 50+ technical committees for conferences and workshops. Coauthor of 100+ indexed papers, his research interests are in the area of runtime systems, programming models, and resource heterogeneity for HPC.
11:00-11:30 Coffee break
11:30-12:00 Dirk Pleiter (Groningen University): SmartNICs for Scientific HPC: Status Review and Future Directions
Abstract: The availability of SmartNICs has for several years raised the question of how they can be used in HPC architectures. We take the opportunity to review the current status, discuss different possibilities for using such devices in this context, and examine the associated opportunities and challenges. On this basis, we explore various directions for future research.
12:00-12:30 Christoph Hagleitner (IBM Zürich): Heterogeneous Computing Systems for AI & HPC: Applications & Architecture
Abstract: Many recent breakthroughs in AI and science were only possible due to the availability of ever-more powerful computing systems. While the architecture of the systems used to run "classic" HPC applications at exascale and the systems training trillion-parameter AI models is converging, the slowdown of Moore's law is increasingly addressed through application-domain-specific accelerators and system-level optimization. Furthermore, the explosion of data, the power and energy limitations, and the operational complexity (including security, data management, development environment, etc.) need to be addressed. This demands a fresh look at system architecture, where data takes center stage and defines key architectural elements in a data-driven approach. In this presentation I will discuss the implications for system architecture and present the results and lessons learned from the recently completed H2020 project EVEREST, which explored the use of FPGAs for big-data and HPC applications.
12:30-13:00 Riadh Abdelhamid (Heidelberg University): Breaking the FPGA Programming Wall for Massively-Parallel Computing with BRISKI RISC-V Barrel Processor
Abstract: Despite their unmatched parallelism, energy efficiency, and reconfigurability, FPGAs remain absent from the TOP500 supercomputers. The primary barrier? The FPGA programming wall: a complex, hardware-centric development process that limits adoption in data centers. Traditional FPGA programming requires expertise in RTL design, lacks standardized software tools, and struggles with scalability for kilo-core architectures. This talk explores how leveraging RISC-V and the many-tiny-core paradigm can overcome this challenge. By utilizing configurability, custom instructions, and barrel processing, architectures like BRISKI (Barrel RISC-V for Kilo-Core Implementations) enable programmable, scalable, and highly parallel processing on FPGAs, bridging the gap between software and hardware development. BRISKI's lightweight, multi-threaded RISC-V cores eliminate traditional bottlenecks, such as branch prediction and register forwarding, thus offering a high-throughput, low-resource, kilo-core scalable solution.
13:00-14:00 Lunch break
Session 2: Memory-centric Computing
14:00-14:30 Asif Ali Khan (TU Dresden): Programmability and Reliability Challenges in Memory-Centric Computing Systems
Abstract: Conventional computing systems struggle to balance power efficiency and performance, driving the adoption of specialized domain-specific accelerators and near/in-memory computing architectures. Leveraging emerging (memory) technologies and radically different computing paradigms, these systems have demonstrated significant improvements in performance and energy efficiency, particularly for memory-intensive applications like machine learning and bioinformatics. However, they introduce critical challenges in reliability and programmability. Existing reliability schemes and programming frameworks fall short in addressing the reliability issues of these systems and improving their accessibility. This talk will explore how high-level compiler frameworks can bridge this gap, making these novel architectures more accessible to non-expert users. With a focus on near- and in-memory computing, it will demonstrate how integrating reliability as a primary optimization metric enables these frameworks to simultaneously optimize for performance, energy efficiency, and reliability.
14:30-15:00 Hu Chen (HiSilicon): System architecture analysis of near data processing (NDP) for large language model (LLM)
Abstract: Large language models (LLMs) and text-to-video models have converged on transformer models in recent years, and transformers have become the major target workload for AI chip architecture design. The self-attention stage of transformer models is typically memory-intensive, and conventional GPUs and NPUs achieve rather low utilization due to the memory-bandwidth bottleneck. In comparison, near-data processing (NDP), with its higher memory bandwidth, is a promising candidate solution for the self-attention computation. In this talk, the speaker will present a workload analysis, compare existing NDP solutions, and share his outlook on NDP.
Bio: Dr. Hu Chen works as a technology planning expert for HiSilicon, Huawei. His major research interest lies in low-cost computing architectures for AI inference. Hu conducted his doctoral studies at the Technical University of Munich and received his PhD in 2012. Since then, he has worked in the chip industry for 12 years, with experience in automotive chips, smartphone chips, smart TV chips, and AI chips, and holds more than 10 patents. After joining HiSilicon in 2022, he has been studying the trend of AI architecture evolution. Moreover, he works closely with top universities in Europe to encourage innovation in AI architecture within the European academic community.
15:00-15:30 Gagandeep Singh (AMD Zürich): Unlocking Efficient Genomic Sequencing
Abstract: The explosive growth of data in healthcare and life sciences is pushing computational systems to their limits, leading to significant challenges in efficiency and energy usage. To meet these challenges, we must rethink how and where computation happens. In this talk, we will explore opportunities to reduce data movement and improve performance by leveraging emerging paradigms such as data-centric computing and domain-aware precision formats. I will introduce RUBICON, a framework for designing efficient deep-learning-based genomic basecallers, demonstrating the power of mixed-precision computation in minimizing data transfers while accelerating genomic basecalling. By combining efficient data handling strategies with scalable AI models, our work exemplifies how hardware-software co-design can unlock breakthroughs in genomic analysis.
15:30-16:00 Coffee break
Session 3: Discussion and Panel
16:00-17:30 Discussion
17:30 Workshop closing
The targeted audience is anyone interested in the current efforts carried out worldwide on FPGA/xPU accelerators and heterogeneous solutions in the context of HPC and/or data centers. More particularly, this workshop is of interest to the HPC, cloud, edge, and data center communities, whether from a software or a hardware background.
Teresa Cervero (BSC) - teresa.cervero(at)bsc.es
Holger Froening (U. Heidelberg) - holger.froening(at)ziti.uni-heidelberg.de
Dirk Pleiter (KTH, Sweden) - pleiter(at)kth.se
Min Li (Huawei Research Europe) - minli2(at)huawei.com