F4HD 2024: HiPEAC Workshop on FPGA/xPU Accelerators for Future HPC and Datacenter
Co-located with HiPEAC Conference 2024, Jan 18, 2024, 10:00-17:30, Munich, Germany
Emerging HPC and data center applications, in domains such as artificial intelligence (AI), data analytics, scientific computing, and enterprise computing, are experiencing rapid growth in both the amount of data to be processed and algorithm complexity. As a promising solution, industry and academia are moving toward increasingly heterogeneous architectures based on various accelerators in many different parts of the hardware infrastructure. This creates opportunities for flexible accelerators to find a new path to providing significantly better performance per watt than other solutions.
One of the main goals of this workshop is to better understand the present and future challenges for FPGA/xPU accelerator devices, and at the same time to create awareness of and engagement around some of the initiatives led by professionals from academia and industry. This will enable collaborative activities between academia and industry, provide a forum for ongoing international research projects, and make it possible to follow their evolution over time while sharing knowledge.
The workshop will include technical presentations to develop a complete view of the ecosystem, from software to hardware, and to build on top of it the next generation of HPC and data center systems.
Motivation
More heterogeneity is required, furthering the current strong focus on DPUs, GPUs (mainstream), and application-specific accelerators such as TPUs.
Proven chiplet/2.5D/3D packaging technologies open the door to more flexible heterogeneous integration into mainstream SoCs and super-chips.
Algorithmic innovations are happening at such a fast pace that it becomes challenging to balance specialization versus generality, as well as fixed versus reconfigurable hardware designs.
Scope and objectives
The purpose of this workshop is to provide an overview of the advancements and challenges in the HPC and data center domains when using FPGA/xPU accelerators, taking into account considerations and points of view from both academia and industry. Because the inherent complexity of managing these devices significantly reduces their accessibility compared to other kinds of accelerators, this workshop also aims to address the needs of relevant stakeholders and ecosystems in order to reverse, or at least alleviate, this situation.
This workshop offers a forum for researchers and developers to discuss how the different pieces of the ecosystem, including applications, programming models, and toolchains, impact the spread and popularization of FPGA/xPU accelerators as solutions for traditional and emerging HPC and data center applications.
Topics of Interest
The topics of interest for this workshop include, but are not limited to, the following:
Advances and future trends and challenges in FPGA/xPU accelerators for HPC and data center systems
Novel methods, tools, and programming models to enhance FPGA/xPU accelerator functionalities across HPC and datacenter systems
FPGA/xPU accelerator architecture design flow for HPC and data center
Programming environments for reconfigurable systems that increase productivity
Reconfigurable computing applications and/or efficient algorithms for reconfigurable hardware
Other topics related to FPGA/xPU accelerators, reconfigurable computing and HPC
Program
10:00 - 10:15 Workshop opening (workshop co-chairs)
10:15 - 11:00 Keynote by Carsten Binnig (TU Darmstadt): Databases on High-speed Networks: The Time for RDMA has come!
Abstract: High-speed networks that offer RDMA-like technologies have become the standard in data center networks for major cloud providers, providing efficient routing of massive application traffic at scale with low latency. With the recent pivot of the DBMS market from on-premise to cloud-based solutions, there is an opportune moment for system builders to adopt RDMA for constructing truly scalable cloud databases. In this talk, I will first introduce RDMA and its cloud-deployed derivatives, followed by an overview of the important insights gained over the past decade on effectively and correctly utilizing RDMA for designing scalable database systems. Finally, the talk will also cover recent opportunities and future directions, such as network programmability, for improving RDMA for databases in the cloud.
Bio: Professor Carsten Binnig is a Full Professor in the Computer Science department at TU Darmstadt and a Visiting Researcher at the Google Systems Research Group. He received his Ph.D. from the University of Heidelberg in 2008 and spent time as a postdoctoral researcher in the Systems Group at ETH Zurich and at SAP working on in-memory databases. His current research focus is on the design of scalable data systems on modern data center hardware as well as machine learning for scalable data systems. His work has received numerous awards, including a Google Faculty Award and multiple best paper and best demo awards at venues such as VLDB and SIGMOD.
11:00 – 11:30 Coffee break
11:30 - 13:00 Tech session 1: High Performance Data Analytics (chair: Teresa Cervero, BSC)
Alexander Krause (TU Dresden): Program your (custom) SIMD instruction set on FPGA in C++
Abstract: Field Programmable Gate Arrays (FPGAs) are increasingly becoming a viable option for implementing data processing pipelines, as their computing capacity as well as the access bandwidth between host and device memory continue to increase. Unfortunately, hardware description languages are still the primary means of programming FPGAs, which imposes major limitations. To tackle this issue, this talk shows that the general-purpose parallel processing architecture SIMD (Single Instruction Multiple Data) is a perfect match for FPGAs. With this specific architecture, we are able to treat an FPGA as a SIMD processing unit, and the necessary SIMD instruction set can then be implemented in C++. This offers many advantages when both software (SIMDified query processing) and hardware can be written consistently in C++.
Bio: Alexander Krause is a PostDoc at TU Dresden’s Database Research Group, chaired by Wolfgang Lehner. His dissertation on "Graph Pattern Matching on Symmetric Multiprocessor Systems" focused on energy-efficient and adaptive graph processing. Alexander is an appointed member of the SIGMOD Availability and Reproducibility Committee (ARC). Previously, he served as the Proceedings Chair for BTW 2021 and 2023 in Dresden. His research mainly focuses on databases in the context of disaggregated systems by leveraging RDMA and CXL.
Marcus Paradies (TU Ilmenau): Leveraging Computational Storage for Data-Intensive Applications - Myth or Reality?
Abstract: Computational Storage refers to storage architectures that provide Computational Storage Functions coupled to storage, offloading host processing or reducing data movement. Initial research demonstrators have shown that Computational Storage can achieve better performance, and thereby consume less energy, for certain data-intensive tasks compared to traditional data-shipping architectures. Although technically promising, Computational Storage has so far "underwhelmed" the market. In this presentation, I will share my view of the current landscape of Computational Storage, ranging from the storage hardware perspective up to applications that can benefit from Computational Storage. Finally, I will share my view on how Computational Storage can fit into the overall system landscape and also highlight possible research directions.
Bio: Marcus Paradies is a Postdoctoral Researcher at the Databases and Information Systems Group at TU Ilmenau. Before that, he was a senior researcher at the Institute of Data Science of the German Aerospace Center with a focus on efficient data management and storage technologies for scientific data. His current research interest is in leveraging novel storage technologies, such as NVMe and Computational Storage, for large-scale data management systems.
Javier Picorel (Huawei Cloud R&D): Hardware-acceleration is All you Need! Exploiting Heterogeneity in the Cloud for Data and AI Systems
Bio: Dr. Javier Picorel is an Engineering Manager at Huawei Cloud R&D in Huawei's Munich Research Center. His group's mission at Huawei is to build the next-generation TCO-efficient cloud infrastructure. His research interests encompass the broad area of computer systems and computer architecture, with an emphasis on serverless computing, data and AI systems, and hardware-software co-design. He received a PhD in computer science from EPFL in 2017 and has received several awards during his tenure at Huawei.
13:00 – 14:00 Lunch
14:00 - 14:45 Keynote by Georgi Gaydadjiev (TU Delft): Will Custom Computing become accepted HPC technology: Challenges and Opportunities
Abstract: Reconfigurable accelerators for HPC systems, e.g., FPGA-based Custom Computers, have been shown to deliver competitive solutions for several critical applications in terms of throughput and energy efficiency when compared to conventional technologies, e.g., GPUs. Despite its potential, this technology is far from wide acceptance in production HPC systems. This talk will discuss some of the related challenges to be addressed and will highlight several opportunities ahead for this technology.
Bio: Georgi Gaydadjiev is a computer engineer with more than 35 years of experience in industry and academia. He has contributed to the development of a wide range of computer systems, from small, battery-operated devices up to application-specific supercomputers. Currently he is Chair Professor in Computer Architecture at the Delft University of Technology and has been an honorary visiting professor at the Department of Computing of Imperial College London since 2014. Previously, he held the Chair in Innovative Computer Architectures at the University of Groningen until June 2023, and the Chair in Computer Systems Engineering at Chalmers University of Technology in Sweden until May 2015. Georgi’s work has received several recognitions, including the Design & Engineering Showcase Award at the Consumer Electronics Show (CES 1999) and best paper awards at the 24th International Conference on Supercomputing (ICS'10) and the USENIX/SAGE Large Installation System Administration conference (LISA 2006). Georgi remains a member of the CogniGron program board and is currently advising several high-tech companies. His research interests include, among others, application- and data-centric computer systems design, advanced computer architecture and micro-architecture, reconfigurable and custom computing, hardware/software co-design, and embedded systems design.
14:45 - 15:30 Tech session 2: High Performance Computing (chair: Dirk Pleiter, KTH)
Burkhard Ringlein (IBM Zürich): Don't forget the compiler: Why FPGAs for HPC need to look beyond circuits and applications
Abstract: FPGAs promise to accelerate HPC workloads and ML/AI models while also being energy efficient. However, today's FPGA tool chains are cumbersome to use, limited to specific use cases and devices, and mostly fail to support workflows requiring multi-node application scenarios. Within the H2020 project EVEREST (https://everest-h2020.eu/), we developed a design environment that aims to simplify the mapping of complex end-user workflows, like DNN inference or big data processing, to energy-optimized heterogeneous hardware with a focus on FPGA-accelerated systems. In this presentation, I will analyze the blind spots of the state of the art and introduce the practical solutions developed within the EVEREST project and the cloudFPGA team in Zurich. I will also present potential solutions for distributed HPC/ML applications and a prototype implementation, called DOSA (https://github.com/cloudFPGA/DOSA), that provides a one-click solution to partitioning, implementation and deployment of DNNs on multiple FPGAs.
Bio: Burkhard Ringlein is a Postdoctoral Researcher in the Hybrid Cloud department of the IBM Research Zurich Laboratory. His research interests are accelerated and energy-efficient computing, domain-specific architectures, custom compiler stacks for heterogeneous computing platforms, distributed reconfigurable architectures, and infrastructure automation. In the past, he worked on the cloudFPGA project. He received his PhD in 2022 from the Faculty of Engineering of the Friedrich-Alexander University Erlangen-Nürnberg, Germany, with the thesis "Mapping of a Machine Learning Algorithm Representation to Distributed Disaggregated FPGAs". Before joining IBM Research Zurich, he worked with IBM Security in Germany (Kassel) and the US (Atlanta), the Fraunhofer Institute for Integrated Circuits (Erlangen), as well as Nokia (Nürnberg). The German Informatics Society appointed Dr. Burkhard Ringlein as a Junior-Fellow in October 2023.
Pedro Marcuello (Semidynamics Technology): Semidynamics Tensor Unit: Improving AI Performance with Custom Tensor Instructions
Bio: Pedro Marcuello joined Semidynamics in 2018 and is currently the IP Director of the company. Pedro participated in the architectural design and RTL development of both the Atrevido and Avispado RISC-V family cores, the Vector Unit, and the Tensor Unit. He also worked at Broadcom as an IC Design Engineer, implementing an ARM v7/v8 multicore for the set-top-box market segment, and before that spent 12 years at Intel Labs on multiple research projects. Pedro holds an MSc and a PhD in Computer Science from the Universitat Politècnica de Catalunya.
15:30 - 16:00 Coffee break
16:00 - 17:30 Tech session 3: Systems (chair: Min Li, Huawei)
Rich Graham (NVIDIA): NVIDIA’s BlueField DPU: a communication library accelerator
Abstract: NVIDIA’s BlueField DPU is a system-on-a-chip with both network and computational capabilities. In this presentation I will introduce the BlueField device and capabilities that make it suitable for application acceleration. Communication library gains will be discussed from the perspective of communication library acceleration and the enablement of computation-communication overlap. The impact of these on application performance will also be discussed.
Bio: Dr. Richard Graham is Senior Director, HPC Technology at NVIDIA's Networking Business unit. His primary focus is on HPC network software and hardware capabilities for current and future HPC/AI technologies. Prior to moving to Mellanox/NVIDIA, Rich spent thirteen years at Los Alamos National Laboratory and Oak Ridge National Laboratory, in computer science technical and administrative roles, with a technical focus on communication libraries and application analysis tools. He is cofounder of the Open MPI collaboration and was chairman of the MPI 3.0 standardization efforts.
Sunita Jain (AMD): CXL for Future HPC & Datacenter
Abstract: This talk will give a quick introduction to CXL, current memory challenges, and how CXL can help solve them. It will also discuss CXL-based memory expansion and its use in the HPC/data center domain.
Bio: Sunita Jain is a Principal Member of Technical Staff at AMD, which she joined through the Xilinx acquisition. During her sixteen-year career at Xilinx/AMD, Sunita has mainly focused on designing system architectures. Her expertise spans applications over PCI Express, Cache Coherent Interconnect for Accelerators (CCIX), and most recently Compute Express Link (CXL).
Ying-Chih Yang (SiPearl): Architecture considerations for high performance compute elements
Abstract: SiPearl is developing high-performance processors for HPC applications in advanced silicon processes. While building the product roadmap for this target, several architectural considerations have been identified that need to be addressed for the key compute elements.
Bio: An expert in complex system-on-chip development, Ying-Chih Yang is the CTO of SiPearl and the lead architect of the energy-efficient HPC-dedicated microprocessor designed by SiPearl. He has driven successful endeavors at MStar Semiconductor, a Taiwan-based startup specialized in set-top boxes for pay-television, at STMicroelectronics’ Consumer division, and at Atos. With a dual French-Taiwanese background, Ying-Chih holds a Master’s in Electronic Engineering from National Chiao-Tung University in Taiwan.
Target audience
The target audience is anyone interested in the current efforts carried out worldwide on FPGA/xPU accelerators and heterogeneous solutions in the context of HPC and/or data centers. In particular, this workshop is of interest to members of the HPC, cloud, edge, and data center communities with a software and/or hardware background.
Organization
Teresa Cervero (BSC) - teresa.cervero(at)bsc.es
Holger Froening (U. Heidelberg) - holger.froening(at)ziti.uni-heidelberg.de
Dirk Pleiter (KTH, Sweden) - pleiter(at)kth.se
Min Li (Huawei Research Europe) - minli2(at)huawei.com