

Dr. Jan Langer, Xilinx
Xilinx AI Engine: Architecture and Programming

The recently announced Xilinx ACAP Versal devices contain an array of new processing elements called AI Engines (AIE). They target the acceleration of compute-intensive application domains such as Machine Learning (ML), 5G communications, automotive vision processing and others.

This talk will give an overview of the overall ACAP Versal architecture, application mapping, and the software programming tools. The innovative AIE array architecture, with shared memory between neighbouring processor cores and a high-bandwidth streaming interconnect network that can also connect additional accelerators in the programmable logic (PL), provides deterministic, low-latency data movement.

Xilinx's new Unified Software Development Environment allows C/C++ programming of both the computation kernels for individual AIEs and dataflow-centric multi-core systems. Together with the programming of PL accelerators using high-level synthesis, this enables users with little or no hardware design experience to program Xilinx devices efficiently. In addition, domain-specific architecture overlays for ML raise the programming abstraction level further, such that users of typical ML frameworks can leverage the device's compute power at the push of a button.
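As a rough illustration of this kernel-plus-dataflow style, a compute kernel is in essence a plain C/C++ function over blocks of samples that the tools then map onto an AIE core. The sketch below is a conceptual stand-in only; the function and type names are hypothetical and do not reflect the actual Xilinx AIE APIs or intrinsics.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of an AIE-style compute kernel: a pure C++
// function that consumes a block of input samples and produces a block
// of outputs. On a real device, the toolchain would place such kernels
// on individual AIE cores and connect them via shared memory buffers
// or the streaming interconnect.
void scale_and_accumulate(const std::vector<int32_t>& in,
                          std::vector<int32_t>& out,
                          int32_t coeff) {
    out.resize(in.size());
    int32_t acc = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += coeff * in[i];   // multiply-accumulate, the core DSP operation
        out[i] = acc;           // running sum of scaled samples
    }
}
```

In a dataflow-centric design, several such kernels would be chained so that the output buffer of one becomes the input buffer of a neighbouring core, which is where the shared-memory and streaming interconnect of the AIE array come into play.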


Jan Langer is a Sr. DSP Architect at Xilinx for next-generation heterogeneous multi-core architectures, with a focus on the AI Engine processor core. He joined Xilinx as part of the Research Labs in Dublin in 2012, where he started working on high-level synthesis for wireless algorithms but subsequently moved to the AI Engine project, to which he contributed from the conceptual stage to a full product. Before joining Xilinx, Jan Langer received his PhD in 2011 from Chemnitz University of Technology, where he worked in the field of high-level synthesis; he currently teaches a course there on formal verification methods.

Prof. Hermann Haertig, TU Dresden
M3: Specialized Compute Units as First-Class Citizens

Specialized compute units have become commonplace in modern computer architectures, ranging from fixed-function accelerators via GPUs and DSPs to fully flexible FPGA components. In the general case, these components cannot be trusted, and they lack the architectural properties required to run a protected operating system, namely a user/kernel mode split and virtual memory. Hence, they are mostly treated as I/O devices. In recent years, several research activities have been started to change this, for example by giving FPGAs or GPUs direct access to virtual memory or a file system. However, these efforts remain special cases.

M3 is a hardware/software co-design proposed to overcome that situation. The desirable properties for a general approach are: minimal invasiveness for the design of the inner cores of specialized compute units, functionally simple hardware, direct access to OS services, direct communication with other specialized units, and no need for trust.

To achieve this, M3 splits the functionality found in L4-like microkernels into two parts. One part, the "DTU", is implemented in hardware and attached to the design of a specialized compute unit; it enforces isolation and controlled communication. The other part is implemented in software and runs on a privileged compute unit; it configures the access and communication rights in the DTUs.
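A minimal software model of this split might look as follows. All names here are illustrative, not the real M3 interfaces, and the real DTU is a hardware unit rather than an object: a compute unit can only send over endpoints that exist in its DTU, and only the privileged kernel may install such endpoints.

```cpp
#include <cassert>
#include <map>
#include <string>

// Conceptual model of M3's split (names are hypothetical): the Dtu
// enforces that messages only flow over configured endpoints, while
// the privileged Kernel is the sole component allowed to configure
// those endpoints.
struct Dtu {
    std::map<int, std::string> endpoints;  // endpoint id -> target unit

    // "Hardware-enforced" send: succeeds only via a configured endpoint.
    bool send(int ep, const std::string& msg, std::string* delivered_to) {
        auto it = endpoints.find(ep);
        if (it == endpoints.end()) return false;  // no right -> blocked
        *delivered_to = it->second;               // controlled delivery
        return true;
    }
};

// Privileged kernel running on a dedicated compute unit: it installs
// communication rights into the DTUs of the other units.
struct Kernel {
    void grant(Dtu& dtu, int ep, const std::string& target) {
        dtu.endpoints[ep] = target;  // configure access/communication rights
    }
};
```

The point of the design is visible even in this toy model: a unit whose DTU holds no endpoints cannot communicate at all, so untrusted accelerators are isolated without any changes to their inner cores.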

After explaining the approach in general and presenting basic evaluations, the talk will describe possible extensions of the initial approach: adding caches and virtual memory to specialized compute units "from the outside" of the units, combining M3 with more conventional architectures based on user/kernel-mode operating systems, and scaling towards much larger scenarios.

Hermann Härtig has been a full professor at Technische Universität Dresden since 1994 and leads its operating systems research group. Under his leadership, the group contributed significantly to L4 technology. It produced "L4/Fiasco", the first implementation of the L4 microkernel in a high-level programming language, invented "L4Linux", an L4-based virtualisation technology, and the microhypervisor NOVA, and built several frameworks for component-based operating systems. The group pioneered the technology's application in real-time, supercomputing, and security-sensitive environments, which also led to deployments in real-life products. Before joining TU Dresden, Hermann Härtig led the "BirliX" operating systems project at the former German National Research Center for Information Technology. He has regularly spent extended sabbatical visits at major industry and university research labs. He is a cofounder of EuroSys and served as its first acting vice-chair.

Prof. Timothy Roscoe, ETH Zurich
Building Enzian: a research computer

Academic research in rack-scale and datacenter computing today is hamstrung by lack of hardware. Cloud providers and hardware vendors build custom accelerators, interconnects, and networks for commercially important workloads, but university researchers are stuck with commodity, off-the-shelf parts.

Enzian is a research computer being developed at ETH Zurich (in collaboration with Cavium and Xilinx) to tackle this problem. By providing a powerful and flexible platform for computer systems research, Enzian aims to enable more relevant and far-reaching work on future compute platforms.

An Enzian board consists of a server-class ARMv8 SoC tightly coupled and cache-coherent with a large FPGA (eliminating PCIe), with approximately 0.5 TB of DDR4 memory and nearly 500 Gb/s of network I/O either to the CPU (over Ethernet) or directly to the FPGA (potentially over custom protocols). Enzian runs both the Barrelfish and Linux operating systems. Many Enzian boards can be connected into a rack-scale machine (with or without a discrete switch), and the design is intended to allow many different research use cases: zero-overhead run-time verification of software invariants, novel interconnect protocols for remote memory access, hardware enforcement of access control in a large machine, high-performance streaming analytics using a combination of software and configurable hardware, and much more.

Timothy Roscoe is a Full Professor in the Systems Group of the Computer Science Department at ETH Zurich, where he works on operating systems, networks, and distributed systems, including the Enzian research computer and the Strymon high-performance stream processor for datacenter monitoring. He received a PhD in 1995 from the Computer Laboratory of the University of Cambridge, where he was a principal designer and builder of the Nemesis OS. After three years working on web-based collaboration systems at a startup in North Carolina, Mothy joined Sprint's Advanced Technology Lab in Burlingame, California in 1998, working on cloud computing and network monitoring. He joined Intel Research at Berkeley in April 2002 as a principal architect of PlanetLab, an open, shared platform for developing and deploying planetary-scale services. In September 2006 he spent four months as a visiting researcher in the Embedded and Real-Time Operating Systems group at National ICT Australia in Sydney, before joining ETH Zurich in January 2007. His current research interests include high-performance distributed dataflow, system software for modern hardware, and hardware for system software research. He was named Fellow of the ACM in 2013 for contributions to operating systems and networking research.