Speakers

Dr. Stephen Neuendorffer, Xilinx (AMD)

Biography

Stephen Neuendorffer is a Fellow at AMD in the Adaptive and Embedded Computing Group working on various aspects of system design and compilation for compute acceleration. Previously, he was product architect of Xilinx Vivado HLS and co-authored a widely used textbook on HLS design for FPGAs. He received B.S. degrees in Electrical Engineering and Computer Science from the University of Maryland, College Park in 1998. He graduated with University Honors, Departmental Honors in Electrical Engineering, and was named the Outstanding Graduate in the Department of Computer Science. He received the Ph.D. degree from the University of California, Berkeley, in 2003, after being one of the key architects of Ptolemy II.

Programming Heterogeneous Devices with MLIR

Abstract

With the slowing of longtime trends in CMOS technology scaling, many people are looking toward a ‘new golden age of computer architecture’ to enable new differentiated devices. Many of these new computer architectures require rethinking the way that programmers approach their applications, requiring new tools and compilers to unlock the power of these devices. This talk will focus on our work using multiple levels of abstraction, expressed in the MLIR compiler framework, to target new programmable devices at AMD/Xilinx. These heterogeneous devices contain general-purpose Arm CPUs along with specialized AI Engine vector/VLIW processors and programmable logic compute resources.

Prof. Hidehiko Masuhara, Tokyo Institute of Technology

Biography

Hidehiko Masuhara has been a Professor of Mathematical and Computing Science at Tokyo Institute of Technology since April 2013. He received his B.S., M.S., and Ph.D. in Computer Science from the Department of Information Science, University of Tokyo, in 1992, 1994, and 1999, respectively, and served as an assistant professor, lecturer, and associate professor in Graphics and Computer Science, Graduate School of Arts and Sciences, University of Tokyo, from 1995 until 2013.

High-Level Programming Abstractions for GPGPU

Abstract

General-purpose computing on graphics processing units (GPGPU) is an approach to achieving highly parallel and energy-efficient computing. While there are many low-level programming techniques for exploiting GPUs' peculiar performance characteristics, writing parallel programs for GPGPU remains challenging. This talk introduces the speaker's attempts to bring high-level programming abstractions to GPGPU. In particular, it presents support for objects and their dynamic allocation, and discusses further abstractions for task parallelism and graph processing.

Dr. Pei-Hung Lin, Lawrence Livermore National Laboratory

Biography

Pei-Hung Lin is a computer scientist in the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory. He is a core member of the ROSE compiler project and was a member of the Advanced Architecture and Portability Specialists (AAPS) team at LLNL. His research interests span compiler optimization, parallel programming models, and domain-specific optimization. Dr. Lin received his Ph.D. degree in Computer Science from the University of Minnesota. He has participated in various projects working with leading HPC systems, including the LANL Roadrunner, NCSA Blue Waters, and LLNL Sierra systems.

Towards Performance Portability and Correctness for HPC Systems

Abstract

Over the past decade, heterogeneous computing has taken the leading role in HPC system design and has dominated the Top500 list. The complexity of heterogeneous systems introduces extreme challenges in application development and performance portability. Existing scientific applications require code modernization and transformation to make application porting possible. These transformations involve adopting new languages, programming models, or performance portability layers to exploit new hardware features. Beyond the challenges of application porting, the development and debugging effort required to ensure the correctness of computational results increases significantly due to the massive parallelism in heterogeneous HPC systems.

In this talk, I will present how research and development activities at Lawrence Livermore National Laboratory (LLNL) have assisted scientific application developers in overcoming the challenges of performance portability and data race detection. Using cases from LLNL applications, I will present various strategies used at LLNL to pursue performance portability on heterogeneous systems. I will also present how LLNL research tackles data races in multi-threaded parallel applications and how we assist the community in developing data race detection tools.

Mr. Simon Wang, Andes Technology

Biography

Simon Wang is a Senior Technical Marketing Manager at Andes Technology, responsible for software product planning, marketing strategy, and ecosystem engagement. He is devoted to improving people's lives through software engineering, whether as a developer, designer, architect, or planner. He previously worked at MediaTek and Moxa, and received his M.S. and B.S. in EE from NTCU.

Accelerating Data Computation with RISC-V Processors

Abstract

The open RISC-V architecture is leading a new wave of computing innovation, particularly for emerging applications in AIoT, automotive, 5G, networking, and storage. Andes is a major force in taking RISC-V mainstream, and we would like to take this opportunity to share our vision and experience. In this talk, we will give an update on the latest RISC-V architecture support and how it can accelerate data-intensive computation for a wide range of applications. The RISC-V Packed SIMD/DSP (P) extension targets speeding up fixed-point computations for audio, voice, small images, and slow video, usually in cost-sensitive and low-power devices. The RISC-V Vector (V) extension aims at higher-data-rate computations, both fixed-point and floating-point, such as AI, communication, video, and vision workloads. For those interested in pursuing the ultimate performance efficiency with a domain-specific architecture, RISC-V's custom extensibility creates a whole new opportunity for innovation. A scalable architecture will be presented to highlight the important role a RISC-V processor can play.

To fulfill this potential and reduce the development cycle, it is necessary to have easy-to-use software tools and an intuitive programming model, including advanced toolchains, a processor pipeline visualizer/analyzer, and optimized computation libraries for neural network, DSP, and math functions. Furthermore, we will present a couple of use cases in AI, NN, and DSP applications, along with results from several benchmarks.