Berkeley Architecture Research Seminar

Bios & Abstracts

Mark D. Hill


Title: Accelerator-level Parallelism


Abstract:

Computer system performance has improved by creatively using more transistors (Moore's Law) in parallel via bit-, instruction-, thread-, and data-level parallelism. With the slowing of technology scaling, a way to further improve computer system performance under energy constraints is to employ hardware accelerators. Each accelerator is a hardware component that executes a targeted computation class faster and usually with (much) less energy. Already today, many chips in mobile, edge, and cloud computing concurrently employ multiple accelerators in what we call accelerator-level parallelism (ALP).

This talk develops our hypothesis that ALP will spread to computer systems more broadly. ALP is a promising way to dramatically improve power-performance to enable broad, future use of deep AI, virtual reality, self-driving cars, etc. To this end, we review past parallelism levels and the ALP already present in mobile systems on a chip (SoCs). We then aid understanding of ALP with the Gables model and charge computer science researchers to develop better ALP "best practices" for: targeting accelerators, managing accelerator concurrency, choreographing inter-accelerator communication, and productively programming accelerators. This is joint work with Vijay Janapa Reddi of Harvard. See CACM 12/2021: https://cacm.acm.org/magazines/2021/12/256949-accelerator-level-parallelism
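For readers unfamiliar with Gables, the following is a minimal sketch of a Gables-style calculation, assuming a simplified reading of the model in which each IP block on the SoC is its own roofline, min(peak compute, operational intensity x shared memory bandwidth), and concurrently running IPs bound SoC throughput by the block that is slowest relative to its share of the work. The numbers and names below are illustrative placeholders, not figures from the talk or the CACM article.

def ip_roofline(peak_ops, intensity, bandwidth):
    # Attainable ops/s for one IP block: compute-bound or bandwidth-bound.
    return min(peak_ops, intensity * bandwidth)

def soc_attainable(ips, bandwidth):
    # Simplified Gables-style bound: with IP i doing fraction f of the work
    # concurrently, SoC throughput is limited by min over IPs of roofline / f.
    return min(ip_roofline(p, i, bandwidth) / f
               for (p, i, f) in ips if f > 0)

# Hypothetical SoC: (peak ops/s, ops per byte, fraction of work) per IP block.
ips = [
    (100e9,  8.0, 0.40),   # CPU cluster
    (400e9, 32.0, 0.50),   # ML accelerator
    (50e9,   4.0, 0.10),   # DSP
]
print(f"Attainable SoC throughput: {soc_attainable(ips, 30e9):.2e} ops/s")

Varying an IP's work fraction or the shared bandwidth shifts which block bounds the SoC, which is the kind of what-if question such a model is meant to answer.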


Biography:

Mark D. Hill is a Partner Hardware Architect with Microsoft Azure (2020-present), where he leads software-hardware pathfinding. He is also the Gene M. Amdahl and John P. Morgridge Professor Emeritus of Computer Sciences at the University of Wisconsin-Madison (http://www.cs.wisc.edu/~markhill), following his 1988-2020 service in Computer Sciences and Electrical and Computer Engineering. His research interests include parallel-computer system design, memory system design, and computer simulation. Hill's work is highly collaborative, with over 160 co-authors. He received the 2019 Eckert-Mauchly Award and is a fellow of IEEE and the ACM. He served on the Computing Community Consortium (CCC) from 2013 to 2021, including as CCC Chair (2018-20), on the Computing Research Association (CRA) Board of Directors (2018-20), and as Wisconsin Computer Sciences Department Chair (2014-17). Hill has a PhD in computer science from the University of California, Berkeley.

Sarita Adve


Title: Systems 2030: The Extended Reality Case


Abstract:

The end of Dennard scaling and Moore's Law is leading to domain-specific heterogeneous systems. There is an accompanying explosion of applications deployed on heterogeneous edge devices that interface directly with the end user as well as with edge and cloud servers. Realizing the full potential of these trends requires changing how we conduct systems research.

To drive the technologies that will enable domain-specific systems for the next decade, my group has been working in the domain of extended reality (XR), which includes virtual, augmented, and mixed reality. XR has the potential to transform our lives, but there is an orders-of-magnitude performance-power-quality gap between what is achievable today and our ideal XR systems. To enable research in this area, we have built ILLIXR (Illinois Extended Reality testbed), the first open-source XR system and testbed for XR systems research.

Our results with ILLIXR show that the systems of 2030 require, and ILLIXR enables, application-driven, end-to-end quality-of-experience-driven, and hardware-software-application co-designed systems research. In this talk, I will describe ILLIXR, results from ILLIXR, and the many research projects that ILLIXR is enabling. I will also discuss the ILLIXR consortium, an industry-backed consortium that democratizes XR systems research, development, and benchmarking by creating a reference XR testbed, a benchmarking methodology, and a multidisciplinary XR systems research community.


Biography:

Sarita Adve is the Richard T. Cheng Professor of Computer Science at the University of Illinois at Urbana-Champaign. Her research interests span the system stack, ranging from hardware to applications. Her early work on data-race-free memory consistency models and later work on the memory models for the Java and C++ programming languages form the foundation for memory models used in most hardware and software systems today. Recently, her group released ILLIXR (Illinois Extended Reality testbed), the first open-source extended reality system, and launched the ILLIXR consortium to democratize XR research. She is also known for her work on heterogeneous systems and software-driven approaches for hardware resiliency. She is a member of the American Academy of Arts and Sciences, a fellow of the ACM and IEEE, and a recipient of the ACM/IEEE-CS Ken Kennedy Award, the Anita Borg Institute Women of Vision Award in Innovation, the ACM SIGARCH Maurice Wilkes Award, and the University of Illinois campus award for excellence in graduate student mentoring. As ACM SIGARCH chair, she co-founded the CARES movement, winner of the CRA Distinguished Service Award, to address discrimination and harassment in Computer Science research events. She received her PhD from the University of Wisconsin-Madison and her B.Tech. from the Indian Institute of Technology, Bombay.

Carole-Jean Wu


Title: Scaling AI Sustainably: Environmental Implications, Challenges, and Opportunities


Abstract:

The past decade has witnessed an orders-of-magnitude increase in the amount of compute used for AI. Modern natural language processing models are fueled by over a trillion parameters, while the memory needs of deep learning recommendation and ranking models have grown from hundreds of gigabytes to the terabyte scale. We will explore the environmental implications of this super-linear growth trend for AI from a holistic perspective, spanning data, algorithms, and system hardware. I will talk about the carbon footprint of AI computing by examining the model development cycle across industry-scale use cases and, at the same time, considering the life cycle of system hardware. The talk will capture both the operational and the manufacturing carbon footprint of AI computing. Based on industry experience and lessons learned, I will share key challenges across the many dimensions of AI and discuss how at-scale optimization can help reduce the overall carbon footprint of AI and computing. This talk will conclude with important development and research directions for advancing the field of AI in an environmentally responsible and sustainable manner.
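As a rough, back-of-the-envelope illustration of the two components mentioned above: operational carbon scales with the energy a workload draws times the grid's carbon intensity, while manufacturing (embodied) carbon can be amortized over the hardware's service life. The sketch below uses made-up placeholder numbers and names, not figures or methodology from the talk.

def operational_co2_kg(energy_kwh, grid_kg_co2_per_kwh):
    # Operational footprint: energy consumed times grid carbon intensity.
    return energy_kwh * grid_kg_co2_per_kwh

def amortized_embodied_co2_kg(embodied_kg, lifetime_hours, use_hours):
    # Manufacturing (embodied) footprint attributed to one workload,
    # amortized linearly over the hardware's service life.
    return embodied_kg * (use_hours / lifetime_hours)

# Placeholder numbers for a hypothetical training run on one server.
operational = operational_co2_kg(energy_kwh=50_000, grid_kg_co2_per_kwh=0.4)
embodied = amortized_embodied_co2_kg(embodied_kg=3_000,
                                     lifetime_hours=4 * 365 * 24,
                                     use_hours=30 * 24)
print(f"Operational: {operational:.0f} kg CO2e, embodied share: {embodied:.0f} kg CO2e")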


Biography:

Carole-Jean Wu is currently a Research Scientist at Meta AI. Her research sits at the intersection of computer architecture and machine learning, with an emphasis on designing energy- and memory-efficient systems, optimizing systems for machine learning execution at scale, and designing learning-based approaches for system design and optimization. She is passionate about pathfinding and tackling system challenges to enable efficient, responsible AI execution. Carole-Jean chairs the MLPerf Recommendation Benchmark Advisory Board, co-chaired MLPerf Inference, and serves on the MLCommons Board as a Director. Carole-Jean is also a tenured Associate Professor at ASU. She received her M.A. and Ph.D. from Princeton and her B.Sc. from Cornell. She is the recipient of the NSF CAREER Award, an Honorable Mention for the CRA Anita Borg Early Career Award, the IEEE Young Engineer of the Year Award, the Science Foundation Arizona Bisgrove Early Career Scholarship, and the Intel PhD Fellowship. Her research has been recognized with several awards, including IEEE Micro Top Picks and IEEE/ACM Best Paper Awards.

Akshitha Sriraman


Title: Enabling Hyperscale Web Services


Abstract:

Current hardware and software systems were conceived at a time when we had scarce compute and memory resources, a limited quantity of data and users, and easy hardware performance scaling due to Moore's Law. These assumptions no longer hold: today, emerging web services require data centers that scale to hundreds of thousands of servers, i.e., hyperscale, to efficiently process requests from billions of users. In this new era of hyperscale computing, we can no longer afford to build each layer of the systems stack separately. Instead, we must rethink the synergy between the software and hardware worlds from the ground up.

In this talk, I will focus on rethinking (1) software threading and concurrency paradigms and (2) data center hardware architectures. First, I will detail μTune, my software threading framework that is aware of the overheads induced by the underlying hardware's constraints. Then, I will discuss SoftSKU and Accelerometer, my proposals for answering the question: how should we build data center hardware for emerging software paradigms in the post-Moore era? Finally, I will conclude by describing my ongoing and future research toward redesigning the systems stack to enable the hyperscale web services of tomorrow.


Biography:

Akshitha Sriraman is an Assistant Professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. Her research interests are in the area of bridging computer architecture and systems software, with a focus on making hyperscale data centers more efficient (via solutions that span the systems stack). The central theme of her work is to design software that is aware of new hardware constraints/possibilities and architect hardware that efficiently supports new hyperscale software requirements.

Sriraman's research has been recognized with an IEEE Micro Top Picks distinction, the 2021 David J. Kuck Dissertation Prize, and the ProQuest Distinguished Dissertation award. She was awarded a Facebook Fellowship, a Rackham Merit Ph.D. Fellowship, and a CIS Full-Tuition Scholarship. She was also named a 2019 Rising Star in EECS. Sriraman completed her Ph.D. in Computer Science and Engineering at the University of Michigan.