Keynote Talks

ML4ML: The Intriguing Interplay between Moore’s Law & Machine Learning

Abstract

We are at an exciting time in the industry. The exploding demand for computing from new AI/ML workloads, arriving just as Moore’s Law is slowing down, has created a combination of opportunity and scarcity that has triggered a wave of new innovations. In this talk, we will discuss some recent ideas resulting from the synergistic interplay between machine learning and Moore’s Law. We will present how hardware/software co-design has enabled innovative custom silicon and software-defined hardware to support a new “Moore’s Law for ML”, and the exciting opportunities ahead. Reciprocally, ML techniques have been transformative in improving the design and efficiency of existing infrastructure, extending the generational performance improvements of new hardware. We will discuss some key innovations in this area, as well as the significant opportunities ahead for more such applications of “ML for Moore’s Law.”

Bio

Parthasarathy (Partha) Ranganathan is currently a VP and Technical Fellow at Google, where he is the area technical lead for hardware and datacenters, designing systems at scale. Prior to this, he was an HP Fellow and Chief Technologist at Hewlett Packard Labs, where he led their research on systems and datacenters. Partha has worked on several interdisciplinary systems projects with broad impact on both academia and industry, including widely used innovations in energy-aware user interfaces, heterogeneous multi-cores, power-efficient servers, accelerators, and disaggregated and data-centric datacenters. He has published extensively (including co-authoring the popular "Datacenter as a Computer" textbook), is a co-inventor on more than 100 patents, and has been recognized with numerous awards. He has been named a top-15 enterprise technology rock star by Business Insider and one of the top 35 young innovators in the world by MIT Tech Review, and is a recipient of the ACM SIGARCH Maurice Wilkes award, Rice University's Outstanding Young Engineering Alumni award, and the IIT Madras distinguished alumni award. He is also a Fellow of the IEEE and ACM, and is currently on the board of directors for OpenCompute.

Faster Neural Network Training, Algorithmically

Abstract

Training modern neural networks is time-consuming, expensive, and energy-intensive. As neural network training costs double every few months, it is difficult for researchers and businesses without immense budgets to keep up, especially as hardware improvements stagnate. In this talk, I will describe my favored approach for managing this challenge: changing the workload itself, namely the training algorithm. Unlike most workloads in computer science, machine learning is approximate, and we need not worry about changing the underlying algorithm so long as we properly account for the consequences. I will discuss how we have put this approach into practice at MosaicML, including the dozens of algorithmic changes we have studied (which are freely available as open source), the science behind how these changes interact with each other (the composition problem), and how we evaluate whether these changes have been effective. I will also detail several surprises we have encountered and lessons we have learned along the way. In the time since we began this work, we have reduced the training times of standard models like ResNet-50, Stable Diffusion, and GPT-3 by 5x-10x, and we're just scratching the surface. I will close with a number of open research questions we have encountered that merit the attention of the research community. This is the collective work of a dozen empirical deep learning researchers at MosaicML, and I'm simply the messenger.

Bio

Jonathan Frankle is Chief Scientist at MosaicML, where he leads the company's research team toward the goal of developing more efficient algorithms for training neural networks. In his PhD at MIT, he empirically studied deep learning with Prof. Michael Carbin, specifically the properties of sparse networks that allow them to train effectively (his "Lottery Ticket Hypothesis" - ICLR 2019 Best Paper). In addition to his technical work, he is actively involved in policymaking around challenges related to machine learning. He earned his BSE and MSE in computer science at Princeton, has previously spent time at Google Brain, Facebook AI Research, and Microsoft as an intern, and has served at Georgetown Law as an Adjunct Professor of Law.

Slides


Frankle-ASSYST-MLArchSys-2023.pdf

LLM Training at Wafer-Scale

Abstract

The demand for training large language models (LLMs) is increasing rapidly, leading to a need for more efficient computing systems. The Cerebras Wafer Scale Engine (WSE) is a system designed specifically for accelerating deep neural network training, with a silicon area 56 times larger than the largest GPU. This presentation explores the challenges of integration at this scale, as well as the unique advantages it offers. The hardware architecture that enables wafer-scale integration is discussed, along with the software challenges of compiling for such a device. Additionally, the benefits and challenges of training LLMs across a cluster of WSEs are explored. The WSE's approach to unstructured sparsity is also described, including successes in applying this sparsity to LLMs and the remaining challenges in sparsity exploitation.

Bio

Valavan Manohararajah is the chief product architect at Cerebras Systems, where he led the development of the compiler flow for Wafer Scale Engines (WSEs) and is currently architecting the hardware and software for next-generation training appliances that utilize a cluster of WSEs. With over 50 patents to his name, Valavan previously worked at Intel Corporation, where he helped pioneer several FPGA industry firsts in CAD, IP, and architecture. He received his B.A.Sc., M.A.Sc., and PhD from the Electrical and Computer Engineering department at the University of Toronto.

Slides


ASSYST-MLArchSys-2023-Valavan.pdf

Software, Hardware, and Model Codesign for High-performance Transformer-based Large Models

Abstract

The new wave of generative AI applications is powered by large transformer models. Due to the sheer cost of training and serving large models, performance is critical to unblocking production deployments and new innovations. In this talk, we will discuss the performance characteristics of different model training and serving scenarios, deep-dive into some software and modeling optimizations that interplay with the latest hardware accelerators, and go over some lessons learned. To achieve high performance, we need to co-design models, software, and hardware to empower fast innovation in algorithms and applications.

Bio

Zongwei Zhou is a Google software engineer focusing on machine learning performance and compiler optimizations. His work has dramatically increased the efficiency of Google's ML fleet and empowered the latest large model training and inference advancements and generative AI products. He led Google TPU MLPerf submissions to demonstrate state-of-the-art performance on representative ML benchmarks. Zongwei also actively participates in Google TPU co-design to evolve the hardware accelerator together with software and ML models. Before Google, he worked on high-performance systems for years, ranging from large-scale data analytics platforms and high-speed networking for cloud-native applications to NVM-optimized in-memory data stores.

Slides


Zongwei-ASSYST-MLArchSys-2023.pdf