Keynote Talks

Large Scale LLM Training

Abstract

In this talk I will touch on a variety of topics to illustrate the challenges of developing the latest generation of large scale LLMs. Starting with motivational data showing trends in scalable ML systems, we will dive into their requirements in terms of parallel programming, its implications for networking, and fault tolerance. The presentation intends to describe how this area involves a large number of variables, not quite independent of each other, that present a significant number of trade-off decisions. It is a great time to be a computer architect indeed!

Bio

I am currently a Senior Staff Engineer at Google, where I design, develop, and deliver high-performance accelerator systems such as the YouTube Video (trans)Coding Unit (VCU) and the ML infrastructure powering Gemini. Before that, I was a Principal Research Scientist at NVIDIA in the Architecture group, an associate professor at UPC, and a Research Manager at the Barcelona Supercomputing Center. I have a BSc ('95), MSc ('97), and PhD ('02, awarded the UPC extraordinary award for the best PhD in computer science) in Computer Science from the Universitat Politecnica de Catalunya, Barcelona, Spain. I was a summer student intern with Compaq's Western Research Laboratory in Palo Alto, California for two consecutive years ('99-'00), and with Intel's Microprocessor Research Laboratory in Santa Clara ('01). I was awarded the first edition of the Agustin de Betancourt Award to a Young Researcher by the Spanish Royal Academy of Engineering in 2010. I have co-authored more than 150 papers in international refereed conferences and journals, and supervised 10 PhD students. My research interests include energy-efficient supercomputing, heterogeneous multicore architectures, hardware support for programming models, and simulation techniques.

Scaling AI Sustainably: An Uncharted Territory

Abstract

The past 50 years have seen a dramatic increase in the amount of compute per person, particularly compute enabled by AI. Despite the positive societal benefits, AI technologies come with significant environmental implications. I will talk about the scaling trend and the carbon footprint of AI computing by examining the model development cycle, spanning data, algorithms, and system hardware. At the same time, we will consider the life cycle of system hardware from the perspective of hardware architectures and manufacturing technologies. I will highlight key efficiency optimization opportunities for cutting-edge AI technologies, from deep learning recommendation models to multi-modal generative AI tasks. To scale AI sustainably, we need to make AI, and computing more broadly, efficient and flexible. We must also go beyond efficiency and optimize across the life cycle of computing infrastructures, from hardware manufacturing to datacenter operation and end-of-life processing for the hardware. Based on industry experience and lessons learned, my talk will conclude with important development and research directions to advance the field of computing in an environmentally responsible and sustainable manner.

Bio

Carole-Jean Wu is a Director of Research Science at Meta. She is a founding member and a Vice President of MLCommons, a non-profit organization that aims to accelerate machine learning for the benefit of all. Dr. Wu also serves on the MLCommons Board as a Director, chaired the MLPerf Recommendation Benchmark Advisory Board, and co-chaired MLPerf Inference. Prior to Meta/Facebook, she was a professor with tenure at ASU. She earned her M.A. and Ph.D. from Princeton and B.Sc. from Cornell. Dr. Wu's expertise sits at the intersection of computer architecture and machine learning. Her work spans datacenter infrastructures and edge systems with a focus on performance, energy efficiency, and sustainability. She is passionate about pathfinding and tackling system challenges to enable efficient, scalable, and environmentally sustainable AI technologies. Her work has been recognized with several awards, including IEEE Micro Top Picks and ACM/IEEE Best Paper Awards.

Adaptive Microarchitecture Design: Harnessing Data-Driven Intelligence

Abstract

In traditional microarchitecture design, performance evaluation often relies on cycle-accurate simulators assessing the IPC (Instructions Per Cycle) of selected software traces. Diverse applications typically produce an S-curve, where specific microarchitecture features enhance performance for some traces but degrade it for others. Maximizing performance necessitates dynamically adjusting these features based on workload phase detection, aiming to transform the S-curve into an L-curve. While existing methods often use supervised learning, they frequently yield suboptimal results. Deep reinforcement learning (RL), despite its promise, faces challenges such as sample efficiency, stability, and hardware costs. However, recent advancements in offline RL algorithms, which can utilize vast amounts of simulation and real data, offer an efficient means of adapting to varying software phases to enhance IPC. This talk explores a design flow encompassing iterative hardware telemetry counter selection, the application of effective offline RL algorithms, and a novel mechanism for stochastic inference exploitation. Additionally, it will discuss future research directions, particularly the concurrent management of multiple microarchitecture features to further boost system performance.
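To make the idea concrete, the following is a minimal, purely illustrative sketch (not the speaker's actual method) of learning a feature-toggling policy from logged telemetry. All names and the reward model are assumptions: each logged sample pairs telemetry counter readings with an action (feature off/on) and the IPC observed for that phase, and a one-step value function per action is fitted offline by least squares.

```python
import numpy as np

# Illustrative offline sketch (hypothetical setup, not the talk's algorithm):
# each logged sample is (telemetry counters, action, observed IPC), where the
# action toggles a single microarchitecture feature (0 = off, 1 = on).
rng = np.random.default_rng(0)

X = rng.uniform(0, 1, size=(500, 2))    # logged telemetry counter readings
A = rng.integers(0, 2, size=500)        # logged actions (feature off/on)
# Synthetic reward: the feature helps only when counter 0 is high.
ipc = 1.0 + np.where(A == 1, X[:, 0] - 0.5, 0.0)

def fit_q(X, y):
    """Least-squares fit of a linear per-action value function (with bias)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

w_off = fit_q(X[A == 0], ipc[A == 0])
w_on = fit_q(X[A == 1], ipc[A == 1])

def policy(x):
    """Greedy action: enable the feature when its predicted IPC is higher."""
    xb = np.append(x, 1.0)
    return int(xb @ w_on > xb @ w_off)

# Enabled in a high-counter phase, disabled in a low-counter phase.
print(policy(np.array([0.9, 0.5])), policy(np.array([0.1, 0.5])))  # -> 1 0
```

This contextual-bandit-style fit stands in for the full offline RL machinery; the actual design flow described in the talk additionally iterates on which telemetry counters to use and exploits stochastic inference.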

Bio

Dr. Gilles Pokam is a Senior Principal Engineer at Intel. Prior to joining Intel, Dr. Pokam was a postdoctoral researcher at the University of California, San Diego and a researcher at the IBM T.J. Watson Research Center in NY. His research focuses on processor microarchitecture and its interactions with system software. Dr. Pokam is currently spearheading the development of next-generation CPU microarchitectures using AI/ML to enhance the performance and energy efficiency of emerging data center workloads. He holds a Ph.D. in Computer Science from INRIA (France), holds more than 30 patents, and has authored more than 50 papers at leading conferences on microarchitecture and system software. Dr. Pokam is a two-time recipient of IEEE Micro Top Picks. His research was also selected in 2023 for inclusion in the ISCA@50 25-year Retrospective. Additionally, Gilles was the recipient of the ASPLOS 2024 Best Paper Award. He is also a member of the MICRO Hall of Fame.