About the Summit

The AI & Systems Co-design Faculty Summit brings together researchers and practitioners from across academia and industry to present and discuss the key challenges facing the AI & Systems Co-design space.


This year, the event will have two keynote speakers. Mike Zeile, Senior Director at Meta, will present the opening keynote on infrastructure challenges and co-design opportunities. Robert Wisniewski, Senior VP at Samsung, will present the afternoon keynote on memory and communication challenges. Maxim Naumov, Senior Engineering Manager at Meta, will wrap up the summit with a closing talk. Alongside the keynotes, the summit will feature several talks from academic researchers who have worked with Meta on various research topics over the past few years.


We look forward to seeing you at the summit and continuing the conversation at the happy hour afterwards!

Agenda

Meta Keynote (9:30 AM - 10:30 AM PDT) [Session Chair: Shobhit Kanaujia]

Title: Infrastructure challenges - supporting divergent requirements at Meta scale.

Speaker: Mike Zeile, Senior Director, AI Systems and Accelerated Platforms, Meta

Bio:

Mike supports product management and co-design within the ASAP organization at Meta. He joined Meta in July 2022 from Intel Corporation, where he held multiple GM and product/strategy leadership roles in Intel’s Connectivity, Networking, and High-Performance Fabrics Groups. Prior to Intel, Mike was a member of the leadership teams of multiple start-ups in communications and networking. He has the dubious distinction of helping build six start-ups, all of which were acquired.

Mike received a bachelor's degree in Business from UCLA, with an emphasis in Computer Science, and is the proud father of two UCLA graduates.




Session 1 - Model Hardware/Software Co-Design (10:30 AM - 12:00 PM PDT) [Session Chair: Guna Lakshminarayanan]


"Hardware/Software Codesign For Sparse Neural Networks", Fredrik Berg Kjoelstad (Stanford)


"SqueezeLLM: Dense and Sparse Quantization", Amir Gholami (UC Berkeley)


"Chakra and ASTRA-sim: An open-source ecosystem for advancing co-design for future AI systems", Taekyung Heo/Tushar Krishna (GaTech)



Lunch - EPIC Cafe (12:00 PM - 1:00 PM PDT)


External Keynote (1:00 PM - 2:00 PM PDT) [Session Chair: Pavan Balaji]

Title: Key HPC and AI Challenges: Memory and Communication

Speaker: Robert Wisniewski, Senior Vice President and Chief Architect of HPC, Head of Samsung’s SAIT Systems Architecture Lab


Abstract:

The notion of a "Memory Wall" was identified almost three decades ago. Since then, memory has continued to get faster, HBM has been introduced, and computing paradigms have been explored; nevertheless, the memory wall is higher than it was three decades ago. A significant number of classical HPC applications - modeling and simulation applications - are bottlenecked by insufficient memory bandwidth. More recently, the communication wall has received attention, since AI applications are often bottlenecked by insufficient communication bandwidth. In this talk I will discuss the research we are undertaking to design the hardware and software architecture for HPC and AI applications to tackle these challenges. I will suggest a path forward based on tightly integrating memory and compute, called Memory Couple Compute (MCC), and describe the interesting design space that needs to be considered to make this architecture a reality. The architectural space is broad, so a key aspect of our investigation involves codesign among application developers, system software, and hardware with key users. A successful effort on this front will produce an MCC capability that has the potential to be the next discontinuity in HPC and AI.


Bio:

Dr. Robert W. Wisniewski is a Senior Vice President, Chief Architect of HPC, and the Head of Samsung's SAIT Systems Architecture Lab. He is an ACM Distinguished Scientist and IEEE Senior Member. The Systems Architecture Lab is innovating technology to overcome the memory and communication walls for HPC and AI applications. He has published over 80 papers in the area of high performance computing, computer systems, and system performance, has filed over 60 patents with 46 issued, has an h-index of 41 with over 7400 citations, and has given over 82 external invited presentations. Prior to joining Samsung, he was an Intel Fellow and CTO and Chief Architect for High Performance Computing at Intel. He was the technical lead and PI for Aurora, the supercomputer to be delivered to Argonne National Laboratory that will achieve greater than an exaflop of computation. He was also the lead architect for Intel's cohesive and comprehensive software stack that was used to seed OpenHPC, and served on the OpenHPC governance board as chairman. Before Intel, he was the chief software architect for Blue Gene Research and manager of the Blue Gene and Exascale Research Software Team at the IBM T.J. Watson Research Facility, where he was an IBM Master Inventor and led the software effort on Blue Gene/Q, which received the National Medal of Technology and Innovation, was the most powerful computer in the world in June 2012, and occupied 4 of the top 10 positions on the Top 500 list.



Session 2 - Communication and Data (2:00 PM - 3:00 PM PDT) [Session Chair: Pavan Balaji]


"Accelerating Deep Learning Recommendation Model Training through Adaptive Lossy Compression", Dingwen Tao (Indiana U)


"Scaling Distributed Deep Learning Training: A Dive into Collective Communication Challenges and Opportunities", Xiaoyi Lu (UC Merced)



Break (3:00 PM - 3:15 PM PDT)



Session 3 - Workloads (3:15 PM - 4:45 PM PDT) [Session Chair: Abhishek Dhanotia]


"Efficient Data Tiering for Heterogeneous Memory Systems", Abhishek Bhattacharjee (Yale)


"Enabling Efficient and Sustainable Hyperscale Web Services", Akshitha Sriraman (CMU)


"Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization", Alex Aiken (Stanford)



Closing Remarks (4:45 PM - 5:15 PM PDT)

Speaker: Maxim Naumov, Senior Engineering Manager, AI and Systems Co-design Team

Bio:

Maxim Naumov is a senior engineering manager at Meta. His interests include deep learning recommendation models (DLRMs), content understanding (CV/NMT), generative AI (LLMs/LDMs), and performance at scale. In the past, he held different positions at Nvidia Research, Emerging Applications and Platform teams. He has also worked at Intel Corporation Microprocessor Technology and Computational Software Labs. Maxim was awarded an Intel Fellowship and received his PhD in computer science (with specialization in computational science and engineering) in 2009 and BS in computer science and mathematics in 2003 from Purdue University – West Lafayette.




Happy Hour - EPIC Cafe (5:15 PM - 7:00 PM PDT)