Plenary Speakers
Plenary 1: Michael W. Mahoney (LBNL, UC Berkeley)
Session Chair: Cosmin Safta (Sandia National Laboratories)
Room: Dr. Vikram and Priya Lakireddy Grand Ballroom.
Title: Foundational Methods for Foundation Models for Scientific Machine Learning
Abstract: The remarkable successes of ChatGPT in natural language processing (NLP) and related developments in computer vision (CV) motivate the question of what foundation models would look like, and what new advances they would enable, when built on the rich, diverse, multimodal data available from large-scale experiments and simulations in scientific computing (SC), broadly defined. Such models could provide a robust and principled foundation for scientific machine learning (SciML), going well beyond simply using ML tools developed for internet and social media applications to help solve future scientific problems. I will describe recent work demonstrating the potential of the "pre-train and fine-tune" paradigm, widely used in CV and NLP, for SciML problems, demonstrating a clear path towards building SciML foundation models; as well as recent work highlighting multiple "failure modes" that arise when trying to interface data-driven ML methodologies with domain-driven SC methodologies, demonstrating clear obstacles to traversing that path successfully. I will also describe initial work on developing novel methods to address several of these challenges, as well as their implementations at scale, a general solution to which will be needed to build robust and reliable SciML models consisting of millions, billions, or trillions of parameters. These novel methods raise new and interesting challenges in the foundations of machine learning and the applied mathematics of data.
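For readers less familiar with the terminology, the toy sketch below illustrates the "pre-train and fine-tune" paradigm in its simplest form: a small network is first trained on a large synthetic "upstream" dataset and then adapted, with a lower learning rate, to a smaller related "downstream" dataset. The data, model, and hyperparameters are placeholders chosen for illustration only and are not drawn from the speaker's work.

import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(freq, n):
    # Toy stand-in for scientific training data: inputs x, targets sin(freq * x).
    x = torch.rand(n, 1) * 6.28
    return x, torch.sin(freq * x)

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(),
                      nn.Linear(64, 1))
loss_fn = nn.MSELoss()

# Phase 1: pre-train on a large, cheap "upstream" dataset (frequency 1.0).
x_pre, y_pre = make_data(1.0, 4096)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss_fn(model(x_pre), y_pre).backward()
    opt.step()

# Phase 2: fine-tune the same weights on a small "downstream" dataset (frequency 1.3),
# using a lower learning rate so the pre-trained features are adjusted, not erased.
x_ft, y_ft = make_data(1.3, 128)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(500):
    opt.zero_grad()
    loss_fn(model(x_ft), y_ft).backward()
    opt.step()

print("fine-tuned loss on downstream data:", loss_fn(model(x_ft), y_ft).item())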
Dr. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He is also an Amazon Scholar as well as head of the Machine Learning and Analytics Group at the Lawrence Berkeley National Laboratory. He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, scalable stochastic optimization, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, computational methods for neural network analysis, physics-informed machine learning, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he was on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he co-organized the Simons Institute's fall 2013 and 2018 programs on the foundations of data science, he ran the Park City Mathematics Institute's 2016 Summer Session on The Mathematics of Data, he ran the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he was the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. More information is available at https://www.stat.berkeley.edu/~mmahoney/.
Plenary 2: Robert Schreiber (Cerebras Systems, Inc.)
Session Chair: Esmond Ng (Lawrence Berkeley National Laboratory)
Room: Dr. Vikram and Priya Lakireddy Grand Ballroom.
Title: Wafer Scale Computing: Fine Grain Parallelism and Rethinking Parallel Computing
Abstract: Chips are made by photolithographic printing of circuits on thin silicon wafers that today are 12-inch diameter circles. A matrix of identical chips is printed on the surface, and a saw then cuts the wafer into individual chips. But in wafer-scale computing, there is no saw. The whole wafer remains intact, serving as a single “chip”, but with two orders of magnitude more transistors than a conventional chip. As of October 2024, Cerebras Systems is the only manufacturer of (and its CS-3 is the only instance of) commercially available wafer-scale computers.
The CS-3 incorporates all memory and processing on one wafer, a wafer that contains about 840,000 processing elements. With 48KB of local memory, a PE cannot hold very much data. On the other hand, access to that data proceeds at the same rate as peak-speed computation. Most interestingly, the mesh interconnect has single-clock latency for sending a message (of 4 bytes) to a neighboring PE in the mesh, and the network can sustain a 4-byte message to and from each neighbor on every clock.
The wafer is therefore a working instance of processing co-located with memory. While it is distributed memory from the addressing perspective, the interconnect’s performance allows programmers to treat distributed data structures - graphs, matrices, data arrays - as if they were shared; they are shared objects housed in a distributed memory substrate.
Wafer-scale computing is therefore a new thing at the hardware level (no saw); at the architecture level (because communication is intrinsic to the architecture and the instruction set); at the algorithmic level (because memory and communication walls have been toppled, allowing strong scaling and effective fine-grain parallelism); and at the programming level, where application code tightly integrates all the wafer resources, explicitly controlling communication as well as computation.
So there are no memory walls and no high-overhead, high-latency, low-bandwidth interconnects on the wafer; the upshot is that very fine-grained parallel applications achieve excellent performance. This in turn allows parallel implementations in which each PE holds only a few words of the problem data, taking full advantage of the easy accessibility of data on near-neighbor PEs. Strong scaling is therefore quite successful: for problems that fit on the wafer, it reduces runtimes by two orders of magnitude and enables applications that are impossible on conventional systems. There are now demonstrations of dramatic speedups for fluid flow, molecular dynamics, and radiation transport applications. Thus, wafer-scale computing has created the possibility of a dramatic shift in how we build, think about, and use computation for science.
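As a rough, hedged back-of-envelope reading of the numbers quoted in the abstract (the clock rate below is an assumption for illustration, not a Cerebras specification):

# Back-of-envelope figures implied by the abstract: about 840,000 PEs, 48KB of
# local memory per PE, and a 4-byte message to and from each mesh neighbor per clock.
# The 1 GHz clock below is an ASSUMED placeholder, not a published specification.
num_pes = 840_000
local_mem_bytes = 48 * 1024
msg_bytes = 4
clock_hz = 1.0e9  # assumed

total_mem_gb = num_pes * local_mem_bytes / 1e9
per_link_gb_per_s = msg_bytes * clock_hz / 1e9
aggregate_pb_per_s = num_pes * 4 * per_link_gb_per_s / 1e6  # 4 mesh neighbors for an interior PE

print(f"aggregate on-wafer memory: ~{total_mem_gb:.0f} GB")
print(f"per-link bandwidth at the assumed clock: {per_link_gb_per_s:.0f} GB/s each way")
print(f"aggregate near-neighbor bandwidth (interior PEs): ~{aggregate_pb_per_s:.1f} PB/s")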
Dr. Schreiber is a Distinguished Engineer at Cerebras Systems, Inc., where he works on architecture and programming of systems for accelerated training of deep neural networks. Schreiber’s research spans sequential and parallel algorithms for matrix computation, compiler optimization for parallel languages, and high-performance computer design. With Moler and Gilbert, he developed the sparse matrix extension of MATLAB. He created the NAS CG parallel benchmark. He was a designer of the High Performance Fortran language. He led the development at HP of the PICO system for synthesis of custom hardware accelerators. He has helped pioneer the exploitation of photonic signaling in processors and networks. He is an ACM Fellow and a SIAM Fellow, and he received the 2012 Career Prize from the SIAM Activity Group on Supercomputing.
Plenary 3: Mariel Vázquez (University of California, Davis)
Session Chair: Noemi Petra (University of California, Merced)
Room: Dr. Vikram and Priya Lakireddy Grand Ballroom.
Title: The geometry and topology of nucleic acids
Abstract: The genetic code of viruses and of living organisms is contained in very long DNA or RNA molecules, which are tightly packaged in confined environments. The molecules need to coil upon themselves in order to fit. We study the changes in DNA topology mediated by essential processes such as DNA packing and transcription of DNA into RNA. These processes are highly regulated, and even small structural changes can lead to catastrophic effects. We use techniques from knot theory and topology, aided by discrete and computational methods, to model the geometry and topology of nucleic acids. In this lecture I discuss DNA modeling and the formation and entanglement of DNA:RNA hybrids that arise during transcription. The presentation is accessible to students and suitable for a diverse interdisciplinary audience.
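As one concrete, if hypothetical, illustration of the discrete and computational methods the abstract alludes to, the short sketch below numerically estimates the Gauss linking number, a standard measure of entanglement between two closed curves such as circular DNA molecules. The example curves and the simple midpoint quadrature are generic and are not drawn from the speaker's code.

import numpy as np

def gauss_linking_number(curve_a, curve_b):
    # Approximate the Gauss linking integral
    #   Lk = (1 / (4*pi)) * double integral of (dr1 x dr2) . (r1 - r2) / |r1 - r2|^3
    # for two closed polygonal curves given as (n, 3) vertex arrays,
    # using the midpoint of each segment as a quadrature point.
    seg_a = np.roll(curve_a, -1, axis=0) - curve_a
    seg_b = np.roll(curve_b, -1, axis=0) - curve_b
    mid_a = curve_a + 0.5 * seg_a
    mid_b = curve_b + 0.5 * seg_b
    total = 0.0
    for da, ra in zip(seg_a, mid_a):
        cross = np.cross(da, seg_b)              # dr1 x dr2 against every segment of B
        diff = ra - mid_b                        # r1 - r2
        dist3 = np.linalg.norm(diff, axis=1) ** 3
        total += np.sum(np.sum(cross * diff, axis=1) / dist3)
    return total / (4.0 * np.pi)

# Two circles forming a Hopf link; the estimated linking number should be close to +/-1.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle_a = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
circle_b = np.stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
print("estimated linking number:", round(gauss_linking_number(circle_a, circle_b), 3))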
Dr. Vázquez is a Professor of Mathematics and of Microbiology & Molecular Genetics at the University of California, Davis. Her research focuses on the application of topological, discrete, and computational methods in molecular biology, with an emphasis on DNA packing, topological changes induced by DNA replication and transcription, and the molecular evolution of coronaviruses. Vázquez is a Fellow of the American Mathematical Society, the Association for Women in Mathematics, and the American Association for the Advancement of Science. She received the Blackwell-Tapia Prize, the U.S. Presidential Early Career Award for Scientists and Engineers (PECASE), and the NSF CAREER Award. She has given a wide variety of invited research lectures nationally and internationally, as well as public lectures, and she participates in many other outreach efforts. Vázquez’s service to the mathematics profession is extensive.
Dr. Vázquez obtained a B.Sc. in Mathematics from the National University of Mexico (UNAM) and a Ph.D. from Florida State University, where she worked with De Witt Sumners and was supported by fellowships from DGAPA UNAM and the Program for Mathematics and Molecular Biology/Burroughs Wellcome Fund. After receiving her doctorate, she held appointments as a Postdoctoral Fellow/Visiting Assistant Professor at UC Berkeley, working with Rainer Sachs. While at Berkeley, Vázquez received the Project NExT Fellowship. She joined the faculty at San Francisco State University in 2005, and the UC Davis faculty in July 2014. From 2019 to 2023 she served as faculty director of CAMPOS, a center dedicated to supporting a diverse group of early career STEM faculty.