Azalia Mirhoseini is an Assistant Professor in the Computer Science Department at Stanford University. Her research focuses on developing capable, reliable, and efficient AI systems for solving high-impact, real-world problems. She also spends time at Google DeepMind, and prior to Stanford she spent several years in industry AI labs, including Anthropic and Google Brain. Her work has been recognized with the MIT Technology Review 35 Under 35 Award, the Best ECE Thesis Award at Rice University, publications in flagship venues such as Nature, and coverage by various media outlets, including MIT Technology Review, IEEE Spectrum, The Verge, The Times, ZDNet, VentureBeat, and WIRED.
Title: Inference Scaling: A New Frontier for AI Capability
Abstract: Scaling laws, which demonstrate a predictable relationship between an AI system's performance and the amount of training data, compute, and model size, continue to drive progress in AI. In this talk, we present inference compute as a new frontier for scaling LLMs. Our recent work, Large Language Monkeys, shows that coverage, the fraction of problems solved by any attempt, scales persistently with the number of samples across four orders of magnitude. Interestingly, the relationship between coverage and the number of samples is log-linear and can be modeled with an exponentiated power law, suggesting the existence of inference-time scaling laws. In domains where answers can be automatically verified, such as coding and formal proofs, we show that these increases in coverage directly translate into improved performance. In domains without verifiers, we find that identifying correct samples among many generations remains challenging: common methods for picking correct solutions from a collection of samples, such as majority voting or reward models, plateau beyond several hundred samples and fail to fully scale with the sample budget. We then present Archon, a framework for automatically designing effective inference-time systems composed of one or more LLMs. Archon selects, combines, and stacks layers of inference-time operations, such as repeated sampling, fusion, ranking, model-based unit testing, and verification, to construct optimized LLM systems for target benchmarks. It alleviates the need for automated verifiers by enabling strong pass@1 performance across diverse instruction-following, reasoning, math, and coding tasks. Finally, we discuss some of our recent hardware acceleration techniques for improving the computational efficiency of serving LLMs.
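To make the coverage metric concrete, here is a minimal sketch of how coverage (pass@k) is commonly estimated from a fixed pool of samples per problem, using the standard unbiased estimator 1 - C(n-c, k)/C(n, k) rather than literally re-drawing k attempts. The per-problem correct counts below are made-up numbers for illustration, not results from the talk.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given c verified-correct samples observed out of n total."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

n = 10_000                                   # samples drawn per problem
correct_counts = [0, 3, 42, 1, 250, 0, 7]    # made-up verified-correct counts per problem
for k in (1, 10, 100, 1_000, 10_000):
    coverage = sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
    print(f"coverage at k={k}: {coverage:.3f}")
```

Plotting such coverage estimates against k on a log scale is what exposes the roughly log-linear trend the abstract refers to.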
Jason Lee is an Associate Professor in Electrical Engineering and Computer Science (secondary) at Princeton University. Prior to that, he was in the Data Science and Operations department at the University of Southern California and a postdoctoral researcher at UC Berkeley working with Michael I. Jordan. Jason received his PhD at Stanford University, advised by Trevor Hastie and Jonathan Taylor. His research interests are in the theory of machine learning, optimization, and statistics. Lately, he has worked on the foundations of deep learning, representation learning, and reinforcement learning. He has received the Samsung AI Researcher of the Year Award, the NSF CAREER Award, the ONR Young Investigator Award in Mathematical Data Science, a Sloan Research Fellowship, the NeurIPS Best Student Paper Award, and was a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization.
Title: How Transformers Learn Causal Structure
Abstract: The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows transformers to encode causal structure which makes them particularly suitable for sequence modeling. However, the process by which transformers learn such causal structure via gradient-based training algorithms remains poorly understood. To better understand this process, we introduce an in-context learning task that requires learning latent causal structure. We prove that gradient descent on a simplified two-layer transformer learns to solve this task by encoding the latent causal graph in the first attention layer. The key insight of our proof is that the gradient of the attention matrix encodes the mutual information between tokens. As a consequence of the data processing inequality, the largest entries of this gradient correspond to edges in the latent causal graph. As a special case, when the sequences are generated from in-context Markov chains, we prove that transformers learn an induction head (Olsson et al., 2022). We confirm our theoretical findings by showing that transformers trained on our in-context learning task are able to recover a wide variety of causal structures.
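To give intuition for the mutual-information argument, here is a small, hedged sketch (not the paper's construction): it generates sequences from a toy latent causal graph, estimates pairwise mutual information between positions with a plug-in estimator, and reads off the largest entries as candidate edges, mirroring how the largest gradient entries of the attention matrix are argued to align with edges of the causal graph.

```python
import numpy as np
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        mi += (c / n) * np.log(c * n / (px[x] * py[y]))
    return mi

# Toy latent causal graph with edges 0 -> 2 and 1 -> 3:
# 90% of sequences copy position 0 into position 2 and position 1 into position 3.
rng = np.random.default_rng(0)
vocab, n_seq, seq_len = 8, 5000, 4
seqs = rng.integers(0, vocab, size=(n_seq, seq_len))
keep = rng.random(n_seq) >= 0.1      # sequences that follow the graph
seqs[keep, 2] = seqs[keep, 0]        # edge 0 -> 2
seqs[keep, 3] = seqs[keep, 1]        # edge 1 -> 3

# Pairwise MI between positions; the large off-diagonal entries flag the edges.
mi = np.zeros((seq_len, seq_len))
for i in range(seq_len):
    for j in range(i + 1, seq_len):
        mi[i, j] = mutual_information(seqs[:, i], seqs[:, j])
print(np.round(mi, 3))               # entries (0, 2) and (1, 3) dominate
```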
Yuandong Tian is a Research Scientist Director at Meta AI Research (FAIR), where he leads a group working on reasoning, planning, and decision-making with Large Language Models (LLMs). He is the project lead for the OpenGo project, which beat professional players using a single GPU at inference time, and the main mentor of StreamingLLM and GaLore, which improve the training and inference of LLMs. He is the first-author recipient of a 2021 ICML Outstanding Paper Honorable Mention (DirectPred) and a 2013 ICCV Marr Prize Honorable Mention (Hierarchical Data-Driven Descent), and he also received the 2022 CGO Distinguished Paper Award (CompilerGym). Prior to that, he worked on the Google self-driving car team in 2013-2014 and received his Ph.D. from the Robotics Institute at Carnegie Mellon University in 2013. He has served as an area chair for NeurIPS, ICML, AAAI, CVPR, and AISTATS.
Title: Towards a unified framework of Neural and Symbolic Decision Making
Abstract: Large Language Models (LLMs) have made impressive strides in natural language processing, yet they still struggle with complex tasks that require advanced reasoning, planning, and optimization—tasks that demand a deeper level of thinking than simple chains of thought. Conversely, traditional symbolic solvers provide precise, guaranteed solutions to well-defined problems, but they lack the flexibility to handle more general challenges described in natural language. This talk explores unified frameworks that integrate neural and symbolic components, leveraging the strengths of both approaches. We will discuss hybrid and end-to-end systems that combine symbolic and neural techniques, as well as pure neural models that benefit from symbolic outputs. We conclude with a recent discovery showing that gradient descent in neural networks can yield solutions completely explainable through advanced algebraic (and thus symbolic) objects, such as groups and semi-rings, hinting at the potential for a deeper unification of these paradigms from first principles.
Quanquan Gu is an Associate Professor of Computer Science at UCLA. His research is in artificial intelligence and machine learning, with a focus on nonconvex optimization, deep learning, reinforcement learning, large language models, and deep generative models. Recently, he has been utilizing AI to enhance scientific discovery in domains such as biology, medicine, chemistry, and public health. He received his Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2014. He is a recipient of the Sloan Research Fellowship, the NSF CAREER Award, and the Simons-Berkeley Research Fellowship, among other academic and industrial research awards.
Title: Self-Play Preference Optimization for Language Model Alignment
Abstract: Traditional reinforcement learning from human feedback (RLHF) approaches that rely on parametric models like the Bradley-Terry model fall short of capturing the intransitivity and irrationality in human preferences. Recent advances suggest that working directly with preference probabilities yields a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this talk, I will introduce a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed Self-Play Preference Optimization (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys a theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise losses such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset, without any prompt augmentation, and leveraging the pre-trained preference model PairRM with only 0.4B parameters, SPPO fine-tunes Mistral-7B-Instruct-v0.2 into a model that achieves a state-of-the-art length-controlled win rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses or preferences) from GPT-4 or other stronger language models.
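For readers who want the shape of the update, below is a minimal, hedged sketch of an SPPO-style objective, assuming the square-loss form L = (log(pi_theta(y|x)/pi_t(y|x)) - eta*(P_hat - 1/2))^2, where P_hat is the estimated probability (e.g., from a preference model such as PairRM) that response y beats samples from the current policy. Function and variable names are illustrative and not taken from the released code.

```python
import torch

def sppo_loss(logprob_theta, logprob_ref, win_prob, eta):
    """Square-loss SPPO-style objective over a batch of responses.

    logprob_theta: log pi_theta(y|x) under the policy being trained
    logprob_ref:   log pi_t(y|x) under the frozen policy from the last iteration
    win_prob:      estimated P(y beats samples from pi_t | x), e.g. from PairRM
    eta:           step-size-like constant from the theoretical update
    """
    log_ratio = logprob_theta - logprob_ref
    target = eta * (win_prob - 0.5)
    return ((log_ratio - target) ** 2).mean()

# Toy usage with made-up numbers: a response judged likely to win (0.9) is
# pushed toward a higher log-likelihood ratio than one judged likely to lose (0.2).
logprob_theta = torch.tensor([-12.3, -15.1], requires_grad=True)
logprob_ref = torch.tensor([-12.5, -14.8])
win_prob = torch.tensor([0.9, 0.2])
loss = sppo_loss(logprob_theta, logprob_ref, win_prob, eta=1.0)
loss.backward()
print(loss.item(), logprob_theta.grad)
```

Because the target depends only on the estimated win probability rather than on a pairwise comparison of log-ratios, each response's likelihood is pushed toward its own target, which is how the method can raise chosen-response likelihood and lower rejected-response likelihood simultaneously.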