UCSD AI Seminar
Mon 12:00 - 12:50 pm PST
Title: Semi-discrete models in continuous time and space
Abstract: Deep learning is a simple and versatile modeling paradigm built from differentiable components, but there are still types of data that such models struggle to handle natively, such as data generated by hybrid discrete-continuous dynamical systems, or continuous distributions with discrete structures and symmetries. In this talk, I'll discuss a couple of semi-discrete modeling approaches that incorporate discrete structures into models of continuous time or space, and show how training can be done using only gradient-based optimization.
The first work extends continuous-time modeling with neural ordinary differential equations (ODEs) to model discrete, instantaneous changes to the system. A gradient-based approach allows this to be trained without prior knowledge of when these changes should occur or how many such changes should exist. We apply this to learning semi-discrete systems such as switching dynamical systems, physical systems with collisions, and temporal point processes. The second work focuses on learning probabilistic mappings between discrete and continuous random variables. We construct disjoint subspaces through a differentiable tessellation, and a normalizing flow is then constructed with support only within each subspace, allowing us to cheaply parameterize distributions on constrained spaces. This has applications in dequantization, a method that allows continuous density models to be used for discrete data, and in disjoint mixture modeling, an approach to mixture modeling whose compute cost does not scale with the number of mixture components.
 Neural ODEs https://arxiv.org/abs/1806.07366
 Neural Event Functions https://arxiv.org/abs/2011.03902
 Semi-Discrete Normalizing Flows through Differentiable Tessellation https://arxiv.org/abs/2203.06832
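As a concrete illustration of the first idea (not the paper's method, which differentiates through the event times themselves), the hybrid continuous/discrete behavior can be sketched with SciPy's event-handling ODE solver: a ball falls continuously under gravity, an event function detects the instant of collision, and the state then changes discontinuously.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Continuous dynamics of a ball under gravity: state = [height, velocity].
def dynamics(t, state):
    h, v = state
    return [v, -9.81]

# Event function: zero when the ball reaches the ground.
def hit_ground(t, state):
    return state[0]
hit_ground.terminal = True   # stop integration at the event
hit_ground.direction = -1    # trigger only while the height is decreasing

t, state = 0.0, np.array([1.0, 0.0])   # drop from height 1 m at rest
bounces = []
for _ in range(3):
    sol = solve_ivp(dynamics, (t, t + 10.0), state,
                    events=hit_ground, max_step=0.01)
    t = sol.t_events[0][0]             # time of the detected impact
    bounces.append(t)
    # Instantaneous state change: reverse velocity with restitution 0.8.
    v_impact = sol.y_events[0][0][1]
    state = np.array([0.0, -0.8 * v_impact])

print(bounces)  # impact times; the first is about 0.45 s
```

In the neural ODE setting, both the event function and the instantaneous update would themselves be learned networks rather than the hand-written rules above.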
Bio: Ricky T. Q. Chen is a Research Scientist at Meta AI. He received his PhD from the University of Toronto, where he was advised by David Duvenaud. His research focuses on probabilistic deep learning, specifically on integrating structured transformations into probabilistic modeling, with the goals of improved interpretability, tractable optimization, and extension into novel areas of application. In terms of fundamental research, he usually works on some combination of numerical simulation, automatic differentiation, and stochastic estimation. He enjoys applying these tools to a variety of applications, such as normalizing flows and spatiotemporal modeling.
Title: Grounding Neural Architecture Search on Deep Learning Theories?
Abstract: This talk will present a training-free and theory-grounded framework for Neural Architecture Search (NAS) with high performance, very low cost, and potential for interpretation. NAS has been widely exploited to automate the discovery of top-performing neural networks, but suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations. Recent NAS works attempt to explore indicators that can predict a network's performance with little to no training. Drawing on wisdom from the deep learning theory community, we present a unified framework to understand and accelerate NAS by disentangling essential theory-inspired characteristics of searched networks – Trainability, Expressivity, and Generalization (which we call “TEG”) – all assessed in a training-free manner and accompanied by rigorous correlation analysis against the networks’ empirical performance. The framework can be applied to both convolutional networks and transformers; it can easily be scaled up to large datasets and instantiated with various NAS search methods. We also visualize the “NAS trajectories” on the landscapes of those characteristics, which leads to an interpretable analysis of various NAS algorithms’ behaviors on different benchmarks. Our latest work yields finer-granularity observations, theoretically characterizing the impact of wiring patterns on the convergence of DNNs under gradient descent training.
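As a toy illustration of a training-free indicator (in the spirit of expressivity proxies such as NASWOT, not the TEG framework itself), the sketch below scores random untrained ReLU networks by how distinctly a batch of inputs is separated into activation patterns; the architectures and widths are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_net(widths, rng):
    """Random untrained ReLU MLP; returns a function mapping inputs to
    the concatenated binary activation pattern of all hidden layers."""
    Ws = [rng.standard_normal((m, n)) / np.sqrt(m)
          for m, n in zip(widths[:-1], widths[1:])]
    def pattern(x):
        codes = []
        for W in Ws:
            x = np.maximum(x @ W, 0.0)
            codes.append((x > 0).astype(float))
        return np.concatenate(codes, axis=1)
    return pattern

def expressivity_score(pattern, x):
    """Log-determinant of the activation-overlap kernel: higher when a
    batch of inputs lands in more distinct linear regions."""
    c = pattern(x)                        # batch x units, binary codes
    k = c @ c.T + (1 - c) @ (1 - c).T     # Hamming-similarity kernel
    sign, logdet = np.linalg.slogdet(k + 1e-6 * np.eye(len(x)))
    return logdet

x = rng.standard_normal((16, 8))
wide = relu_net([8, 64, 64], rng)
narrow = relu_net([8, 4, 4], rng)
print(expressivity_score(wide, x), expressivity_score(narrow, x))
```

A full training-free NAS loop would rank many candidate architectures by scores like this (plus trainability and generalization proxies) and keep the best, without ever training them.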
Bio: Professor Zhangyang “Atlas” Wang is currently the Jack Kilby/Texas Instruments Endowed Assistant Professor in the Department of Electrical and Computer Engineering at The University of Texas at Austin, leading the VITA group (https://vita-group.github.io/). He also holds a visiting researcher position at Amazon. He received his Ph.D. degree in ECE from UIUC in 2016, advised by Professor Thomas S. Huang; and his B.E. degree in EEIS from USTC in 2012. Prof. Wang has broad research interests spanning from the theory to the application aspects of machine learning. Most recently, he studies automated machine learning (AutoML), learning to optimize (L2O), robust learning, efficient learning, and graph neural networks. His research is gratefully supported by NSF, DARPA, ARL, ARO, IARPA, DOE, as well as dozens of industry and university grants. He and his students have received many research awards and scholarships, as well as media coverage.
Title: Recent Advances in Probabilistic Forecasting for Big Time Series
Abstract: Time series forecasting is a key ingredient in the automation and optimization of business processes: in retail, deciding which products to order and where to store them depends on forecasts of future demand in different regions; in cloud computing, the estimated future usage of services and infrastructure components guides capacity planning. Recent years have witnessed a paradigm shift in forecasting techniques and applications, from computer-assisted, model- and assumption-based approaches to data-driven and fully automated ones. This shift can be attributed to the availability of large, rich, and diverse time series data sources, and results in a set of challenges that need to be addressed. In this talk, we will discuss modern approaches for probabilistic time series forecasting, in particular models that efficiently combine the expressive power of neural networks with the data efficiency of classical dynamic models such as state-space models and Gaussian processes. Furthermore, we will touch upon the practical aspects of building forecasting systems.
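To make the "probabilistic" part concrete, here is a minimal sketch using a classical AR(1) model rather than the neural state-space models from the talk: instead of one point forecast, we simulate many future sample paths and report quantiles. The data are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy series: AR(1) demand around a level of 10 (stand-in for real data).
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 10 + 0.8 * (y[t - 1] - 10) + rng.normal(0, 1.0)

# Fit AR(1) by least squares: y[t] ~ c + phi * y[t-1].
X = np.column_stack([np.ones(199), y[:-1]])
(c, phi), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
sigma = (y[1:] - X @ np.array([c, phi])).std()

# Probabilistic forecast: simulate many sample paths forward and
# summarize the predictive distribution with quantiles.
horizon, n_paths = 12, 1000
paths = np.empty((n_paths, horizon))
for i in range(n_paths):
    v = y[-1]
    for h in range(horizon):
        v = c + phi * v + rng.normal(0, sigma)
        paths[i, h] = v
p10, p50, p90 = np.quantile(paths, [0.1, 0.5, 0.9], axis=0)
print(p50[0], p10[0], p90[0])
```

The neural models discussed in the talk replace the hand-picked AR(1) transition with learned, nonlinear dynamics, but the output has the same shape: a predictive distribution, queried through samples or quantiles, rather than a single number.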
Bio: Yuyang (Bernie) Wang is a Principal Machine Learning Scientist in AWS AI Labs, working mainly on large-scale probabilistic machine learning with its application in time series forecasting, anomaly detection, etc. He received his PhD in Computer Science from Tufts University, MA, US and he holds an MS from the Department of Computer Science at Tsinghua University, Beijing, China. Other than time series analysis, Bernie’s research interests span statistical machine learning, numerical linear algebra, and random matrix theory.
Title: Learning for Reliable Control in Dynamical Systems
Abstract: This talk describes ongoing research at Caltech on integrating learning into the design of reliable controllers for dynamical systems. To achieve certifiable control-theoretic guarantees while using powerful function classes such as deep neural networks, we must carefully integrate conventional control & planning principles with learning into unified frameworks. A special emphasis will be placed on methods that both admit relevant behavioral guarantees and are practical to deploy. These methods are demonstrated in a variety of applications, including smooth broadcasting of sports games, agile aerial flight while dealing with perturbations and boundary conditions, and fast planning in resource-limited safety-critical settings such as Mars rover navigation.
Bio: Yisong Yue is a Professor of Computing and Mathematical Sciences at the California Institute of Technology, as well as a Principal Scientist at Argo AI. He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher in the Machine Learning Department and the iLab at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign.
Yisong's research interests are centered around machine learning, and in particular getting theory to work in practice. To that end, his research agenda spans both fundamental and applied pursuits. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, experiment design for science, protein engineering, seismology, program synthesis, learning-accelerated optimization, agile robotics, and adaptive planning & allocation problems. At Argo AI, he is developing machine learning approaches to motion planning for urban driving.
Title: Infusing Structure and Knowledge into Biomedical AI
Abstract: Artificial intelligence holds tremendous promise in enabling scientific breakthroughs in diverse areas. Biomedical data, however, present unique challenges for scientific discovery, including limited information for supervised learning, the need to generalize to new scenarios not seen during training, and the need for representations that lend themselves to actionable hypotheses in the laboratory. This talk describes our efforts to address these challenges by infusing structure and knowledge into biomedical AI. First, I outline subgraph neural networks that can disentangle distinct aspects of subgraph structure. I will then present a general-purpose approach for few-shot learning on graphs. At the core is the notion of local subgraphs that transfer knowledge from one task to another, even when only a handful of labeled examples are available. This principle is theoretically justified as we show that the evidence for predictions can be found in subgraphs surrounding the targets. Finally, to illustrate the benefits of modeling structure in non-graph datasets, I will introduce Raindrop, a graph neural network that embeds complex time series while also learning the dynamics of sensors purely from observational data. This research creates new avenues for accelerating drug discovery and giving the right patient the right treatment at the right time to have effects that are consistent from person to person and with results in the laboratory.
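The notion of a local subgraph around a target node can be sketched in a few lines of plain Python as a k-hop breadth-first search (the graph below is a made-up toy example, not data from the talk):

```python
from collections import deque

def k_hop_subgraph(adj, seed, k):
    """Return the set of nodes within k hops of `seed` in a graph given
    as adjacency lists -- the 'local subgraph' carrying the evidence a
    model would use to make predictions about `seed`."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == k:          # frontier reached: do not expand further
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# Small undirected graph as adjacency lists: 0-1, 0-2, 1-3, 3-4.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
print(k_hop_subgraph(adj, 0, 1))  # {0, 1, 2}
print(k_hop_subgraph(adj, 0, 2))  # {0, 1, 2, 3}
```

In the few-shot setting described above, a graph neural network would embed such local subgraphs so that knowledge transfers between tasks even with only a handful of labeled examples.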
Bio: Marinka Zitnik (https://zitniklab.hms.harvard.edu) is an Assistant Professor at Harvard University with appointments in the Department of Biomedical Informatics, Broad Institute of MIT and Harvard, and Harvard Data Science. Her research investigates applied machine learning, focusing on networked systems that require infusing structure and knowledge. Dr. Zitnik has published extensively in ML venues (e.g., NeurIPS, ICLR, ICML) and leading scientific journals (e.g., Nature Methods, Nature Communications, PNAS). She is an ELLIS Scholar in the European Laboratory for Learning and Intelligent Systems (ELLIS) Society and a member of the Science Working Group at NASA Space Biology. Her research won best paper and research awards from the International Society for Computational Biology, Bayer Early Excellence in Science Award, Amazon Faculty Research Award, Roche Alliance with Distinguished Scientists Award, Rising Star Award in Electrical Engineering and Computer Science, and Next Generation in Biomedicine Recognition, being the only young scientist with such recognition in both EECS and Biomedicine.
Title: Using Machine Learning to Improve Clinician Decision Making
Abstract: The next decade will see a shift in focus of machine learning in healthcare from models for diagnosis and prognosis to models that directly guide treatment decisions. We show how to learn treatment policies from electronic medical records, doing a deep dive into our recent work on learning to recommend antibiotics for women with uncomplicated urinary tract infections (Kanjilal et al., Science Translational Medicine '20). We then discuss bigger picture questions for the field, such as how to do rigorous retrospective evaluations, fairly comparing to existing clinical practice, and how to optimally design for clinician-AI interaction, including algorithms that teach humans when and when not to rely on AI (Mozannar et al., AAAI '22). We find that, relative to clinicians, our best models reduce inappropriate antibiotic prescriptions from 11.9% to 9.5% while at the same time using 50% fewer second-line antibiotics.
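One simple way to think about clinician-AI interaction of this kind is a confidence-based deferral policy: the model acts only when confident, and otherwise defers to the clinician. The sketch below is a hypothetical retrospective evaluation on simulated data; only the 11.9% clinician rate comes from the abstract, and everything else is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical retrospective data: model's probability that a first-line
# antibiotic is appropriate, plus a simulated ground-truth outcome.
n = 1000
p_model = rng.beta(5, 2, n)                        # model confidence
outcome = (rng.random(n) < p_model).astype(int)    # 1 = appropriate

CLINICIAN_BAD_RATE = 0.119   # historical rate quoted in the abstract

def policy_rate(threshold):
    """Inappropriate-prescription rate if the model prescribes only when
    confident (p >= threshold) and defers to the clinician otherwise."""
    act = p_model >= threshold
    model_bad = (outcome[act] == 0).mean() if act.any() else 0.0
    return act.mean() * model_bad + (1 - act.mean()) * CLINICIAN_BAD_RATE

for thr in (0.5, 0.7, 0.9):
    print(thr, round(policy_rate(thr), 3))
```

A real evaluation would, as the abstract stresses, have to compare fairly against observed clinical practice on the same patients rather than on simulated outcomes.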
Bio: David Sontag is Professor in the Department of Electrical Engineering and Computer Science (EECS) at MIT, and member of the Institute for Medical Engineering and Science (IMES) and the Computer Science and Artificial Intelligence Laboratory (CSAIL). Prior to joining MIT, Dr. Sontag was an Assistant Professor in Computer Science and Data Science at New York University from 2011 to 2016, and a postdoctoral researcher at Microsoft Research New England. Dr. Sontag received the Sprowls award for outstanding doctoral thesis in Computer Science at MIT in 2010, best paper awards at the conferences Empirical Methods in Natural Language Processing (EMNLP), Uncertainty in Artificial Intelligence (UAI), and Neural Information Processing Systems (NeurIPS), faculty awards from Google, Facebook, and Adobe, and a National Science Foundation Early Career Award. Dr. Sontag received a B.A. from the University of California, Berkeley.
Title: Continuous Network Models for Sequential Predictions
Abstract: Data-driven machine learning methods such as those based on deep learning are playing a growing role in many areas of science and engineering for modeling time series, including fluid flows and climate data. However, deep neural networks are known to be sensitive to various adversarial environments, and thus out-of-the-box models and methods are often not suitable for mission-critical applications. Hence, robustness and trustworthiness are increasingly important aspects in the process of engineering new neural network architectures and models. In this talk, I am going to view neural networks for time series prediction through the lens of dynamical systems. First, I will discuss deep dynamic autoencoders and argue that integrating physics-informed energy terms into the learning process can help to improve the generalization performance as well as robustness with respect to input perturbations. Second, I will discuss novel continuous-time recurrent neural networks that are more robust and accurate than other traditional recurrent units. I will show that leveraging classical numerical methods, such as the higher-order explicit midpoint time integrator, improves the predictive accuracy of continuous-time recurrent units as compared to using the simpler one-step forward Euler scheme. Finally, I will discuss extensions such as multiscale ordinary differential equations for learning long-term sequential dependencies and a connection between recurrent neural networks and stochastic differential equations.
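The claimed gap between forward Euler and the explicit midpoint scheme is easy to see on a scalar ODE with a known solution (a generic numerical experiment, not the talk's recurrent architecture):

```python
import numpy as np

# Integrate dy/dt = -y, y(0) = 1, whose exact solution is exp(-t).
def euler(f, y0, dt, steps):
    """One-step forward Euler: first-order accurate."""
    y = y0
    for _ in range(steps):
        y = y + dt * f(y)
    return y

def midpoint(f, y0, dt, steps):
    """Explicit midpoint: evaluate the slope at a half step, giving
    second-order accuracy for the same number of steps."""
    y = y0
    for _ in range(steps):
        y_half = y + 0.5 * dt * f(y)   # half-step estimate
        y = y + dt * f(y_half)         # full step using midpoint slope
    return y

f = lambda y: -y
dt, steps = 0.1, 10                    # integrate to t = 1
exact = np.exp(-1.0)
err_euler = abs(euler(f, 1.0, dt, steps) - exact)
err_midpoint = abs(midpoint(f, 1.0, dt, steps) - exact)
print(err_euler, err_midpoint)
```

With this step size the midpoint error is more than an order of magnitude smaller than Euler's; inside a continuous-time recurrent unit, that extra accuracy per step is what the talk leverages.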
Bio: Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He is also an Amazon Scholar as well as head of the Machine Learning and Analytics Group at the Lawrence Berkeley National Laboratory. He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, scalable stochastic optimization, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, computational methods for neural network analysis, physics informed machine learning, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he was on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he co-organized the Simons Institute's fall 2013 and 2018 programs on the foundations of data science, he ran the Park City Mathematics Institute's 2016 PCMI Summer Session on The Mathematics of Data, he ran the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he was the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. More information is available at https://www.stat.berkeley.edu/~mmahoney/.
Title: Deciphering Neural Networks through the Lenses of Feature Interactions
Abstract: Interpreting how neural networks work is a crucial and challenging task in machine learning. In this talk, I will discuss a novel framework, the neural interaction detector (NID), for interpreting complex neural networks by detecting statistical interactions captured by the networks. Furthermore, we can construct a more interpretable generalized additive model that achieves prediction performance similar to that of the original neural networks. Experimental results on several applications, such as recommender systems, image recognition, and sentiment prediction, demonstrate the effectiveness of NID.
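A loose sketch of the interaction-detection idea (the real NID analyzes trained networks; the hand-set weights and min-based aggregation below are a simplified stand-in): a pair of features can interact at a hidden unit only if both enter it with large weights, scaled by that unit's outgoing influence.

```python
import numpy as np

def interaction_strengths(W1, z):
    """Pairwise interaction scores from a network's first layer.
    W1: (hidden, features) weight matrix; z: per-hidden-unit influence
    (e.g., aggregated magnitude of its outgoing weight paths). A pair
    (i, j) scores highly when some influential hidden unit receives
    both features with large weight."""
    n_feat = W1.shape[1]
    scores = {}
    for i in range(n_feat):
        for j in range(i + 1, n_feat):
            s = float(np.sum(z * np.minimum(np.abs(W1[:, i]),
                                            np.abs(W1[:, j]))))
            scores[(i, j)] = s
    return scores

# Toy first layer: unit 0 mixes features 0 and 1; unit 1 mixes 2 and 3.
W1 = np.array([[2.0, 1.5, 0.0, 0.1],
               [0.1, 0.0, 1.8, 2.2]])
z = np.array([1.0, 1.0])
scores = interaction_strengths(W1, z)
top = max(scores, key=scores.get)
print(top, scores[top])  # the (2, 3) pair dominates
```

Interactions ranked this way can then be fed into a generalized additive model with explicit pairwise terms, recovering interpretability without giving up much accuracy.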
Bio: Yan Liu is a Professor in the Computer Science Department and the Director of the Machine Learning Center at the University of Southern California. She received her Ph.D. degree from Carnegie Mellon University. Her research interests are machine learning and its applications to climate science, health care, and sustainability. She has received several awards, including the NSF CAREER Award, the Okawa Foundation Research Award, New Voices of the National Academies of Sciences, Engineering, and Medicine, the Biocom Catalyst Award, an ACM Dissertation Award Honorable Mention, and the Best Paper Award at the SIAM Data Mining Conference.
Title: Sequential Decision Making Via Sequence Modeling
Abstract: The ability to make sequential decisions under uncertainty is a key component of intelligence. Despite impressive breakthroughs in deep learning in the last decade, we find that scalable and generalizable decision making has so far been elusive for current AI systems. In this talk, I will propose a new framework for sequential decision making that is derived from modern sequence models for language and perception (e.g., transformers). We will instantiate our framework in 3 different contexts for sequential decision making: offline reinforcement learning (RL), online RL, and black-box optimization, and highlight the simplicity and effectiveness of this unifying framework on a range of challenging high-dimensional benchmarks for sequential decision making.
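One way such a sequence-modeling view organizes decision-making data is as return-conditioned trajectories; the token layout sketched below is an illustrative assumption, not a specification of the talk's framework. Each timestep becomes a (return-to-go, state, action) triple, so a sequence model can learn to predict actions given a desired future return.

```python
# Format offline trajectories for return-conditioned sequence modeling.
def returns_to_go(rewards):
    """Suffix sums of rewards: rtg[t] = total reward from step t onward."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return rtg[::-1]

def to_tokens(trajectory):
    """trajectory: list of (state, action, reward) tuples.
    Returns a list of (return_to_go, state, action) triples, the kind of
    sequence a transformer could be trained on autoregressively."""
    rewards = [r for _, _, r in trajectory]
    rtg = returns_to_go(rewards)
    return [(g, s, a) for g, (s, a, _) in zip(rtg, trajectory)]

traj = [("s0", "a0", 1.0), ("s1", "a1", 0.0), ("s2", "a2", 2.0)]
print(to_tokens(traj))
# [(3.0, 's0', 'a0'), (2.0, 's1', 'a1'), (2.0, 's2', 'a2')]
```

At inference time, one conditions on a high target return-to-go and lets the sequence model generate the actions, turning decision making into next-token prediction.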
Bio: Aditya Grover is an assistant professor of computer science at UCLA. His goal is to develop efficient machine learning approaches for probabilistic reasoning under limited supervision, with a focus on deep generative modeling and sequential decision-making under uncertainty. He is also an affiliate faculty at the UCLA Institute of the Environment and Sustainability, where he grounds his research in real-world applications in climate science and sustainable energy. Aditya's 35+ research works have been published at top venues including Nature, deployed into production at major technology companies, and covered in popular press venues. His research has been recognized with two best paper awards, five research fellowships, and the ACM SIGKDD doctoral dissertation award. Aditya received his postdoctoral training at UC Berkeley, PhD from Stanford, and bachelors from IIT Delhi, all in computer science.