Machine Learning Seminar Series

Aug. 12th

Speaker: Yu Xiang

  • Electrical and Computer Engineering, University of Utah

  • Bio : Yu Xiang has been an Assistant Professor in Electrical and Computer Engineering at the University of Utah since July 2018. Prior to this, he was a postdoctoral fellow at the Harvard John A. Paulson School of Engineering and Applied Sciences at Harvard University. He obtained his Ph.D. in Electrical and Computer Engineering from the University of California, San Diego in 2015, and received his B.E. with the highest distinction from the School of Telecommunications Engineering at Xidian University, Xi'an, China, in 2008. His current research interests include statistical signal processing, information theory, machine learning, and their applications to neuroscience and computational biology.

Talk information

  • Title: Causal Inference from Slowly Varying Nonstationary Processes

  • Time: Thursday, Aug. 12th, 2021 12:30-1:30 pm

  • Location: Online via zoom (join)

Abstract


Aug. 5th

Speaker: Tianxi Li

  • Department of Statistics, University of Virginia

  • Bio : Tianxi Li is currently an assistant professor in the Department of Statistics at the University of Virginia. He obtained his Ph.D. from the University of Michigan in 2018. His research focuses mainly on statistical machine learning and statistical network analysis.

Talk information

  • Title: Diffusion Source Identification on Networks with Statistical Confidence

  • Time: Thursday, Aug. 5th, 2021 11:30 am-12:30 pm

  • Location: Online via zoom (join)

Abstract

Diffusion source identification on networks is a problem of fundamental importance in a broad class of applications, including rumor control and virus identification. Though this problem has received significant recent attention, most studies have focused only on very restrictive settings and lack theoretical guarantees for more realistic networks. We introduce a statistical framework for the study of diffusion source identification and develop a confidence set inference approach inspired by hypothesis testing. Our method efficiently produces a small subset of nodes that provably covers the source node with any pre-specified confidence level, without restrictive assumptions on network structures. Moreover, we propose multiple Monte Carlo strategies for the inference procedure, based on network topology and probabilistic properties, that significantly improve scalability. To our knowledge, this is the first diffusion source identification method with a practically useful theoretical guarantee on general networks. We demonstrate our approach via extensive synthetic experiments on well-known random network models and a mobility network between cities in the context of COVID-19 spread. This is joint work with Quinlan Dawkins and Haifeng Xu at UVA.
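
A minimal sketch of the confidence-set idea under a toy SI diffusion model: for each candidate source, Monte Carlo simulations give a null distribution of a discrepancy statistic, and candidates for which the observed snapshot is not extreme are retained. This is an illustrative reconstruction, not the authors' implementation; the SI model, the squared-frequency discrepancy, and names such as `si_simulate` and `confidence_set` are assumptions made for the sketch.

```python
import random

def si_simulate(adj, source, steps, beta=0.3, rng=random):
    """One run of a simple SI (susceptible-infected) diffusion; returns the infected set."""
    infected = {source}
    for _ in range(steps):
        new = {v for u in infected for v in adj[u]
               if v not in infected and rng.random() < beta}
        infected |= new
    return infected

def discrepancy(snapshot, freq):
    """Squared distance between an infected set and per-node infection frequencies."""
    return sum(((v in snapshot) - f) ** 2 for v, f in freq.items())

def confidence_set(adj, observed, steps, alpha=0.1, n_sim=200, rng=random):
    """Monte Carlo confidence set for the diffusion source: keep candidate s whenever
    the observed snapshot is not extreme among snapshots simulated from s."""
    kept = []
    for s in observed:                       # the true source is necessarily infected
        sims = [si_simulate(adj, s, steps, rng=rng) for _ in range(n_sim)]
        freq = {v: sum(v in sim for sim in sims) / n_sim for v in adj}
        null = [discrepancy(sim, freq) for sim in sims]
        t_obs = discrepancy(observed, freq)
        p_value = sum(t >= t_obs for t in null) / n_sim
        if p_value > alpha:
            kept.append(s)
    return kept

# Toy example: a path graph 0-1-2-3-4 with an observed snapshot {1, 2, 3}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(confidence_set(adj, observed={1, 2, 3}, steps=2))
```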

July 29th

Speaker: Chiyuan Zhang

  • Google Brain

  • Bio : Chiyuan Zhang is a research scientist at Google Research, Brain Team. He is interested in analyzing and understanding the foundations behind the effectiveness of deep learning, as well as its connection to the cognition and learning mechanisms of the human brain. Chiyuan Zhang holds a Ph.D. from MIT (2017, advised by Tomaso Poggio), and Bachelor's (2009) and Master's (2012) degrees in computer science from Zhejiang University, China. His work was recognized by the INTERSPEECH Best Student Paper Award in 2014 and the ICLR Best Paper Award in 2017.

Talk information

  • Title: Characterizing Structural Regularities of Labeled Data in Overparameterized Models

  • Time: Thursday, July 29th, 2021 12:00-1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

Humans are accustomed to environments that contain both regularities and exceptions. For example, at most gas stations, one pays prior to pumping, but the occasional rural station does not accept payment in advance. Likewise, deep neural networks can generalize across instances that share common patterns or structures, yet have the capacity to memorize rare or irregular forms. We analyze how individual instances are treated by a model via a consistency score. The score characterizes the expected accuracy for a held-out instance given training sets of varying size sampled from the data distribution. We obtain empirical estimates of this score for individual instances in multiple data sets, and we show that the score identifies out-of-distribution and mislabeled examples at one end of the continuum and strongly regular examples at the other end. We identify computationally inexpensive proxies to the consistency score using statistics collected during training. We show examples of potential applications to the analysis of deep-learning systems.
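
A small sketch of how such a consistency score could be estimated empirically: train many models on random subsets of varying size and record the held-out accuracy of each individual example. The logistic-regression learner, the subset fractions, and the function name `consistency_scores` are stand-ins chosen for this sketch, not the setup used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def consistency_scores(X, y, subset_fracs=(0.2, 0.4, 0.6, 0.8), n_draws=20, seed=0):
    """Empirical consistency score per example: expected held-out accuracy on that
    example, averaged over models trained on random subsets of varying size."""
    rng = np.random.default_rng(seed)
    n = len(y)
    hits = np.zeros(n)
    counts = np.zeros(n)
    for frac in subset_fracs:
        for _ in range(n_draws):
            train_idx = rng.choice(n, size=int(frac * n), replace=False)
            held_out = np.setdiff1d(np.arange(n), train_idx)
            # assumes each random subset contains every class at least once
            model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
            hits[held_out] += (model.predict(X[held_out]) == y[held_out])
            counts[held_out] += 1
    return hits / np.maximum(counts, 1)
```

In this toy reading of the score, low values flag atypical, out-of-distribution, or mislabeled examples, while high values flag strongly regular ones.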

July 22nd

Speaker: Xiwei Tang

  • Department of Statistics, University of Virginia

  • Bio :

Talk information

  • Title: Multivariate Temporal Point Process Regression with Applications in Calcium Imaging Analysis

  • Time: Thursday, July 22nd, 2021 12:00-1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

Point process modeling is gaining increasing attention, as point process type data are emerging in a large variety of scientific applications. In this article, motivated by a neuronal spike trains study, we propose a novel point process regression model, where both the response and the predictor can be a high-dimensional point process. We model the predictor effects through the conditional intensities using a set of basis transferring functions in a convolutional fashion. We organize the corresponding transferring coefficients in the form of a three-way tensor, then impose the low-rank, sparsity, and subgroup structures on this coefficient tensor. These structures help reduce the dimensionality, integrate information across different individual processes, and facilitate the interpretation. We develop a highly scalable optimization algorithm for parameter estimation. We derive the large sample error bound for the recovered coefficient tensor, and establish the subgroup identification consistency, while allowing the dimension of the multivariate point process to diverge. We demonstrate the efficacy of our method through both simulations and a cross-area neuronal spike trains analysis in a sensory cortex study.
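
A numpy sketch of the modeling idea: the conditional intensity of each output process is a baseline plus the predictor spike trains convolved with a set of basis transfer functions, weighted by a three-way coefficient tensor built here in low-rank (CP) form. The discretized-time setup, the rank construction, and the function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def cp_tensor(U, V, W):
    """Rank-R CP construction of the transfer-coefficient tensor B[j, i, k]."""
    return np.einsum('jr,ir,kr->jik', U, V, W)

def conditional_intensity(spikes_in, basis, B, baseline):
    """lambda_j(t) = baseline_j + sum_{i,k} B[j,i,k] * (basis_k convolved with spikes_i)(t).
    spikes_in: (n_in, T) binned counts of the predictor processes
    basis:     (n_basis, L) transfer (lag) functions
    B:         (n_out, n_in, n_basis) coefficient tensor
    """
    n_in, T = spikes_in.shape
    n_basis = basis.shape[0]
    conv = np.array([[np.convolve(spikes_in[i], basis[k])[:T]
                      for k in range(n_basis)] for i in range(n_in)])   # (n_in, n_basis, T)
    lam = baseline[:, None] + np.einsum('jik,ikt->jt', B, conv)
    return np.maximum(lam, 1e-8)        # keep intensities positive for a Poisson likelihood

# Toy usage: 3 predictor processes, 2 output processes, rank-2 coefficients.
rng = np.random.default_rng(0)
spikes_in = rng.poisson(0.1, size=(3, 200))
basis = np.stack([np.exp(-np.arange(20) / tau) for tau in (2.0, 5.0)])
B = cp_tensor(rng.normal(size=(2, 2)), rng.normal(size=(3, 2)), rng.normal(size=(2, 2)))
lam = conditional_intensity(spikes_in, basis, B, baseline=np.full(2, 0.05))
```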

July 15th

Speaker: Zhaoran Wang

  • Departments of Industrial Engineering & Management Sciences and Computer Science at Northwestern University

  • Bio : Zhaoran Wang is an assistant professor at Northwestern University, working at the interface of machine learning, statistics, and optimization. He is the recipient of the AISTATS (Artificial Intelligence and Statistics Conference) notable paper award, Microsoft Ph.D. Fellowship, Simons-Berkeley/J.P. Morgan AI Research Fellowship, Amazon Machine Learning Research Award, and NSF CAREER Award.

Talk information

  • Title: Demystifying (Deep) Reinforcement Learning with Optimism and Pessimism

  • Time: Thursday, July 15th, 2021 12:00-1:00 pm

  • Location: Online via zoom (join)

Abstract

Coupled with powerful function approximators such as deep neural networks, reinforcement learning (RL) has achieved tremendous empirical successes. However, its theoretical understanding lags behind. In particular, it remains unclear how to provably attain the optimal policy with a finite regret or sample complexity. In this talk, we will present two sides of the same coin, which demonstrate an intriguing duality between optimism and pessimism.

– In the online setting, we aim to learn the optimal policy by actively interacting with the environment. To strike a balance between exploration and exploitation, we propose an optimistic least-squares value iteration algorithm, which achieves a \sqrt{T} regret in the presence of linear, kernel, and neural function approximators.

– In the offline setting, we aim to learn the optimal policy based on a dataset collected a priori. Due to a lack of active interaction with the environment, we suffer from insufficient coverage of the dataset. To maximally exploit the dataset, we propose a pessimistic least-squares value iteration algorithm, which achieves a minimax-optimal sample complexity.
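
A compact sketch of the shared mechanism behind the two settings: least-squares value iteration with linear features, where an uncertainty bonus of the form beta * sqrt(phi^T Lambda^{-1} phi) is added to the Q-estimate online (optimism) and subtracted offline (pessimism). The linear-feature setup and names below are illustrative assumptions; this is not the speaker's code, and the kernel and neural variants are omitted.

```python
import numpy as np

def lsvi_q_values(phi, rewards, next_v, beta, lam=1.0, pessimistic=False):
    """One step of least-squares value iteration with an uncertainty bonus.
    phi:     (n, d) features of the (state, action) pairs in the batch
    rewards: (n,) observed rewards
    next_v:  (n,) estimated value of the next states
    The bonus beta * sqrt(phi^T Lambda^{-1} phi) is added in the online (optimistic)
    setting and subtracted in the offline (pessimistic) setting.
    """
    n, d = phi.shape
    Lambda = phi.T @ phi + lam * np.eye(d)
    w = np.linalg.solve(Lambda, phi.T @ (rewards + next_v))      # ridge regression
    bonus = beta * np.sqrt(np.einsum('nd,dk,nk->n', phi, np.linalg.inv(Lambda), phi))
    q = phi @ w
    return q - bonus if pessimistic else q + bonus

# Toy call: 100 transitions with 5-dimensional features.
rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 5))
q_offline = lsvi_q_values(phi, rng.random(100), rng.random(100), beta=0.5, pessimistic=True)
```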

July 8th

Speaker: Rohan Anil

  • Google Brain

  • Bio : Rohan Anil is a Senior Staff Software Engineer, Google Research, Brain Team. Lately, he has been working on scalable and practical optimization techniques for efficient training of neural networks in various regimes.

Talk information

  • Title: Scalable Second-Order Optimization for Deep Learning

  • Time: Thursday, July 8th, 2021 12:00-1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second-order statistics of the data, are far less prevalent despite strong theoretical properties, owing to their prohibitive computation, memory, and communication costs. In an attempt to bridge this gap between theoretical and practical optimization, we present a scalable implementation of a second-order preconditioned method (concretely, a variant of full-matrix Adagrad) that, along with several critical algorithmic and numerical improvements, provides significant convergence and wall-clock time improvements compared to conventional first-order methods on state-of-the-art deep models. Our novel design effectively utilizes the prevalent heterogeneous hardware architecture for training deep models, consisting of a multicore CPU coupled with multiple accelerator units. We demonstrate superior performance compared to state-of-the-art on very large learning tasks such as machine translation with Transformers, language modeling with BERT, click-through rate prediction on Criteo, and image classification on ImageNet with ResNet-50.


References:

https://arxiv.org/abs/2002.09018

https://arxiv.org/abs/1901.11150

https://arxiv.org/abs/2106.06199
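
As a rough illustration of the preconditioning idea, below is a simplified numpy sketch of Kronecker-factored second-order preconditioning for a single weight matrix, in the spirit of the full-matrix-Adagrad variant described above (Shampoo). The class name, fixed learning rate, and quarter-power choice are assumptions of this sketch, not the distributed implementation from the talk.

```python
import numpy as np

def matrix_power(sym, p):
    """Power of a symmetric PSD matrix via eigendecomposition."""
    w, Q = np.linalg.eigh(sym)
    return (Q * np.clip(w, 1e-12, None) ** p) @ Q.T

class ShampooLikeLayer:
    """Kronecker-factored second-order preconditioning for one weight matrix (simplified)."""
    def __init__(self, shape, eps=1e-4, lr=0.1):
        m, n = shape
        self.L = eps * np.eye(m)     # left gradient statistics
        self.R = eps * np.eye(n)     # right gradient statistics
        self.lr = lr

    def step(self, W, grad):
        self.L += grad @ grad.T
        self.R += grad.T @ grad
        precond = matrix_power(self.L, -0.25) @ grad @ matrix_power(self.R, -0.25)
        return W - self.lr * precond

# Toy usage on a 4x3 weight matrix with random stand-in gradients.
rng = np.random.default_rng(0)
W, layer = rng.normal(size=(4, 3)), ShampooLikeLayer((4, 3))
for _ in range(10):
    W = layer.step(W, rng.normal(size=(4, 3)))
```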

July 1st

Speaker: Brian Kulis

  • Department of Electrical and Computer Engineering, Boston University

  • Bio : Brian Kulis is an associate professor at Boston University, with appointments in the Department of Electrical and Computer Engineering, the Department of Computer Science, the Faculty of Computing and Data Sciences, and the Division of Systems Engineering. He is also an Amazon Scholar, working with the Alexa team. Previously he was the Peter J. Levine Career Development Assistant Professor at Boston University. Before joining Boston University, he was an assistant professor in Computer Science and in Statistics at Ohio State University, and prior to that was a postdoctoral fellow at UC Berkeley EECS. His research focuses on machine learning, statistics, computer vision, and large-scale optimization. He obtained his PhD in computer science from the University of Texas in 2008, and his BA in computer science and mathematics from Cornell University in 2003. For his research, he has won three best paper awards at top-tier conferences: two at the International Conference on Machine Learning (2005 and 2007) and one at the IEEE Conference on Computer Vision and Pattern Recognition (2008). He is also the recipient of an NSF CAREER Award in 2015, an MCD graduate fellowship from the University of Texas (2003-2007), and an Award of Excellence from the College of Natural Sciences at the University of Texas.

Talk information

  • Title: New Directions in Metric Learning

  • Time: Thursday, July 1st, 2021 12:00-1:00 pm

  • Location: Online via zoom (join) (video)

Abstract

Metric learning is a supervised machine learning problem concerned with learning a task-specific distance function from supervised data. It has found numerous applications in problems such as similarity search, clustering, and ranking. Much of the foundational work in this area focused on the class of so-called Mahalanobis metrics, which may be viewed as Euclidean distances after linear transformations of the data. This talk will describe two recent directions in metric learning: deep metric learning and divergence learning. The first replaces the linear transformations with the output of a neural network, while the second considers a broader class than Mahalanobis metrics. I will discuss some of my recent work along both of these fronts, as well as ongoing attempts to combine these approaches together using a novel framework called deep divergences.
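
For context on the Mahalanobis class mentioned above, here is a tiny sketch showing that a Mahalanobis metric is simply a Euclidean distance after a linear map, the map that deep metric learning replaces with a neural network. The random matrix `A` stands in for a learned transform and is purely illustrative.

```python
import numpy as np

def mahalanobis(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

# A Mahalanobis metric is a Euclidean distance after a linear map: M = A^T A.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))            # stand-in for a learned linear transform
M = A.T @ A
x, y = rng.normal(size=3), rng.normal(size=3)
assert np.isclose(mahalanobis(x, y, M), np.linalg.norm(A @ x - A @ y))
# Deep metric learning replaces the map x -> A x with the output of a neural network f(x).
```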

Jun. 24th

Speaker: Michael Overton

  • Courant Institute of Mathematical Sciences, NYU

  • Bio : Michael L. Overton is Silver Professor of Computer Science and Mathematics at the Courant Institute of Mathematical Sciences, New York University. He received his B.Sc. in Computer Science from the University of British Columbia in 1974 and his Ph.D. in Computer Science from Stanford University in 1979. He is a Fellow of SIAM (Society for Industrial and Applied Mathematics) and of the IMA (Institute of Mathematics and its Applications, UK). He served on the Council and Board of Trustees of SIAM from 1991 to 2005, including a term as Chair of the Board from 2004 to 2005. He served as Editor-in-Chief of SIAM Journal on Optimization from 1995 to 1999 and of the IMA Journal of Numerical Analysis from 2007 to 2008, and was the Editor-in-Chief of the MPS (Mathematical Programming Society)-SIAM joint book series from 2003 to 2007. He is currently an editor of SIAM Journal on Matrix Analysis and Applications, IMA Journal of Numerical Analysis, Foundations of Computational Mathematics, and Numerische Mathematik. His research interests are at the interface of optimization and linear algebra, especially nonsmooth optimization problems involving eigenvalues, pseudospectra, stability and robust control. He is the author of Numerical Computing with IEEE Floating Point Arithmetic (SIAM, 2001).

Talk information

  • Title: Nonsmooth, Nonconvex Optimization: Algorithms and Examples

  • Time: Thursday, Jun. 24th, 2021 12:00-1:00 pm

  • Location: Online via zoom (join) (video) (slides)

Abstract

In many applications one wishes to minimize an objective function that is not convex and is not differentiable at its minimizers. We discuss two algorithms for the minimization of general nonsmooth, nonconvex functions. Gradient Sampling is a simple method that, although computationally intensive, has a nice convergence theory. The method is robust, and the convergence theory has been extended to constrained problems. BFGS is a well-known method, developed for smooth problems, but which is remarkably effective for nonsmooth problems too. Although our theoretical results in the nonsmooth case are quite limited, we have made extensive empirical observations and have had broad success with BFGS in nonsmooth applications. Limited-memory BFGS is a popular extension for large-scale problems, but we show that, in contrast to BFGS, it sometimes converges to non-optimal nonsmooth points. Throughout the talk we illustrate the ideas through examples, some very easy and some very challenging.
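
A rough sketch of one Gradient Sampling iteration on a toy nonsmooth function: sample gradients at randomly perturbed points near the current iterate, take the minimum-norm element of their convex hull as a stabilized descent direction, and step along its negative. The small QP is solved here with scipy's generic SLSQP solver, and the fixed step size (the actual method uses a line search) and function names are simplifications assumed for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_in_hull(grads):
    """Minimum-norm vector in the convex hull of the given gradients (a small QP)."""
    G = np.asarray(grads)
    m = len(G)
    res = minimize(lambda w: (w @ G) @ (w @ G), np.full(m, 1.0 / m),
                   bounds=[(0.0, 1.0)] * m,
                   constraints=({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},))
    return res.x @ G

def gradient_sampling_step(grad_f, x, eps=0.1, n_samples=10, step=0.05, rng=None):
    """One Gradient Sampling step with a fixed step size (no line search)."""
    rng = rng or np.random.default_rng()
    points = [x] + [x + eps * rng.normal(size=x.shape) for _ in range(n_samples)]
    direction = min_norm_in_hull([grad_f(p) for p in points])
    return x - step * direction

# Toy nonsmooth objective f(x) = |x1| + 2|x2|, minimized at the nondifferentiable origin.
grad_f = lambda x: np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])
x = np.array([1.0, -1.0])
for _ in range(30):
    x = gradient_sampling_step(grad_f, x)
print(x)
```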

Jun. 17th

Speaker: Mahdi Soltanolkotabi

  • Departments of Electrical and Computer Engineering and Computer Science, USC

  • Bio : Mahdi Soltanolkotabi is an associate professor in the Ming Hsieh Department of Electrical and Computer Engineering and the Department of Computer Science at the University of Southern California, where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. Mahdi is the recipient of the Information Theory Society Best Paper Award, a Packard Fellowship in Science and Engineering, a Sloan Research Fellowship in Mathematics, an NSF CAREER Award, an Air Force Office of Scientific Research Young Investigator Award (AFOSR YIP), the Viterbi School of Engineering Junior Faculty Research Award, and a Google Faculty Research Award.

Talk information

  • Title: Overparameterized learning beyond the lazy regime

  • Time: Thursday, Jun. 17th, 2021 12:00-1:00 pm

  • Location: Online via zoom (join) (video)

Abstract

Modern learning models are typically trained in an over-parameterized regime where the parameters of the model far exceed the size of the training data. Due to over-parameterization, these models in principle have the capacity to (over)fit any set of labels, including pure noise. Despite this high fitting capacity, somewhat paradoxically, these models trained via first-order methods continue to predict well on yet-unseen test data. In this talk I aim to demystify this phenomenon in two different problems: (1) The first problem focuses on overparameterization in Generative Adversarial Networks (GANs). A large body of work in supervised learning has shown the importance of model overparameterization in the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting and GANs in particular involve nonconvex-concave mini-max optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in nonconvex-concave problems are far less understood. In this part of the talk I will present a comprehensive analysis of the importance of model overparameterization in GANs, both theoretically and empirically. (2) The second problem focuses on overparameterized learning in the context of low-rank reconstruction from a few measurements. For this problem I will show that, despite the presence of many global optima, gradient descent from small random initialization converges to a generalizable solution and finds the underlying low-rank matrix. Notably, this analysis is not in the "lazy" training regime and is based on an intriguing phenomenon uncovering the critical role of small random initialization: a few iterations of gradient descent behave akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well.
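
A toy numpy sketch of the second phenomenon, simplified from measurement-based reconstruction to a fully observed matrix factorization: an overparameterized factor is trained by gradient descent from a small random initialization and still recovers the underlying low-rank matrix. The problem sizes, step size, and the fully observed simplification are assumptions made for illustration, not the talk's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r_true, r_fit = 20, 2, 10                   # overparameterized: r_fit > r_true
U_star = rng.normal(size=(n, r_true))
X_star = U_star @ U_star.T                     # ground-truth low-rank matrix

U = 1e-3 * rng.normal(size=(n, r_fit))         # small random initialization (the key ingredient)
lr = 0.01
for _ in range(2000):
    grad = (U @ U.T - X_star) @ U              # gradient of 0.25 * ||U U^T - X_star||_F^2
    U -= lr * grad

# Despite the many global minima of the overparameterized factorization, gradient descent
# from small initialization recovers the underlying low-rank matrix.
print(np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```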

Jun. 10th

Speaker: Weijie Su

  • Wharton Statistics Department, University of Pennsylvania

  • Bio : Weijie Su is an Assistant Professor in the Wharton Statistics Department and in the Department of Computer and Information Science at the University of Pennsylvania. He is a co-director of Penn Research in Machine Learning. Prior to joining Penn, he received his Ph.D. from Stanford University in 2016 and his bachelor's degree from Peking University in 2011. His research interests span machine learning, optimization, privacy-preserving data analysis, and high-dimensional statistics. He is a recipient of the Stanford Theodore Anderson Dissertation Award in 2016, an NSF CAREER Award in 2019, and a Sloan Research Fellowship in 2020.

Talk information

  • Title: Local Elasticity: A Phenomenological Approach Toward Understanding Deep Learning

  • Time: Thursday, Jun. 10th, 2021 12:00-1:00 pm

  • Location: Online via zoom (join) (slides) (video)

Abstract

Motivated by the iterative nature of training neural networks, we ask: If the weights of a neural network are updated using the induced gradient on an image of a tiger, how does this update impact the prediction of the neural network at another image (say, an image of another tiger, a cat, or a plane)? To address this question, I will introduce a phenomenon termed local elasticity. Roughly speaking, our experiments show that modern deep neural networks are locally elastic in the sense that the change in prediction is likely to be most significant at another tiger and least significant at a plane, at late stages of the training process. I will illustrate some implications of local elasticity by relating it to the neural tangent kernel and improving on the generalization bound for uniform stability. Moreover, I will introduce a phenomenological model for simulating neural networks, which suggests that local elasticity may result from feature sharing between semantically related images and the hierarchical representations of high-level features. Finally, I will offer a local-elasticity-focused agenda for future research toward a theoretical foundation for deep learning.
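
A small PyTorch sketch of how local elasticity can be probed empirically: take one SGD step on a single example and measure how much the model's predictions move at other inputs, expecting larger changes at semantically similar ones. The toy MLP, loss, probes, and the function name `prediction_change` are illustrative assumptions, not the speaker's experimental setup.

```python
import torch

def prediction_change(model, loss_fn, x_update, y_update, probes, lr=0.1):
    """Apply one SGD step on (x_update, y_update) and report how much the model's
    predictions move at each probe input (the 'local elasticity' of the update)."""
    with torch.no_grad():
        before = torch.stack([model(p) for p in probes])
    loss = loss_fn(model(x_update), y_update)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
        after = torch.stack([model(p) for p in probes])
    return (after - before).norm(dim=-1)

# Toy usage with a small MLP; larger changes are expected at probes similar to x_update.
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
x = torch.randn(4)
probes = [x + 0.01 * torch.randn(4), torch.randn(4)]   # a near-duplicate vs an unrelated input
print(prediction_change(model, torch.nn.functional.mse_loss, x, torch.tensor([0.5]), probes))
```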

May. 27th

Speaker: Yuehaw Khoo

  • Department of Statistics, U of Chicago

  • Bio : Yuehaw Khoo is an assistant professor in the Department of Statistics at the University of Chicago. Prior to this, he was a postdoc at Stanford and a graduate student at Princeton. He is interested in scientific computing problems in protein structure determination and many-body physics.

Talk information

  • Title: Solving PDEs with Deep Learning

  • Time: Thursday, May. 27th, 2021 12:00-1:00 pm

  • Location: Online via zoom (join) (video)

Abstract

Deep neural networks provide an alternative way to compress high-dimensional functions arising from partial differential equations (PDEs). In this talk, we focus on using artificial neural networks to solve PDEs in two ways: (1) using neural networks to represent mappings between PDE coefficients and solutions, and (2) constructing a solution space with neural networks when solving a PDE, and obtaining the neural-network-parameterized solution via optimization. We apply these methods to scattering problems and to the study of transitions between states in stochastic systems.
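
A minimal PyTorch sketch of the second approach: parameterize the solution of a simple 1D Poisson problem with a small network and minimize the squared PDE residual plus a boundary penalty at random collocation points (physics-informed-style training). The specific equation, network size, and training hyperparameters are illustrative choices, not from the talk.

```python
import math
import torch

# Solve -u''(x) = pi^2 sin(pi x) on (0, 1) with u(0) = u(1) = 0; the exact solution is sin(pi x).
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def pde_residual(x):
    """Residual of -u'' - pi^2 sin(pi x) at the collocation points x."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    return -d2u - math.pi ** 2 * torch.sin(math.pi * x)

boundary = torch.tensor([[0.0], [1.0]])
for step in range(2000):
    x = torch.rand(64, 1)                               # interior collocation points
    loss = pde_residual(x).pow(2).mean() + net(boundary).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, net(x) approximates sin(pi x) on [0, 1].
```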