Talks of Spring 2023

2:00 - 3:00 pm,  Jan 18, 2023 (EST), Courtney Paquette, McGill University

Title:  Stochastic Algorithms in the Large: Batch Size Saturation, Stepsize Criticality, Generalization Performance, and Exact Dynamics

Slides

Abstract: In this talk, I will present a framework for analyzing the dynamics of stochastic optimization algorithms (e.g., stochastic gradient descent (SGD) and momentum (SGD+M)) when both the number of samples and the number of dimensions are large. For the analysis, I will introduce a stochastic differential equation, called homogenized SGD. We show that homogenized SGD is the high-dimensional equivalent of SGD -- for any quadratic statistic (e.g., population risk with quadratic loss), the statistic under the iterates of SGD converges to the statistic under homogenized SGD when the number of samples n and the number of features d are polynomially related. By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the training dynamics and generalization performance of SGD in terms of the solution of a Volterra integral equation. The analysis is formulated for data matrices and target vectors that satisfy a family of resolvent conditions, which can roughly be viewed as a weak form of delocalization of the sample-side singular vectors of the data. By analyzing these limiting dynamics, we can provide insights into learning rate, momentum parameter, and batch size selection. For instance, we identify a stability measurement, the implicit conditioning ratio (ICR), which regulates the ability of SGD+M to accelerate the algorithm. When the batch size exceeds this ICR, SGD+M converges linearly at a rate of $O(1/\kappa)$, matching optimal full-batch momentum (in particular, performing as well as full-batch momentum while using only a fraction of the batch size). For batch sizes smaller than the ICR, in contrast, SGD+M has rates that scale like a multiple of the single-batch SGD rate. We give explicit choices for the learning rate and momentum parameter in terms of the Hessian spectra that achieve this performance. Finally, we show that this model matches performance on real data sets.
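
The setting of the abstract can be sketched in a few lines of code (this is my own toy illustration, not the speaker's analysis; the parameter values are arbitrary and it does not reproduce the Volterra-equation or ICR results, only the experimental setup of mini-batch SGD with heavy-ball momentum on a high-dimensional least-squares problem):

```python
import numpy as np

# Toy least-squares simulation of mini-batch SGD with heavy-ball momentum.
rng = np.random.default_rng(0)
n, d = 500, 100                        # samples and features, comparably large
A = rng.standard_normal((n, d)) / np.sqrt(d)   # rows have roughly unit norm
x_star = rng.standard_normal(d)
b_vec = A @ x_star                     # noiseless targets (interpolation regime)

def sgd_momentum(batch_size, lr=0.5, beta=0.8, steps=4000):
    """Run SGD+M on the least-squares risk and return the final risk."""
    x = np.zeros(d)
    v = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = A[idx].T @ (A[idx] @ x - b_vec[idx]) / batch_size
        v = beta * v - lr * grad       # heavy-ball momentum update
        x = x + v
    return 0.5 * np.mean((A @ x - b_vec) ** 2)

small_batch_risk = sgd_momentum(batch_size=5)
large_batch_risk = sgd_momentum(batch_size=200)
```

Both batch sizes drive the risk down here; the talk's contribution is an exact characterization of how the convergence rate depends on the batch size relative to the ICR.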


Bio: Courtney Paquette is an assistant professor at McGill University and a CIFAR Canada AI chair, MILA. Paquette’s research broadly focuses on designing and analyzing algorithms for large-scale optimization problems, motivated by applications in data science. She received her PhD from the mathematics department at the University of Washington (2017), held postdoctoral positions at Lehigh University (2017-2018) and University of Waterloo (NSF postdoctoral fellowship, 2018-2019), and was a research scientist at Google Research, Brain Montreal (2019-2020).


2:00 - 3:00 pm,  Feb. 08, 2023 (EST), Qin Li, University of Wisconsin Madison

Title:  Multiscale inverse problem: Interaction between multiple scales in physical system reconstruction

Slides Video

Abstract: Inverse problems are ubiquitous. We probe a medium with sources and measure the outputs in order to infer the medium's properties. At the quantum, classical, statistical, and fluid scales, we face the inverse Schroedinger problem, the inverse Newton's second law problem, the inverse Boltzmann problem, and the inverse diffusion problem, respectively. In this talk, we discuss the connection between these problems. In particular, we will argue that they are the same problem, merely represented at different scales. However, as one passes from one regime to another, the stability of the inverse problem can improve or deteriorate. Tailored to the optimization community, I hope to engage the audience with the following question: is it possible to design optimization methods that leverage the well-conditioning of the problem in one regime and the cheaper PDE solvers of another?

Bio: Prof. Li is an associate professor in the mathematics department at UW-Madison. She holds an affiliation with the Wisconsin Institutes for Discovery and is a senior PI of the Institute for Foundations of Data Science. Her research focuses on kinetic theory, numerical analysis, and scientific computing. Prof. Li has received numerous recognitions, including an NSF CAREER Award and a Simons Fellowship.



2:00 - 3:00 pm,  Feb. 15, 2023 (EST), Jiajin Li, Stanford University


Title: Global Convergence Rate Analysis of Nonsmooth Composite Nonconvex-(Non)concave Minimax Optimization


Abstract: Nonconvex-(non)concave minimax optimization has gained significant attention in both the machine learning and operations research communities. However, most current approaches focus on gradient-descent-ascent (GDA) variants, which only work in smooth settings.

This paper takes a first step towards developing a new solution that tackles a wider range of structured nonsmooth problems. The proposed smoothed proximal linear descent ascent (smoothed PLDA) algorithm is capable of handling objective functions that have a nonsmooth composite structure in the minimization variable. We also present a novel analysis framework based on the Kurdyka-Lojasiewicz (KL) exponent $\theta \in [0,1)$ of the dual function, which is a departure from the conventional utilization of the KL exponent in pure primal nonconvex optimization. This framework offers insights into the relationship between the decrease in the primal and the increase in the dual, with the trade-off controlled explicitly through the KL exponent $\theta$. As a result, the iteration complexity is established as $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$. When $\theta \in [0,1/2]$, the smoothed PLDA algorithm achieves the optimal iteration complexity of $\mathcal{O}(\epsilon^{-2})$. Furthermore, we discuss different stationarity concepts and clarify their relationships quantitatively, which could be of independent significance. Finally, our empirical results demonstrate the effectiveness of the proposed smoothed PLDA in solving variation-regularized Wasserstein distributionally robust optimization problems.



Bio: Jiajin Li is currently a postdoctoral researcher in the Department of Management Science and Engineering at Stanford University, hosted by Prof. Jose Blanchet. Before that, she obtained her Ph.D. degree from the Department of Systems Engineering and Engineering Management, the Chinese University of Hong Kong (CUHK) in 2021, supervised by Prof. Anthony Man-Cho So. Her research mainly lies at the interface of continuous optimization and machine learning, with a primary focus on the algorithmic and theoretical foundations for solving data-driven decision-making problems.


2:00 - 3:00 pm,  Feb. 22, 2023 (EST), Yuhua Zhu, UC San Diego

Title:  Continuous-in-time Limit for Bayesian Bandits

Slides Video

Abstract: This talk revisits the bandit problem in the Bayesian setting. The Bayesian approach formulates the bandit problem as an optimization problem, and the goal is to find the optimal policy which minimizes the Bayesian regret. One of the main challenges facing the Bayesian approach is that computation of the optimal policy is often intractable, especially when the length of the problem horizon or the number of arms is large. In this paper, we first show that under a suitable rescaling, the Bayesian bandit problem converges to a continuous Hamilton-Jacobi-Bellman (HJB) equation. The optimal policy for the limiting HJB equation can be explicitly obtained for several common bandit problems, and we give numerical methods to solve the HJB equation when an explicit solution is not available. Based on these results, we propose an approximate Bayes-optimal policy for solving Bayesian bandit problems with large horizons. Our method has the added benefit that its computational cost does not increase as the horizon increases.
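
The computational blow-up that motivates the continuous-time limit can be seen in a toy exact solver (my own illustration, not the speaker's code): even for a two-armed Bernoulli bandit, the Bayes-optimal policy requires backward induction over a posterior-count state space that grows like O(T^4) in the horizon T.

```python
from functools import lru_cache

def bayes_optimal_value(horizon):
    """Exact Bayes-optimal expected reward for a two-armed Bernoulli bandit
    with independent uniform Beta(1, 1) priors, via backward induction.
    The memoized state (successes/failures per arm) grows like O(horizon^4),
    which is why exact dynamic programming is intractable for large horizons."""
    @lru_cache(maxsize=None)
    def V(s1, f1, s2, f2, t):
        if t == 0:
            return 0.0
        p1 = (s1 + 1) / (s1 + f1 + 2)          # posterior mean of arm 1
        p2 = (s2 + 1) / (s2 + f2 + 2)          # posterior mean of arm 2
        pull1 = p1 * (1 + V(s1 + 1, f1, s2, f2, t - 1)) + (1 - p1) * V(s1, f1 + 1, s2, f2, t - 1)
        pull2 = p2 * (1 + V(s1, f1, s2 + 1, f2, t - 1)) + (1 - p2) * V(s1, f1, s2, f2 + 1, t - 1)
        return max(pull1, pull2)
    return V(0, 0, 0, 0, horizon)
```

For instance, the optimal value is 1/2 at horizon 1 and 13/12 at horizon 2. The HJB limit in the talk sidesteps this recursion, giving policies whose cost does not grow with the horizon.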


Bio: Yuhua Zhu is an Assistant Professor at the University of California, San Diego, where she holds a joint appointment in the Department of Mathematics and Halicioğlu Data Science Institute (HDSI). Previously, she was a postdoc at Stanford University, and received her Ph.D. in Mathematics from the University of Wisconsin-Madison. Her work mainly focuses on the interface of PDE and machine learning, including using PDE to provide theoretical interpretation to existing stochastic optimization algorithms and reinforcement learning algorithms, and using numerical techniques to improve machine learning algorithms.


2:00 - 3:00 pm,  March 15, 2023 (EST), Chris De Sa, Cornell University

Title:  Better Data Ordering for Stochastic Gradient Descent

Slides Video

Abstract: Training example order in SGD has long been known to affect convergence rate. Recent results show that accelerated rates are possible in a variety of cases for permutation-based sample orders, in which each example from the training set is used once before any example is reused. This talk will cover a line of work in my lab on sample-ordering schemes. We will discuss the limits of the classic random-reshuffling scheme and explore how it can be improved to make SGD converge faster, both in theoretical rates and in empirical performance, with little overhead relative to random reshuffling. The talk will conclude with some ongoing work from my lab on applying this better-than-random approach to large-scale distributed learning problems.
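
The two baseline orderings the talk starts from can be stated in a few lines (a toy comparison of my own, not the speaker's code; the problem, learning rate, and epoch count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true                                  # noiseless, so SGD can interpolate

def run_sgd(order_fn, lr=0.02, epochs=30, seed=1):
    """One pass per epoch, visiting examples in the order given by order_fn."""
    order_rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for _ in range(epochs):
        for i in order_fn(order_rng, n):
            x -= lr * A[i] * (A[i] @ x - b[i])  # single-example gradient step
    return 0.5 * np.mean((A @ x - b) ** 2)

loss_wr = run_sgd(lambda r, m: r.integers(0, m, size=m))   # with replacement
loss_rr = run_sgd(lambda r, m: r.permutation(m))           # random reshuffling
```

Both orderings drive the loss down here; the results referenced in the abstract concern how fast, and how carefully chosen (better-than-random) permutations can beat reshuffling.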


Bio: Chris De Sa is an Assistant Professor in the Computer Science department at Cornell University. He is a member of the Cornell Machine Learning Group and leads the Relax ML Lab. His research interests include algorithmic, software, and hardware techniques for high-performance machine learning, with a focus on relaxed-consistency variants of stochastic algorithms such as asynchronous and low-precision stochastic gradient descent (SGD) and Markov chain Monte Carlo. The RelaxML lab builds towards using these techniques to construct data analytics and machine learning frameworks, including for deep learning, that are efficient, parallel, and distributed.



2:00 - 3:00 pm,  March 22, 2023 (EST), Melanie Weber, Harvard University

Title:  Exploiting geometric structure in matrix-valued optimization

Abstract: Many applications involve non-Euclidean data, such as graphs, strings, or matrices. Exploiting such geometric structure in optimization can deliver algorithms that are computationally superior to standard nonlinear programming approaches. This observation has resulted in an increasing interest in Riemannian methods in the optimization and machine learning community. In this talk, we consider the problem of optimizing a function on a (Riemannian) manifold subject to convex constraints. Several classical problems that arise in Machine Learning applications can be phrased as constrained optimization on manifolds. This includes matrix-valued tasks, such as barycenter problems, as well as the computation of Brascamp-Lieb constants. The latter is of central importance in many areas of mathematics and computer science through connections to maximum likelihood estimators in Gaussian models, Tyler’s M-estimator of scatter matrices and Operator Scaling. We introduce Riemannian Frank-Wolfe methods, a class of first-order methods for solving constrained optimization problems on manifolds and present a global, non-asymptotic convergence analysis. We further discuss a class of CCCP-style algorithms for Riemannian “difference of convex” functions and explore connections to constrained optimization. For both methods we discuss applications to the two problems described above. Based on joint work with Suvrit Sra.
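
For readers unfamiliar with Frank-Wolfe, here is the classical Euclidean special case on the probability simplex (my own sketch, not from the talk; the Riemannian version discussed above replaces the linear minimization oracle and the convex-combination update with their manifold analogues):

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, steps=2000):
    """Euclidean Frank-Wolfe over the probability simplex.
    The linear minimization oracle just picks the vertex e_i with the most
    negative gradient coordinate, so no projection is ever needed."""
    x = x0.copy()
    for k in range(steps):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0                  # LMO: best simplex vertex
        gamma = 2.0 / (k + 2)                  # standard step-size schedule
        x = (1 - gamma) * x + gamma * s        # stays feasible by convexity
    return x

# Minimize ||x - c||^2 over the simplex; c is interior, so the optimum is c.
c = np.array([0.2, 0.3, 0.5])
x = frank_wolfe_simplex(lambda x: 2 * (x - c), np.array([1.0, 0.0, 0.0]))
```

Every iterate is a convex combination of simplex vertices, so feasibility is maintained for free, which is the property the Riemannian methods in the talk generalize to constrained sets on manifolds.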

Bio: Melanie is an Assistant Professor of Applied Mathematics and of Computer Science at Harvard. Her research focuses on utilizing geometric structure in data for the design of efficient Machine Learning and Optimization methods. In 2021-2022, she was a Hooke Research Fellow at the Mathematical Institute in Oxford. Previously, she received her PhD from Princeton University (2021) under the supervision of Charles Fefferman, held visiting positions at MIT and the Simons Institute in Berkeley and interned in the research labs of Facebook, Google and Microsoft. In addition to her academic work, she is the Chief Scientist of the startup Claudius Legal Intelligence, where she leads a team of researchers in developing Trustworthy Machine Learning tools for legal analytics.


2:00 - 3:00 pm,  March 29, 2023 (EST), Ben Grimmer, Johns Hopkins

Title:  Scalable, Projection-Free Optimization Methods

Slides Video

Abstract: We will introduce an approach to constrained optimization that replaces orthogonal projections with much cheaper radial ones. This results in new first-order methods that (i) are scalable, sporting minimal per-iteration costs, (ii) always yield fully feasible solutions, and (iii) apply to a wide family of potentially nonconvex constraints. For example, when applied to linear programs, iteration costs amount to one matrix multiplication instead of expensive pivoting or matrix inverses. For semidefinite programming, iterations require computing one eigenvector, avoiding expensive full decompositions. Beyond these classic settings, modern applications with nonconvex objectives and constraints will be discussed. The key mathematical insight underlying these methods is a new duality relating constrained optimization problems to unconstrained "radially dual" problems. From this, we present accelerated projection-free first-order methods that exploit smoothness, strong convexity, or other common structures in the primal objective or constraints.


Bio: Benjamin Grimmer is an Assistant Professor of Applied Mathematics and Statistics at Johns Hopkins University. He completed his PhD in Operations Research at Cornell, advised by Jim Renegar and Damek Davis, and funded by an NSF fellowship. His work focuses on the design and analysis of new optimization methods beyond the scope of classic approaches, scaling past typical bottlenecks and avoiding unrealistic assumptions common in modern applications. Additionally, Ben is a 3D printing enthusiast, creating printable mathematical models for use in teaching and references for use in research talks (including this one).


2:00 - 3:00 pm,  April 5, 2023 (EST), Yongxin Chen,  Georgia Tech

Title:  A proximal algorithm for sampling 

Slides Video

Abstract: The problem of drawing random samples from an unnormalized distribution plays an important role in many areas such as Bayesian inference and computational biology. In this work, we present new results on the proximal sampler, a recent method introduced by Lee, Shen, and Tian in 2021. The proximal sampler can be viewed as a sampling analog of the proximal point method in optimization and was proved to converge when the target distribution is strongly log-concave, assuming access to the restricted Gaussian oracle (RGO). We briefly discuss an improved analysis of the proximal sampler that operates under weaker assumptions than previous works. We present an efficient implementation of the RGO when the target log-density is neither convex nor smooth, accomplished using (approximate) rejection sampling with a delicate proposal. Combining our RGO implementation with our improved convergence analysis, we obtain a sampling algorithm with state-of-the-art guarantees in all classical settings, including (strongly) log-concave targets and targets satisfying a logarithmic Sobolev or Poincaré inequality. This is joint work with Jiaojiao Fan, Bo Yuan, Jiaming Liang, Sinho Chewi, Adil Salim, and Andre Wibisono.
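
For a 1-D Gaussian target the RGO is available in closed form, so the two alternating steps of the proximal sampler can be written exactly (a minimal sketch of my own, assuming the standard two-step scheme; parameter names like `f_prec` and `eta` are my own):

```python
import numpy as np

def proximal_sampler(f_prec, eta, steps, rng):
    """Proximal sampler for the 1-D Gaussian target pi(x) ∝ exp(-f_prec * x^2 / 2).
    Step 1: y | x ~ N(x, eta).  Step 2 (the RGO): x | y ∝ pi(x) exp(-(x-y)^2 / (2 eta)),
    which for this Gaussian target is itself an exact Gaussian draw, so no
    rejection step is needed."""
    x = 0.0
    samples = []
    for _ in range(steps):
        y = x + np.sqrt(eta) * rng.standard_normal()
        prec = f_prec + 1.0 / eta              # posterior precision in the RGO
        mean = (y / eta) / prec
        x = mean + rng.standard_normal() / np.sqrt(prec)
        samples.append(x)
    return np.array(samples)

rng = np.random.default_rng(1)
xs = proximal_sampler(f_prec=1.0, eta=1.0, steps=50000, rng=rng)
```

The x-marginal of this chain is stationary at the target; for nonsmooth, nonconvex log-densities the talk's contribution is an efficient approximate-rejection implementation of the RGO step.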


Bio: Yongxin Chen is an Assistant Professor in the School of Aerospace Engineering at Georgia Institute of Technology. He has served on the faculty at Iowa State University (2017-2018). Prior to that, he spent one year (2016-2017) at the Memorial Sloan Kettering Cancer Center (MSKCC) as a postdoctoral fellow. He received his BSc from Shanghai Jiao Tong University in 2011, and Ph.D. from University of Minnesota in 2016, both in Mechanical Engineering. He is an awardee of the George S. Axelby Best Paper Award of IEEE Transactions on Automatic Control in 2017 and the best paper prize of SIAM Journal on Control and Optimization in 2023. He received the NSF Faculty Early Career Development Program (CAREER) Award in 2020, the Simons-Berkeley Research Fellowship in 2021, the A. V. 'Bal' Balakrishnan Award in 2021, and the Donald P. Eckman Award for outstanding young engineer in the field of automatic control in 2022. His current research interests are in the areas of control theory, optimization, machine learning, and robotics. He enjoys developing new algorithms and theoretical frameworks for real world applications.



2:00 - 3:00 pm,  April 19, 2023 (EST), Mingyi Hong, University of Minnesota

Title:  Bilevel Optimization: Stochastic Algorithms and Applications in Inverse Reinforcement Learning

Slides Video

Abstract: Bilevel optimization is a class of optimization problems with two levels of nested optimization subproblems. It can be used to model applications arising in areas such as signal processing, machine learning, and game theory. In this talk, we will first discuss several recent works, including a few of our own, that develop efficient stochastic algorithms for this class of problems. Together, these works provide a set of useful tools that practitioners can customize for different application domains. In the second part of the talk, we will dive deeper into a recent application of bilevel optimization: the inverse reinforcement learning (IRL) problem. IRL aims to recover the structure of an agent's reward (and sometimes the environment dynamics) underlying the observed actions in a fixed, finite set of demonstrations from an expert agent. Being able to learn accurate models of expertise (i.e., the reward model) from observation data has uses in safety-sensitive settings such as clinical decision making and autonomous driving. In this work, we propose a new formulation of the reward model estimation task, which takes the bilevel optimization form. We then propose a set of efficient algorithms to solve the IRL problem, and provide statistical and computational guarantees of performance for the associated reward estimator. Finally, we demonstrate that the proposed algorithm outperforms state-of-the-art IRL and imitation learning benchmarks by a large margin on the continuous control tasks in MuJoCo and different datasets in the D4RL benchmark.
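
The nested structure can be made concrete with a toy bilevel problem whose lower level has a closed-form solution (my own illustration of the implicit/hypergradient idea that the stochastic algorithms in the talk approximate; the quadratic choice of lower level is an assumption made for simplicity):

```python
import numpy as np

H = np.array([[2.0, 0.5], [0.5, 1.0]])         # lower-level Hessian (SPD)
y_target = np.array([1.0, -1.0])

def lower_solution(x):
    """y*(x) = argmin_y 0.5 y^T H y - x^T y, solved in closed form here."""
    return np.linalg.solve(H, x)

def hypergradient(x):
    """Gradient of the upper-level objective F(x) = 0.5 ||y*(x) - y_target||^2
    via the implicit function theorem: dy*/dx = H^{-1}, so
    dF/dx = H^{-1} (y*(x) - y_target)."""
    return np.linalg.solve(H, lower_solution(x) - y_target)

x = np.zeros(2)
for _ in range(2000):
    x -= 0.5 * hypergradient(x)                # gradient descent on the upper level
```

In realistic bilevel problems the lower-level solution and the implicit derivative are not available in closed form; the stochastic algorithms discussed in the talk replace both with inexact, sampled approximations.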


Bio: Mingyi Hong (mhong@umn.edu) received his Ph.D. degree from the University of Virginia, Charlottesville, in 2011. He is currently an Associate Professor in the Department of Electrical and Computer Engineering at the University of Minnesota, Minneapolis. His research focuses on developing optimization theory and algorithms for applications in signal processing and machine learning.


He is an Associate Editor for IEEE Transactions on Signal Processing. In the past, he has also served on the IEEE Machine Learning for Signal Processing Technical Committee, and IEEE Signal Processing for Communications and Networking Technical Committee.  His work has received two IEEE Signal Processing Society (SPS) Best Paper Awards (2021, 2022), an International Consortium of Chinese Mathematicians Best Paper Award (2020), and a few Best Student Paper Awards in signal processing and machine learning conferences. He is an Amazon Scholar, and he is the recipient of an IBM Faculty Award, a Meta Research Award, and the 2022 Pierre-Simon Laplace Early Career Technical Achievement Award from IEEE SPS.


2:00 - 3:00 pm,  April 26, 2023 (EST), Abiy Tasissa, Tufts University

Title:  Sparse coding via a Delaunay triangulation

Slides Video

Abstract: Sparse coding is a technique for representing data as a sparse linear combination of a set of vectors. This representation facilitates the computation and analysis of the high-dimensional data that is prevalent in many applications. We study sparse coding in the setting where the set of vectors defines a unique Delaunay triangulation. We propose a weighted l1 regularizer and show that it provably yields a sparse solution. Further, we show stability of the sparse codes using the Cayley-Menger determinant. We make connections to dictionary learning, manifold learning, and computational geometry. We discuss an optimization algorithm to learn the sparse codes and the optimal set of vectors given a set of data points. Finally, numerical experiments show that the resulting sparse representations give competitive performance on clustering problems.
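
The kind of sparsity at play can be seen in two dimensions (my own toy example, not from the talk): a point inside a Delaunay simplex is coded by its barycentric coordinates with respect to the simplex vertices, so the code has at most d+1 nonzeros.

```python
import numpy as np

def barycentric_code(vertices, p):
    """Represent the 2-D point p as a convex combination of 3 triangle vertices
    by solving the 3x3 affine system.  In the Delaunay sparse-coding setting,
    the code of any point has at most d+1 nonzeros: the weights on the
    vertices of the simplex containing it."""
    V = np.vstack([vertices.T, np.ones(3)])    # rows: x-coords, y-coords, sum-to-1
    rhs = np.append(p, 1.0)
    return np.linalg.solve(V, rhs)

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = barycentric_code(tri, np.array([0.25, 0.25]))
```

The weights are nonnegative and sum to one exactly when the point lies inside the triangle, which is the geometric picture behind the weighted l1 regularizer in the abstract.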


Bio: Abiy Tasissa received B.Sc. in Mathematics and M.Sc. in Aeronautics and Astronautics from the Massachusetts Institute of Technology and a PhD in Applied Mathematics from the Rensselaer Polytechnic Institute. He is currently an assistant professor in the Department of Mathematics at Tufts University. His research interests include distance geometry, compressive sensing, matrix completion and manifold learning.

2:00 - 3:00 pm,  May 3, 2023 (EST), Zhou-cheng Xiao, New York University

Title:  Efficient models of the cortex via coarse-grained interactions and local response functions

Slides Video

Abstract: Modeling the human cortex is challenging due to its structural and dynamic complexity. Spiking neuron models can incorporate many details of cortical circuits, but are computationally costly and difficult to scale up, limiting their scope to small patches of cortex and restricting the range of phenomena that can be studied. Alternatively, one can use simpler phenomenological models, which are easier to build and run but are more difficult to compare directly to experimental data. This talk presents an efficient modeling strategy that aims to strike a balance between biological realism and computational efficiency.

The proposed modeling strategy combines a coarse-grained representation with local circuit dynamics to compute the steady-state cortical response to external stimuli. A crucial observation is that as a consequence of anatomical structures and the nature of neuronal interactions, potential local responses can be computed independently of dynamics on the coarse-grained level. We first precompute a library of steady-state local responses driven by possible lateral and external input. Then, the fixed point of the coarse-grained model can be computed by an iterative scheme combined with fast library lookups.

Our approach is tested on a model of primate primary visual cortex (V1) and successfully captures essential V1 features such as orientation selectivity. Time permitting, I will also discuss a related project in which we devised an efficient way to explore the parameter space of a primate V1 model, identifying the set of viable parameters as a "thickened" codimension-1 submanifold of parameter space.
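
A minimal caricature of the precompute-then-iterate strategy (entirely my own sketch with hypothetical names and dynamics, far simpler than the V1 model): tabulate an expensive local response function once, then find the coarse-grained steady state by fixed-point iteration with fast table lookups.

```python
import numpy as np

# Stand-in for a costly local response model: tabulate it once on a grid.
phi = np.tanh
grid = np.linspace(-5.0, 5.0, 1001)
library = phi(grid)                             # precomputed response library

def lookup(u):
    """Fast library lookup by linear interpolation, replacing calls to phi."""
    return np.interp(u, grid, library)

def coarse_fixed_point(W, ext, iters=200):
    """Iterate u <- W @ lookup(u) + ext to a coarse-grained steady state.
    Convergence here relies on the coupling W being a contraction."""
    u = np.zeros(len(ext))
    for _ in range(iters):
        u = W @ lookup(u) + ext
    return u

W = np.array([[0.2, 0.1], [0.1, 0.2]])          # weak lateral coupling
ext = np.array([0.5, -0.3])                     # external drive
u = coarse_fixed_point(W, ext)
```

The real model replaces `phi` with precomputed local-circuit responses to lateral and external input, but the division of labor is the same: all expensive computation happens once, offline, and the steady-state solve reduces to cheap lookups.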

Organizers: Rongjie Lai, Rensselaer Polytechnic Institute

                      John Mitchell, Rensselaer Polytechnic Institute

                      Yangyang Xu, Rensselaer Polytechnic Institute