Talks of Spring 2021

4:00 - 5:00 pm, Jan 27, 2021 (EST), Necdet Serhat Aybat, Penn State University

Title: A Primal-Dual Algorithm with Line Search for Convex Concave Saddle Point Problems

Video Slides

Abstract: In this talk, I will consider a saddle point problem min_x max_y L(x,y), where L is a convex-concave function of the form L(x,y) = f(x) + Φ(x,y) − h(y) with a general coupling term Φ that is not assumed to be bilinear. I will present an accelerated primal-dual (APD) algorithm, which employs a novel momentum term involving the partial gradients of the coupling function. APD can be viewed as a generalization of the method proposed by Chambolle and Pock in 2016, which can only handle bilinear Φ. Under a certain smoothness assumption on Φ, we show that the iterate sequence converges to a saddle point; and for any (x,y), we derive error bounds in terms of L(x_k, y) − L(x, y_k) for the ergodic sequence {x_k, y_k}. In particular, we show an O(1/k) rate when L is merely convex in x. Furthermore, assuming Φ(x,·) is linear for each fixed x and f is strongly convex, we obtain an ergodic convergence rate of O(1/k²); to the best of our knowledge, APD is the first single-loop method in the related literature achieving this rate when Φ is not bilinear. Finally, I will discuss a backtracking technique which does not require knowledge of the Lipschitz constants while ensuring the same convergence results.

In the latter part of this talk, I will customize these results for convex optimization problems with nonlinear functional constraints and show that, using the backtracking scheme, the optimal convergence rate can be achieved even when the dual domain is unbounded. We tested our method against other state-of-the-art first-order algorithms for solving quadratically constrained quadratic programs (QCQPs): in the first set of experiments, we considered QCQPs with synthetic data, and in the second set, we focused on QCQPs with real data originating from a variant of the linear regression problem with fairness constraints arising in machine learning.

This is joint work with my former PhD student Erfan Yazdandoost Hamedani.
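
For readers who want a concrete picture of the update structure, below is a minimal Python sketch of a primal-dual iteration whose momentum (extrapolation) term is built from the partial gradient of the coupling function, in the spirit of the description above. The fixed step sizes tau and sigma, the momentum parameter theta, and the helper names are illustrative assumptions; the actual APD step-size rules, the strongly convex variant, and the backtracking line search are specified in the paper.

```python
import numpy as np

def primal_dual_sketch(prox_f, prox_h, grad_x_Phi, grad_y_Phi,
                       x0, y0, tau, sigma, theta, iters=1000):
    """Schematic iteration for min_x max_y f(x) + Phi(x, y) - h(y).

    prox_f(v, t) ~ argmin_x f(x) + ||x - v||^2 / (2 t)
    prox_h(v, s) ~ argmin_y h(y) + ||y - v||^2 / (2 s)

    The momentum extrapolates the partial gradient of Phi in y rather
    than the iterates themselves; step sizes are held fixed here.
    """
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    gy_prev = grad_y_Phi(x, y)
    xs, ys = [], []
    for _ in range(iters):
        gy = grad_y_Phi(x, y)
        s = gy + theta * (gy - gy_prev)              # gradient extrapolation
        y = prox_h(y + sigma * s, sigma)             # ascent step in y
        x = prox_f(x - tau * grad_x_Phi(x, y), tau)  # descent step in x
        gy_prev = gy
        xs.append(x.copy()); ys.append(y.copy())
    # the stated O(1/k) guarantees are for the ergodic (averaged) iterates
    return np.mean(xs, axis=0), np.mean(ys, axis=0)
```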


Bio: Necdet Serhat Aybat is an Associate Professor in the Department of Industrial and Manufacturing Engineering at Pennsylvania State University. He received his Ph.D. degree in operations research from Columbia University, New York, USA in 2011, and his M.S. and B.S. degrees in industrial engineering from Bogazici University, Istanbul, Turkey, in 2005 and 2003, respectively. Aybat’s research mainly focuses on first-order methods for large-scale constrained optimization problems from diverse application areas, such as machine learning, distributed optimization, robust matrix decomposition, convex regression, and compressed sensing. His current research involves design and analysis of randomized primal-dual algorithms for solving saddle point problems, and of decentralized methods for multi-agent optimization with complicated coupling constraints. His work has been supported by NSF, ARO and ONR research grants.

4:00 - 5:00 pm, Feb 03, 2021 (EST), Qihang Lin, University of Iowa

Title: First-Order Methods for Convex Constrained Optimization under Error Bound Conditions

Video Slides

Abstract: Large-scale constrained optimization problems arise in several application domains. First-order methods are good candidates to tackle such problems due to their low iteration complexity and memory requirement. In this work, a new class of first-order methods based on a level-set technique is developed for convex constrained optimization that satisfies an error bound condition with unknown growth parameters. The proposed approach solves the original problem by solving a sequence of unconstrained subproblems defined with different level parameters. Different from the existing methods where the subproblems are solved sequentially, our method applies a first-order method to solve each subproblem independently and simultaneously, which can be implemented with either a single or multiple processors. Once the objective value of one subproblem is reduced by a constant factor, a sequential restart is performed to update the level parameters and restart the first-order methods. Compared to existing approaches, the proposed method has lower theoretical complexity and does not require knowing any problem-dependent parameters such as the growth rates and the Lipschitz constants of the objective and constraint functions.
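
For orientation, one common way to formalize the level-set idea (illustrative; the exact parametrization used in the proposed method may differ) is as follows. For the problem $\min\{ f_0(x) : f_i(x) \le 0,\ i=1,\dots,m \}$, define the value function

$$ H(r) \;=\; \min_x \, \max\big\{ f_0(x) - r,\ f_1(x),\ \dots,\ f_m(x) \big\}. $$

$H$ is convex and nonincreasing in the level parameter $r$, and under standard conditions the optimal value is the smallest $r$ with $H(r) \le 0$; each fixed level $r$ gives an unconstrained min-max subproblem amenable to first-order methods, and the approach above runs such subproblems for several candidate levels simultaneously, restarting them whenever one subproblem's objective has been reduced by a constant factor.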


Bio: Qihang Lin is an associate professor in the Business Analytics Department at the University of Iowa. He studied at Tsinghua University in China (B.S. in Mathematics, 2008) and obtained his Ph.D. in ACO (Algorithms, Combinatorics, and Optimization) in 2013 from the Tepper School of Business at Carnegie Mellon University. Dr. Lin’s research fields involve large-scale optimization with applications in machine learning and Markov decision processes.

4:00 - 5:00 pm, Feb 10, 2021 (EST), Alex Cloninger, UC San Diego

Title: Incorporating Invariance to Reduce the Complexity of Parametric Models

Video Slides

Abstract: Many scientific problems involve invariant structures and learning functions that rely on a much lower dimensional set of features than the data itself. Incorporating these invariances into a parametric model can significantly reduce the model complexity and lead to a vast reduction in the number of labeled examples required to estimate the parameters. We display this benefit in two settings. The first setting concerns ReLU networks and the size of networks and number of points required to learn certain functions and classification regions. Here, we assume that the target function has built-in invariances, namely that it only depends on the projection onto a very low dimensional, function-defined manifold (with dimension possibly significantly smaller than even the intrinsic dimension of the data). We use this manifold variant of a single- or multi-index model to establish network complexity and ERM rates that beat even the intrinsic dimension of the data. We note that a corollary of this result is the development of intrinsic rates for a manifold-plus-noise data model without needing to assume that the noise distribution decays exponentially. The second setting for building invariances concerns linearized optimal transport (LOT) and using it to build supervised classifiers on distributions. Here, we construct invariances to families of group actions (e.g., shifts and scalings of a fixed distribution) and show that LOT can learn a classifier on group orbits using a simple linear separator. We demonstrate the benefit of this on MNIST by constructing robust classifiers with only a small number of labeled examples. This talk covers joint work with Timo Klock and Caroline Moosmueller.
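
As a toy illustration of the linearized optimal transport idea, restricted for simplicity to equal-size, uniformly weighted point clouds (an assumption made here, not in the talk), one can embed each cloud by optimally matching it to a fixed reference cloud and then fit an ordinary linear classifier on the matched coordinates:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.linear_model import LogisticRegression

def lot_embedding(points, reference):
    """Embed an (n, d) point cloud by optimal matching to a fixed (n, d)
    reference cloud under squared Euclidean cost; for uniform weights and
    equal sizes the optimal transport plan is an assignment."""
    cost = ((reference[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    _, col = linear_sum_assignment(cost)
    return points[col].ravel()  # point matched to each reference point

def lot_classifier(clouds, labels, reference):
    """Linear classifier acting on the LOT embeddings of the clouds."""
    X = np.stack([lot_embedding(c, reference) for c in clouds])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```

The abstract's point is that, for families of group actions such as shifts and scalings of a template, these transport-based embeddings make the group orbits (approximately) linearly separable, so a simple linear separator suffices.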


Bio: Alex Cloninger is an Assistant Professor in Mathematics and the Halıcıoğlu Data Science Institute at UC San Diego. Alex researches problems in the area of geometric data analysis and applied harmonic analysis. He focuses on approaches that model the data as being locally lower dimensional, including data concentrated near manifolds or subspaces. These types of problems arise in a number of scientific disciplines, including imaging, medicine, and artificial intelligence, and the techniques developed relate to a number of machine learning and statistical algorithms, including deep learning, network analysis, and measuring distances between probability distributions. He received his PhD in Applied Mathematics and Scientific Computation from the University of Maryland in 2014, and was then an NSF Postdoc and Gibbs Assistant Professor of Mathematics at Yale University until 2017, when he joined UCSD.

4:00 - 5:00 pm, Feb 17, 2021 (EST), Junyu Zhang, Princeton University

Title: On Lower Iteration Complexity Bounds for the Convex Concave Saddle Point Problems

Video Slides

Abstract: In this talk, we study the lower iteration complexity bounds for finding the saddle point of a strongly convex and strongly concave saddle point problem: $\min_x\max_yF(x,y)$. We restrict the classes of algorithms in our investigation to be either pure first-order methods or methods using proximal mappings. Under several special parameter regimes, our lower bound has been achieved by corresponding optimal algorithms. However, whether or not the bound under the general parameter regime is optimal remains open.

4:00 - 5:00 pm, Feb 24, 2021 (EST), Matthew Hirn, Michigan State University

Title: Understanding convolutional neural networks through signal processing: From signals to manifolds to graphs

Video Slides

Abstract: Convolutional neural networks (CNNs) are the go-to tool for image processing tasks in machine learning. But how and why do they work so well? Using the basic guiding principles of CNNs, namely their convolutional structure, invariance properties, and multi-scale nature, we will discuss how the CNN architecture arises as a natural by-product of these principles using the language of nonlinear signal processing. In doing so, we will extract some core ideas that allow us to generalize these architectures to manifolds and graphs, while still being able to provide mathematical guarantees on the nature of the representation provided by these tools.
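
As a toy, hand-rolled illustration of the three ingredients named above (multi-scale convolutions, a pointwise nonlinearity, and averaging for invariance), and emphatically not the construction discussed in the talk:

```python
import numpy as np

def toy_invariant_features(x, num_scales=4):
    """Cascade of multi-scale filters, a modulus nonlinearity, and global
    averaging, yielding features that are invariant to translations of the
    1-D signal x (up to boundary effects)."""
    feats = []
    for j in range(num_scales):
        width = 2 ** j
        # crude band-pass-like filter at scale 2^j (difference of boxes)
        filt = np.concatenate([np.ones(width), -np.ones(width)]) / (2 * width)
        response = np.convolve(x, filt, mode="same")
        feats.append(np.abs(response).mean())  # modulus, then average
    return np.array(feats)
```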


Bio: Matthew Hirn is an Associate Professor in the Department of Computational Mathematics, Science & Engineering and the Department of Mathematics at Michigan State University. At Michigan State he is the scientific leader of the ComplEx Data Analysis Research (CEDAR) team, which develops new tools in computational harmonic analysis, machine learning, and data science for the analysis of complex, high dimensional data. Hirn received his B.A. in Mathematics from Cornell University and his Ph.D. in Mathematics from the University of Maryland, College Park. Before arriving at MSU, he held postdoctoral appointments in the Applied Math Program at Yale University and in the Department of Computer Science at Ecole Normale Superieure, Paris. He is the recipient of the Alfred P. Sloan Fellowship (2016), the DARPA Young Faculty Award (2016), the DARPA Director’s Fellowship (2018), and the NSF CAREER award (2019), and was designated a Kavli Fellow by the National Academy of Sciences (2017).

4:00 - 5:00 pm, March 03, 2021 (EST), Ying Cui, University of Minnesota

Title: Solving Non-(Clarke)-Regular Optimization Problems in Statistical Learning

Slides

Abstract: Although we have witnessed growing interest from the continuous optimization community in solving nonconvex and nonsmooth optimization problems, most of the existing work focuses on Clarke-regular objectives or constraints, for which many friendly properties hold. In this talk, we will discuss the pervasiveness of non-(Clarke)-regularity in modern operations research and statistical estimation problems, due to the complicated composition of nonconvex and nonsmooth functions. Emphasis will be placed on the difficulties brought by the non-regularity, both in terms of computation and statistical inference, and on our initial attempts to overcome them.
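
To make the notion concrete, a standard one-dimensional example (given here only for illustration, not taken from the talk) is $f(x) = -|x|$ at $x = 0$:

$$ f'(0; d) \;=\; -|d| \;\neq\; |d| \;=\; f^{\circ}(0; d), $$

so the ordinary directional derivative and the Clarke generalized directional derivative disagree, and $f$ fails to be Clarke regular at the origin; compositions of nonsmooth and nonconvex functions, as in the estimation problems above, exhibit the same failure in far less transparent ways, which is what breaks many of the friendly properties.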


Bio: Ying Cui is currently an assistant professor in the Department of Industrial and Systems Engineering at the University of Minnesota. Her research focuses on the mathematical foundation of data science, with emphasis on optimization techniques for operations research, machine learning, and statistical estimation. Prior to UMN, she was a postdoctoral research associate at the University of Southern California. She received her Ph.D. from the Department of Mathematics at the National University of Singapore.

4:00 - 5:00 pm, March 10, 2021 (EST), Alex Olshevsky, Boston University

Title: Network Independence in Distributed Convex Optimization

Video Slides

Abstract: Distributed optimization has attracted a lot of attention recently as machine learning methods are increasingly trained in parallel over clusters of nodes. Unfortunately, the performance of many distributed optimization algorithms often suffers from scalability problems, with worst-case performance degrading as the size of the network increases.

We present an overview of several recent works that manage to overcome this barrier. We will describe distributed first-order methods for convex optimization with associated performance bounds that do not depend on the network at all after a transient period. The end result is that a network of n nodes is able to converge to a global minimizer of the objective n times faster than a single node.
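
For context, the textbook decentralized gradient method below (a sketch of the classical baseline, not of the network-independent algorithms surveyed in the talk) alternates consensus averaging through a mixing matrix W, which respects the network topology, with a local gradient step; its convergence degrades with the network, which is precisely the barrier the presented methods remove.

```python
import numpy as np

def decentralized_gradient_descent(local_grads, W, x0, alpha, iters=500):
    """local_grads: list of n callables; local_grads[i](x) is the gradient
    of node i's local objective f_i at x.
    W: (n, n) doubly stochastic mixing matrix supported on the network edges.
    x0: (n, d) array holding one local iterate per node."""
    x = np.asarray(x0, dtype=float).copy()
    n = len(local_grads)
    for _ in range(iters):
        g = np.stack([local_grads[i](x[i]) for i in range(n)])
        x = W @ x - alpha * g   # consensus averaging + local gradient step
    return x.mean(axis=0)       # average of the nodes' final iterates
```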

Bio: Alex Olshevsky received the B.S. degree in applied mathematics and the B.S. degree in electrical engineering from the Georgia Institute of Technology, Atlanta, GA, USA, both in 2004, and the M.S. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 2006 and 2010, respectively. He was a postdoctoral scholar at Princeton University from 2010 to 2012, and an Assistant Professor at the University of Illinois at Urbana-Champaign from 2012 to 2016. He is currently an Associate Professor with the ECE department at Boston University.


Dr. Olshevsky is a recipient of the NSF CAREER Award, the Air Force Young Investigator Award, the INFORMS Computing Society Prize for the best paper on the interface of operations research and computer science, a SIAM Award for the annual paper from the SIAM Journal on Control and Optimization chosen to be reprinted in SIAM Review, and an IMIA award for the best paper on clinical medical informatics in 2019.


4:00 - 5:00 pm, March 17, 2021 (EST), Yunan Yang, Courant Institute, New York University

Title: Optimal Transport for Inverse Problems and the Implicit Regularization

Video Slides

Abstract: Optimal transport has been an interesting topic in mathematical analysis since Monge (1781). The problem's close connections with differential geometry and kinetic descriptions were discovered within the past century, and the seminal work of Kantorovich (1942) showed its power to solve real-world problems. Recently, we proposed the quadratic Wasserstein distance from optimal transport theory for inverse problems, tackling the classical least-squares method's longstanding difficulties such as nonconvexity and noise sensitivity. The work was soon adopted in the oil industry. As we advance, we discover that the advantage of changing the data misfit extends to a broader class of data-fitting problems, by examining the preconditioning and "implicit" regularization effects of different mathematical metrics when used as the objective function in optimization, as the likelihood function in Bayesian inference, and as the measure of residual in the numerical solution of PDEs.
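
For reference, for suitably normalized (nonnegative, equal-mass) data $f$ and simulated data $g$, the two misfits being compared are

$$ \|f - g\|_{L^2}^2 = \int |f(x) - g(x)|^2 \, dx \qquad \text{versus} \qquad W_2^2(f, g) = \inf_{T:\, T_{\#} f = g} \int |x - T(x)|^2 f(x) \, dx, $$

where the infimum runs over maps $T$ transporting $f$ onto $g$; replacing the pointwise comparison on the left by the transport-based comparison on the right is the change of data misfit referred to above.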


Bio: Yunan Yang joined NYU in 2018 as a Courant Instructor, after earning her Ph.D. in Mathematics from the University of Texas at Austin under the supervision of Dr. Bjorn Engquist. Her dissertation was interdisciplinary research between numerical analysis and geophysics, mainly focusing on the analysis and application of optimal transport to inverse problems. In 2019, Yunan was honored as one of thirty-two "Rising Stars in Computational and Data Sciences" by the Oden Institute at the University of Texas at Austin. She was a selected participant of the 7th Heidelberg Laureate Forum (HLF) in Germany. Yunan received First Place in the 19th IMA Leslie Fox Prize in Numerical Analysis.


4:00 - 5:00 pm, March 31, 2021 (EST), Jiequn Han, Princeton University

Title: On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

Video Slides

Abstract: We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals and characterize the approximation rate. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs by gradient methods. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it, and the training process suffers from slowdowns. In particular, both of these effects become exponentially more pronounced with increasing memory, a phenomenon we call the “curse of memory”. These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise when learning temporal relationships using recurrent architectures.
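
Schematically (this is a paraphrase of the setting described in the abstract, not a statement of the paper's precise assumptions), a continuous-time linear RNN with hidden state $h_t \in \mathbb{R}^m$ driven by the input $x_t$ is

$$ \frac{d h_t}{dt} = W h_t + U x_t, \qquad \hat y_t = c^\top h_t, \qquad h_0 = 0, $$

so that $\hat y_t = \int_0^t c^\top e^{W s} U \, x_{t-s} \, ds$, a linear functional of the input with kernel $\rho(s) = c^\top e^{W s} U$; the decay of the target functional's kernel is what quantifies memory, and slowly decaying kernels are the ones that demand many neurons and slow down training.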


Bio: Jiequn Han is an Instructor at the Department of Mathematics, Princeton University. He obtained his Ph.D. degree in applied mathematics from the Program in Applied and Computational Mathematics (PACM), Princeton University in 2018. His research draws inspiration from various disciplines of science and is devoted to solving high-dimensional problems arising from scientific computing. His current research topics mainly focus on machine learning based multiscale modeling and solving high-dimensional partial differential equations/stochastic control.

4:00 - 5:00 pm, April 07, 2021 (EST), Yifei Lou, UT Dallas

Title: Graph Regularized Models for Blind Hyperspectral Unmixing

Video Slides

Abstract: Remote sensing data from hyperspectral cameras suffer from limited spatial resolution, in which a single pixel of a hyperspectral image may contain information from several materials in the field of view. Blind hyperspectral image unmixing is the process of identifying the spectra of pure materials (i.e., endmembers) and their proportions (i.e., abundances) at each pixel. In this talk, I will present two graph regularized models: graph Laplacian and graph total variation (gTV) for blind hyperspectral unmixing, both of which can be solved efficiently by the alternating direction method of multipliers (ADMM). To further alleviate the computational cost, we apply the Nyström method to approximate a fully-connected graph by a small subset of sampled points. Furthermore, we adopt the Merriman-Bence-Osher (MBO) scheme to solve the gTV-involved subproblem in ADMM by decomposing a grayscale image into a bit-wise form. A variety of numerical experiments on synthetic and real hyperspectral images are conducted, showcasing the potential of the proposed method in terms of identification accuracy and computational efficiency.
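
For background (this is the standard linear mixing model from the unmixing literature; the exact regularized objectives from the talk are in the associated papers), blind unmixing seeks a factorization of the hyperspectral data $Y \in \mathbb{R}^{d \times n}$ ($d$ spectral bands, $n$ pixels) of the form

$$ Y \approx S A, \qquad S \in \mathbb{R}^{d \times k} \ \text{(endmember spectra)}, \qquad A \in \mathbb{R}^{k \times n}, \ A \ge 0, \ \mathbf{1}^\top A = \mathbf{1}^\top \ \text{(abundances)}, $$

and the graph-based models add a spatial/spectral coherence term on the abundances, for instance a graph-Laplacian penalty $\operatorname{tr}(A L A^\top)$ or a graph total variation, with ADMM splitting the factorization, the simplex constraints, and the graph term into simpler subproblems.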


Bio: Yifei Lou is an Associate Professor in the Mathematical Sciences Department at the University of Texas at Dallas, where she has been since 2014. She received her Ph.D. in Applied Math from the University of California, Los Angeles (UCLA) in 2010. After graduation, she was a postdoctoral fellow at the School of Electrical and Computer Engineering, Georgia Institute of Technology, followed by another postdoctoral appointment at the Department of Mathematics, University of California, Irvine, from 2012 to 2014. Dr. Lou received the National Science Foundation CAREER Award in 2019. Her research interests include compressive sensing and its applications, image analysis (medical imaging, hyperspectral, imaging through turbulence), and (nonconvex) optimization algorithms.


4:00 - 5:00 pm, April 14, 2021 (EST), Braxton Osting, University of Utah

Title: Consistency of Archetypal Analysis

Video Slides

Abstract: Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data. For fixed k, the method finds a convex polytope with k vertices, called archetype points, such that the polytope is contained in the convex hull of the data and the mean squared distance between the data and the polytope is minimal. In this talk, I'll give a consistency result that shows if the data is independently sampled from a probability measure with bounded support, then the archetype points converge to a solution of the continuum version of the problem, of which we identify and establish several properties. We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution. If the data is independently sampled from a distribution with unbounded support, we also prove a consistency result for a modified method that penalizes the dispersion of the archetype points. Our analysis is supported by detailed computational experiments of the archetype points for data sampled from the uniform distribution in a disk, the normal distribution, an annular distribution, and a Gaussian mixture model. This is joint work with Dong Wang, Yiming Xu, and Dominique Zosso.
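
For concreteness, in the usual discrete formulation (commonly attributed to Cutler and Breiman; the continuum problem in the talk is its population counterpart), given data $x_1, \dots, x_n \in \mathbb{R}^d$ one solves

$$ \min_{A, B} \ \sum_{i=1}^n \Big\| x_i - \sum_{j=1}^k A_{ij} z_j \Big\|^2, \qquad z_j = \sum_{\ell=1}^n B_{j\ell} x_\ell, $$

where each row of $A \in \mathbb{R}^{n \times k}$ and each row of $B \in \mathbb{R}^{k \times n}$ lies in the probability simplex, so the archetypes $z_1, \dots, z_k$ are convex combinations of the data (hence the polytope they span sits inside the convex hull), and each data point is matched to its best approximation in that polytope.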


Bio: Braxton Osting is an Associate Professor in the Department of Mathematics at the University of Utah. He has broad interests in analytical and computational methods for problems in applied mathematics, especially in partial differential equations, optimization and control, graph theory, and machine learning. Previously he was a postdoc at UCLA and a graduate student at Columbia University.


4:00 - 5:00 pm, April 21, 2021 (EST), Jerome Darbon, Brown University

Title: Overcoming the curse of dimensionality for some Hamilton-Jacobi partial differential equations via neural network architectures

Video Slides

Abstract: We propose new and original mathematical connections between Hamilton-Jacobi (HJ) partial differential equations (PDEs) with initial data and neural network architectures. Specifically, we prove that some classes of neural networks correspond to representation formulas of HJ PDE solutions whose Hamiltonians and initial data are obtained from the parameters or the activation functions of the neural networks. These results do not rely on universal approximation properties of neural networks; rather, our results show that some classes of neural network architectures naturally encode the physics contained in some HJ PDEs. Our results naturally yield efficient neural network-based methods for evaluating solutions of some HJ PDEs in high dimension without using grids or numerical approximations.


This is joint work with Gabriel P. Langlois and Tingwei Meng.
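
As one classical instance of the representation formulas in play (recalled here only to convey the flavor; the talk's precise correspondences between architectures and formulas are in the paper), the Hopf-Lax formula for $u_t + H(\nabla_x u) = 0$ with $u(x, 0) = J(x)$, $H$ convex and coercive, and $J$ Lipschitz reads

$$ u(x, t) \;=\; \min_{y \in \mathbb{R}^d} \Big\{ J(y) + t \, H^{*}\!\Big( \frac{x - y}{t} \Big) \Big\}, $$

a pointwise minimum over simple units indexed by $y$, which is the kind of structure that certain network architectures can reproduce exactly, with the Hamiltonian and the initial data encoded in the parameters and activation functions, as described above.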


Bio: Jerome Darbon is an associate professor of Applied Mathematics at Brown University. He is currently on leave from CNRS (at Ecole Normale Superieure Paris-Saclay). His current research interests include fast algorithms for solving high-dimensional Hamilton-Jacobi partial differential equations and optimal control problems, image processing, and optimization algorithms.