Talks of Fall 2020

4:00 - 5:00 pm, Sep 9, 2020 (EST), John Wright, Columbia University

Title: Geometry and Symmetry in (some!) Nonconvex Optimization Problems

Video Slides

Abstract: Nonconvex optimization plays an important role in a wide range of areas of science and engineering — from learning feature representations for visual classification, to reconstructing images in biology, medicine and astronomy, to disentangling spikes from multiple neurons. The worst-case theory for nonconvex optimization is dismal: in general, even guaranteeing a local minimum is NP-hard. However, in these and other applications, very simple iterative methods often perform surprisingly well.

In this talk, I will discuss a family of nonconvex optimization problems that can be solved to global optimality using simple iterative methods, which succeed independent of initialization. This family includes certain model problems in feature learning, imaging and scientific data analysis. These problems possess a characteristic structure, in which (i) all local minima are global, and (ii) the optimization landscape does not have any flat saddle points. I will describe how these features arise naturally as a consequence of problem symmetries, and how they lead to new types of performance guarantees for efficient methods. I will motivate these problems from microscopy, astronomy and computer vision, and show applications of our results in these domains.

Includes joint work with Yuqian Zhang, Qing Qu, Ju Sun, Henry Kuo, Yenson Lau, Dar Gilboa, and Abhay Pasupathy.
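
The structure described above can be seen in miniature on a classical toy problem. The sketch below (an editor's illustration, not one of the talk's model problems) maximizes q^T M q over the unit sphere: the sign symmetry q -> -q means every local maximizer is global, and simple Riemannian gradient ascent succeeds from a random initialization.

```python
import numpy as np

# Toy benign nonconvex landscape: maximize f(q) = q^T M q on the unit
# sphere. All local maximizers are +/- the top eigenvector of M, so a
# simple iterative method succeeds regardless of initialization.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
M = A @ A.T                      # symmetric PSD test matrix

q = rng.standard_normal(50)
q /= np.linalg.norm(q)           # random initialization on the sphere
for _ in range(500):
    g = 2 * M @ q                # Euclidean gradient of q^T M q
    g -= (g @ q) * q             # project onto the tangent space at q
    q += 1e-3 * g                # gradient ascent step
    q /= np.linalg.norm(q)       # retract back to the sphere

top = np.linalg.eigh(M)[1][:, -1]
print(abs(q @ top))              # ~1.0: the global optimizer is found
```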


Bio: John Wright is an associate professor in Electrical Engineering at Columbia University. He is also affiliated with the Department of Applied Physics and Applied Mathematics and Columbia’s Data Science Institute. He received his PhD in Electrical Engineering from the University of Illinois at Urbana-Champaign in 2009. Before joining Columbia, he was with Microsoft Research Asia from 2009 to 2011. His research interests include sparse and low-dimensional models for high-dimensional data, optimization (convex and otherwise), and applications in imaging and vision. His work has received a number of awards and honors, including the 2012 COLT Best Paper Award and the 2015 PAMI TC Young Researcher Award.

4:00 - 5:00 pm, Sep 16, 2020 (EST), Wenjing Liao, Georgia Institute of Technology

Title: Regression of functions on low-dimensional manifolds by deep neural networks

Video Slides

Abstract: Data in many real-world applications lie in a high-dimensional space but are concentrated on or near a low-dimensional manifold. Our goal is to estimate functions on the manifold from finite samples of data. This talk focuses on an efficient approximation theory of deep ReLU networks for functions supported on low-dimensional manifolds. We construct a ReLU network for such function approximation where the size of the network grows exponentially with respect to the intrinsic dimension of the manifold. When the function is estimated from finite samples, we prove that the mean squared error of the function approximation converges, as the number of training samples increases, at a rate depending on the intrinsic dimension of the manifold instead of the ambient dimension of the space. These results demonstrate that deep neural networks are adaptive to low-dimensional geometric structures of data. This is joint work with Minshuo Chen, Haoming Jiang, and Tuo Zhao (Georgia Institute of Technology).
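
Schematically, the statistical result has the flavor of the classical nonparametric regression rate with the ambient dimension replaced by the intrinsic one. The display below is an editor's paraphrase; the precise smoothness assumptions, constants, and log factors are in the paper.

```latex
% For an s-smooth function f on a d-dimensional manifold embedded in
% R^D, an estimator \hat{f} built from n samples satisfies, schematically,
\mathbb{E}\,\bigl\| \hat{f} - f \bigr\|_{L^2}^{2} \;\lesssim\; n^{-\frac{2s}{2s+d}},
% i.e., the exponent involves the intrinsic dimension d, not the ambient D.
```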


Bio: Dr. Wenjing Liao is an assistant professor in the School of Mathematics at Georgia Tech. She obtained her Ph.D. in mathematics from the University of California, Davis in 2013. She was a visiting assistant professor at Duke University from 2013 to 2016, as well as a postdoctoral fellow at the Statistical and Applied Mathematical Sciences Institute from 2013 to 2015. She worked at Johns Hopkins University as an assistant research scientist from 2016 to 2017. She works on theory and algorithms at the intersection of applied math, machine learning, and signal processing. Her current research interests include multiscale methods for dimension reduction, statistical learning theory, and deep learning.

4:00 - 5:00 pm, Sep 23, 2020 (EST), Jeff Calder, University of Minnesota

Title: Poisson learning: Graph-based semi-supervised learning at very low label rates

Video Slides

Abstract: Graph-based learning is a field within machine learning that uses similarities between data points to create efficient representations of high-dimensional data for tasks like semi-supervised classification, clustering, and dimension reduction. For semi-supervised learning, the widely used Laplacian regularizer has recently been shown to be ill-posed at very low label rates, leading to classification performance close to random guessing. In this talk, we will give a careful analysis of the ill-posedness of Laplacian regularization via random walks on graphs, which leads to a new algorithm for semi-supervised learning that we call Poisson learning. Poisson learning replaces the assignment of label values at training points with the placement of sources and sinks, and solves the resulting Poisson equation on the graph. The outcomes are provably more stable and informative than those of Laplacian learning. Poisson learning is efficient and simple to implement, and we will present numerical results on MNIST, FashionMNIST, and CIFAR-10 showing that the method is superior to other recent approaches to semi-supervised learning at low label rates.

This talk is joint work with Brendan Cook (UMN), Dejan Slepcev (CMU), and Matthew Thorpe (University of Manchester).
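
To make the source/sink idea concrete, here is a minimal sketch of Poisson learning as described in the abstract (an editor's reading of the idea, not the authors' reference implementation). It assumes a connected similarity graph W and solves the singular but consistent graph Poisson equation by conjugate gradients:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import cg

def poisson_learning(W, labeled_idx, labels, n_classes):
    """W: symmetric similarity matrix (connected graph);
    labels: integer classes for the nodes in labeled_idx."""
    n = W.shape[0]
    L = laplacian(csr_matrix(W))                  # graph Laplacian
    Y = np.zeros((len(labeled_idx), n_classes))
    Y[np.arange(len(labeled_idx)), labels] = 1.0  # one-hot labels
    Y -= Y.mean(axis=0)                           # center: sources sum to zero
    U = np.zeros((n, n_classes))
    for c in range(n_classes):
        b = np.zeros(n)
        b[labeled_idx] = Y[:, c]                  # place sources and sinks
        # b is mean-zero, so L u = b is consistent; CG handles it.
        U[:, c], _ = cg(L, b)
    return U.argmax(axis=1)                       # predicted class per node
```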


Bio: Jeff Calder is an assistant professor of mathematics at the University of Minnesota. He completed his PhD in 2014 at the University of Michigan, advised by Selim Esedoglu (Math) and Alfred Hero (EECS), and was a Morrey assistant professor at UC Berkeley from 2014 to 2016. Calder was awarded an Alfred P. Sloan Research Fellowship in 2020. His research interests are focused on the intersection of partial differential equations, machine learning, and applied probability.

4:00 - 5:00 pm, Oct 7, 2020 (EST), Ermin Wei, Northwestern University

Title: Robust and Flexible Distributed Optimization Algorithms for Networked Systems

Video Slides

Abstract: In this talk, we present a framework of distributed optimization algorithms for the problem of minimizing a sum of convex objective functions that are locally available to agents in a network. This setup is motivated by federated machine learning applications. Distributed optimization algorithms make it possible for the agents to cooperatively solve the problem through local computations and communications with neighbors. The algorithms we propose provably converge linearly to the optimal solution, tolerate communication and computation noise, and adapt to various hardware environments with different computation and communication capabilities.
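
For readers new to the setup, the sketch below shows the textbook decentralized gradient descent (DGD) baseline for this problem class; it is a generic illustration of local computation plus neighbor communication, not the speaker's proposed algorithm. Constant-step DGD carries an O(step) bias at each agent, which is one motivation for improved, linearly convergent methods of the kind the abstract refers to.

```python
import numpy as np

# Decentralized gradient descent for min_x sum_i f_i(x): each agent
# averages with neighbors (doubly stochastic W), then steps along its
# own local gradient.
def dgd_step(X, W, grads, step):
    """X: (n_agents, dim) local iterates; grads(X): stacked local gradients."""
    return W @ X - step * grads(X)

# Example: f_i(x) = 0.5 * ||x - a_i||^2, so the global optimum is mean(a_i).
rng = np.random.default_rng(1)
a = rng.standard_normal((4, 3))            # one local target per agent
grads = lambda X: X - a
W = np.full((4, 4), 0.25)                  # complete-graph mixing matrix
X = np.zeros((4, 3))
for _ in range(200):
    X = dgd_step(X, W, grads, step=0.1)
# The network average converges to the global minimizer.
print(np.allclose(X.mean(axis=0), a.mean(axis=0), atol=1e-6))
```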

Bio: Ermin Wei is currently an assistant professor in the Electrical and Computer Engineering Department and the Industrial Engineering and Management Sciences Department of Northwestern University. She completed her PhD studies in Electrical Engineering and Computer Science at MIT in 2014, advised by Professor Asu Ozdaglar, where she also obtained her M.S. She received her undergraduate triple degree in Computer Engineering, Finance, and Mathematics, with a minor in German, from the University of Maryland, College Park. She has received many awards, including the Graduate Women of Excellence Award and the second-place prize for the Ernst A. Guillemin Thesis Award. Her team won second place in the 2019 GO Competition, an electricity grid optimization competition organized by the Department of Energy. Wei's research interests include distributed optimization methods, convex optimization and analysis, smart grid, communication systems and energy networks, and market economic analysis.

4:00 - 5:00 pm, Oct 14, 2020 (EST), Martin Takac, Lehigh University

Title: New Sampled Quasi-Newton Methods and a Symmetric Blockwise Truncated Optimization Algorithm for Machine Learning Problems

Video

Abstract: In this talk, we first present two sampled quasi-Newton methods for deep learning: sampled LBFGS and sampled LSR1. Contrary to classical variants of these methods, which sequentially build Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the approximations make use of more reliable (recent and local) information. We then present SONIA, an algorithm that bridges the gap between first- and second-order methods by computing a search direction that uses a second-order-type update in one subspace, coupled with a scaled steepest descent step in the orthogonal complement. To this end, partial curvature information is incorporated to help with ill-conditioning, while simultaneously allowing the algorithm to scale to the large problem dimensions often encountered in machine learning applications. We provide convergence guarantees for the methods, discuss possible large-scale parallel/distributed implementations, and illustrate their performance on several neural network training tasks.
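
A minimal sketch of the sampled curvature-pair idea follows (an editor's reconstruction from the abstract, not the authors' code): fresh directions are sampled around the current iterate, the gradient change along each direction supplies a curvature pair, and the standard L-BFGS two-loop recursion turns the pairs into a search direction.

```python
import numpy as np

def sampled_pairs(grad, w, m=5, radius=1e-2, seed=0):
    """Build curvature pairs (s, y) from points sampled around w."""
    rng = np.random.default_rng(seed)
    g = grad(w)
    S, Y = [], []
    for _ in range(m):
        s = radius * rng.standard_normal(w.shape)  # random local direction
        y = grad(w + s) - g                        # gradient change along s
        if s @ y > 1e-10:                          # keep positive curvature only
            S.append(s)
            Y.append(y)
    return S, Y

def lbfgs_direction(g, S, Y):
    """Standard L-BFGS two-loop recursion applied to the sampled pairs."""
    q, alphas = g.copy(), []
    for s, y in zip(reversed(S), reversed(Y)):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if S:
        q *= (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])     # initial scaling of H_0
    for (s, y), a in zip(zip(S, Y), reversed(alphas)):
        q += (a - (y @ q) / (y @ s)) * s
    return -q                                      # quasi-Newton descent direction
```

In a training loop, one would recompute the pairs at each iterate and combine the resulting direction with a line search or trust region.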


Bio: Prof. Takac received his B.S. (2008) and M.S. (2010) degrees in Mathematics from Comenius University, Slovakia, and his Ph.D. (2014) degree in Mathematics from The University of Edinburgh, United Kingdom. He received several awards during this period, including the Best Ph.D. Dissertation Award from the OR Society (2014), the Leslie Fox Prize (2nd Prize; 2013) from the Institute of Mathematics and its Applications, and the INFORMS Computing Society Best Student Paper Award (runner-up; 2012). Since 2014, he has been a tenure-track assistant professor in the Department of Industrial and Systems Engineering at Lehigh University, USA. His current research interests include the design, analysis, and application of algorithms for machine learning, optimization, high-performance computing, operations research, and energy systems. He is a core faculty member of the OptML Group and affiliated faculty of the Cognitive Science Program.

4:00 - 5:00 pm, Oct 21, 2020 (EST), Yuehaw Khoo, University of Chicago

Title: Some problems in point-set registration

Video Slides

Abstract: In this talk, we discuss variants of the rigid registration problem, i.e., aligning objects via rigid transformations. In the simplest scenario of point-set registration, where the correspondence between points is known, we investigate the robustness of registration to outliers. We also study a convex programming formulation of point-set registration with exact recovery, in the situation where both the correspondence and the alignment are unknown. This talk is based on joint work with Ankur Kapoor, Cindy Orozco, and Lexing Ying.
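
For the simplest scenario mentioned first (known correspondences, no outliers), the rigid alignment has a classical closed-form solution via the SVD, the orthogonal Procrustes/Kabsch solution sketched below; the talk's robustness and convex-relaxation results go well beyond this baseline.

```python
import numpy as np

def rigid_register(A, B):
    """Find rotation R and shift t minimizing sum_i ||R a_i + t - b_i||^2.
    A, B: (n, d) arrays of corresponding points."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)                        # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(H.shape[0])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))   # rule out reflections
    R = Vt.T @ D @ U.T                               # optimal rotation
    return R, cb - R @ ca                            # optimal translation

# Sanity check: recover a planted rigid motion exactly (no noise).
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 3))
c, s = np.cos(0.7), np.sin(0.7)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
B = A @ R_true.T + t_true
R, t = rigid_register(A, B)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```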

Bio: Yuehaw Khoo is an assistant professor in the Department of Statistics at the University of Chicago. Prior to this, he was a postdoc at Stanford and a graduate student at Princeton. He is interested in scientific computing problems in protein structure determination and many-body physics.

4:00 - 5:00 pm, Oct 28, 2020 (EST), Xiuyuan Cheng, Duke University

Title: Comparing Manifold Data Distributions by Diffusion Kernels and Neural Networks

Video Slides

Abstract: Kernel-based Maximum Mean Discrepancy (MMD) is widely used to compare multivariate distributions, and at the same time, the recent success of generative adversarial networks and variational learning suggests that training a classification network may work well in addressing the classical two-sample testing problem. An important consideration here is the geometric properties of data distributions. In this talk, we focus on the situation where data samples lie on or near low-dimensional manifolds embedded in a possibly high-dimensional space. For kernel MMD, we introduce a new test statistic using anisotropic kernels and a set of reference data points, whose ability to distinguish data distributions is theoretically characterized by the spectral decomposition of the kernel. In the second half of the talk, we introduce a neural network test using the logit function of a trained classifier, where, with Lipschitz regularization of the network functions, both the approximation and estimation errors are proved to scale only with the intrinsic dimensionality of the data. We discuss applications to biomedical datasets, including flow cytometry data and single-cell RNA sequencing data. Joint work with Alexander Cloninger (UCSD) and the Kluger Lab (Yale School of Medicine).
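
For orientation, the sketch below computes the vanilla unbiased MMD^2 statistic with an isotropic Gaussian kernel, the standard quantity that the talk's statistic generalizes; the anisotropic kernels and reference-point construction from the talk are not implemented here.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth):
    """Unbiased estimate of MMD^2 between samples X and Y (Gaussian kernel)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 2)) + 0.5      # shifted distribution
Z = rng.standard_normal((200, 2))            # same distribution as X
print(mmd2_unbiased(X, Y, bandwidth=1.0))    # clearly positive
print(mmd2_unbiased(X, Z, bandwidth=1.0))    # close to zero
```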


Bio: Xiuyuan Cheng is an assistant professor of mathematics at Duke University, with a secondary appointment in the Department of Electrical and Computer Engineering. Dr. Cheng is interested in theoretical and computational problems in high-dimensional data analysis and machine learning, particularly on spectral methods, kernel matrices, and deep neural networks from a signal processing point of view. Dr. Cheng's work is supported by NSF, NIH, and the Alfred P. Sloan Foundation.

4:00 - 5:00 pm, Nov 4, 2020 (EST), Tianbao Yang, University of Iowa

Title: Deep AUC Maximization and Applications in Medical Image Classification

Video Slides

Abstract: In this talk, I will present our recent research on a new learning paradigm of deep learning by AUC maximization. I will present a new surrogate loss for AUC and non-convex min-max optimization algorithms for solving the deep AUC maximization problem. I will also talk about our results on the Stanford CheXpert competition, in which our method is ranked in first place as of today.
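
For context, AUC is the probability that a random positive example is scored above a random negative one; a standard pairwise surrogate for it is sketched below. This is only to fix ideas: the method in the talk optimizes an equivalent min-max reformulation that avoids forming all positive/negative pairs, which is not shown here.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Empirical AUC: fraction of correctly ordered (positive, negative) pairs."""
    diffs = scores_pos[:, None] - scores_neg[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

def pairwise_square_surrogate(scores_pos, scores_neg, margin=1.0):
    """Squared surrogate loss: penalizes pairs whose positive score does
    not exceed the negative score by the margin."""
    diffs = scores_pos[:, None] - scores_neg[None, :]
    return np.mean((margin - diffs) ** 2)

rng = np.random.default_rng(5)
sp = rng.normal(1.0, 1.0, 100)   # scores of positive examples
sn = rng.normal(0.0, 1.0, 200)   # scores of negative examples
print(auc(sp, sn), pairwise_square_surrogate(sp, sn))
```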

Bio: Tianbao Yang is an associate professor of Computer Science at the University of Iowa. His research interests center around optimization, machine learning, and AI. He received the best student paper award at COLT in 2012 and an NSF CAREER Award in 2019, and was named a Dean's Excellence in Research Scholar. In August 2020, his group achieved first place in the Stanford CheXpert competition. He has published more than 100 papers and has served as an associate editor of Neurocomputing and as a senior PC member for AAAI and IJCAI.

4:00 - 5:00 pm, Nov 18, 2020 (EST), Xiaojing Ye, Georgia State University

Title: Inverse Optimal Transport: Learning Cost Functions for Optimal Transport

Video Slides

Abstract: We consider the inverse problem of optimal transport (OT): recovering the ground cost function from an observed transport plan or its samples. The inverse OT problem has been cast as a bi-level optimization problem in the literature, where each iteration requires solving a forward OT problem, incurring substantial computational cost overall. In this work, we derive an equivalent but much simpler unconstrained and convex optimization formulation of the inverse OT problem, which can be further augmented by customizable regularization and solved easily. We provide a complete characterization of the new optimization problem and its solution space. To validate the effectiveness of this framework, we develop two numerical algorithms: a fast matrix scaling method based on the Sinkhorn-Knopp algorithm for the discrete case, and a deep learning based method that trains the cost function as a deep neural network using samples of the transport plan for the continuous case. Numerical results demonstrate promising efficiency and accuracy advantages of the proposed methods.
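
For reference, the forward entropic OT problem that underlies the discrete algorithm is solved by the classical Sinkhorn-Knopp matrix scaling, sketched below. This is the textbook forward iteration only; the inverse method described in the abstract goes the other way, recovering the cost from an observed plan.

```python
import numpy as np

def sinkhorn(C, mu, nu, eps=0.05, iters=500):
    """Entropic OT plan for cost C (n x m) and marginals mu, nu."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)                 # enforce the row marginal
        v = nu / (K.T @ u)               # enforce the column marginal
    return u[:, None] * K * v[None, :]   # plan diag(u) K diag(v)

x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2       # squared-distance ground cost
mu = np.full(50, 1 / 50)
nu = np.full(50, 1 / 50)
P = sinkhorn(C, mu, nu)
print(P.sum(axis=1)[:3], P.sum(axis=0)[:3])   # both ≈ the prescribed marginals
```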


Bio: Dr. Xiaojing Ye is an associate professor in the Department of Mathematics and Statistics at Georgia State University. Dr. Ye obtained his doctoral degree in mathematics from the University of Florida in 2011. His main research includes analysis and numerical methods for stochastic point processes on networks, modeling and computation of optimal transport and its applications, and theory and computation in machine learning, with applications in deep learning and image processing.

4:00 - 5:00 pm, Dec 2, 2020 (EST), Na (Lina) Li, Harvard University

Title: Harnessing structure and algorithmic properties for scalable learning methods over networks

Slides

Abstract: Recent radical evolution in distributed sensing, computation, communication, and actuation has fostered the emergence of cyber-physical networked systems across a broad spectrum of engineering and societal fields. One major challenge is how to shape the network's collective behavior through the design of scalable decision-making algorithms for the individual components. In this talk, I will present two lines of work that demonstrate how to exploit special structural properties for scalable solutions. First, I will present our recent scalable multiagent reinforcement learning algorithms, which use only local sensing and communication yet learn nearly optimal localized policies for the global network; the key to the algorithm design is exploiting the network connectivity structure. Then I will present our limited-communication distributed learning methods, which compress the communicated information of a linearly convergent algorithm to a few bits while still preserving the linear convergence; the key to the compression is exploiting the linear convergence and smoothness.
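
To illustrate the few-bit mechanism in the second line of work, the sketch below shows a generic sign compressor with error feedback, a common device in compressed distributed optimization; it is an editor's illustration, not the speaker's specific scheme.

```python
import numpy as np

def sign_compress(v):
    """One bit per coordinate plus one scalar: a scaled sign vector."""
    return np.abs(v).mean() * np.sign(v)

def communicate(v, residual):
    """Compress v plus the carried-over residual; remember what was lost
    so the compression error does not accumulate across rounds."""
    msg = sign_compress(v + residual)
    return msg, (v + residual) - msg     # error-feedback memory
```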


Bio: Na Li is a Gordon McKay Professor of Electrical Engineering and Applied Mathematics at Harvard University. She received her bachelor's degree in Mathematics from Zhejiang University in 2007 and her Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 2013. She was a postdoctoral associate at the Massachusetts Institute of Technology from 2013 to 2014. Her research lies in control, learning, and optimization of networked systems, including theory development, algorithm design, and applications to real-world cyber-physical societal systems. She received the NSF CAREER Award (2016), the AFOSR Young Investigator Award (2017), the ONR Young Investigator Award (2019), the Donald P. Eckman Award (2019), and the McDonald Mentoring Award (2020), among other awards.

4:00 - 5:00 pm, Dec 9, 2020 (EST), Wuchen Li, University of South Carolina

Title: Transport information flows for Bayesian sampling problems

Video Slides

Abstract: In AI and inverse problems, the Markov chain Monte Carlo (MCMC) method is a classical model-free approach for sampling target distributions. A key fact is that the first-order optimal transport method (gradient flow) yields the classical MCMC scheme known as Langevin dynamics. A natural question arises: can we devise accelerated or higher-order optimization techniques for MCMC methods? We answer this question positively by applying optimization methods from optimal transport and information geometry, known collectively as transport information geometry. For example, we introduce a theoretical framework for Newton's flows in probability space with respect to the optimal transport metric. Several numerical examples are given to demonstrate the effectiveness of the proposed optimization-based sampling methods.
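
The first-order scheme mentioned in the abstract, Langevin dynamics, discretizes to the unadjusted Langevin algorithm sketched below for sampling p ∝ exp(-f); this classical baseline is included only for orientation, while the talk's higher-order (e.g., Wasserstein Newton) flows are not shown.

```python
import numpy as np

def ula(grad_f, x0, step=1e-2, n_steps=5000, seed=4):
    """Unadjusted Langevin: x <- x - step*grad_f(x) + sqrt(2*step)*noise."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_f(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Target: standard Gaussian, f(x) = ||x||^2 / 2, so grad f(x) = x.
samples = ula(lambda x: x, x0=np.zeros(2))
print(samples[1000:].mean(axis=0))  # ≈ [0, 0]
print(samples[1000:].std(axis=0))   # ≈ [1, 1]
```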


Bio: Wuchen Li received his BSc from Shandong University in 2009 and obtained a Ph.D. degree from Georgia Tech in 2016. He was a CAM assistant adjunct professor at UCLA from 2016 to 2020. He is now an assistant professor at the University of South Carolina. His research focuses on transport information geometry in data science, scientific computing, and mean-field game modeling.