Talks of Fall 2021

4:00 - 5:00 pm, Sep 8, 2021 (EST), Shiqian Ma, UC Davis

Title: Riemannian Optimization for Projection Robust Optimal Transport

Slides Video

Abstract: The optimal transport problem is known to suffer from the curse of dimensionality. A recently proposed approach to mitigate this is to project the sampled data from the high-dimensional probability distribution onto a lower-dimensional subspace and then compute the optimal transport between the projected data. However, this approach requires solving a max-min problem over the Stiefel manifold, which is very challenging in practice. In this talk, we propose a Riemannian block coordinate descent (RBCD) method to solve this problem. We analyze the complexity of arithmetic operations for RBCD to obtain an $\epsilon$-stationary point, and show that it significantly improves the corresponding complexity of existing methods. Numerical results on both synthetic and real datasets demonstrate that our method is more efficient than existing methods, especially when the number of sampled data is very large. If time permits, we will also talk about the projection robust Wasserstein barycenter problem.
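
For background, the projection robust (2-)Wasserstein problem referred to above is commonly written as the following max-min problem over the Stiefel manifold; the notation below (projection dimension $k$, empirical measures supported on samples $\{x_i\}$ and $\{y_j\}$) is a standard statement of the problem rather than the talk's exact formulation:

\[
\max_{U \in \mathrm{St}(d,k)} \; \min_{\pi \in \Pi(\mu,\nu)} \; \sum_{i,j} \pi_{ij} \bigl\| U^{\top} x_i - U^{\top} y_j \bigr\|_2^{2},
\qquad \mathrm{St}(d,k) = \{\, U \in \mathbb{R}^{d \times k} : U^{\top} U = I_k \,\},
\]

where $\Pi(\mu,\nu)$ denotes the set of transport plans between the two empirical measures. Roughly speaking, a block coordinate scheme such as RBCD alternates Riemannian updates of $U$ on the Stiefel manifold with updates of the transport variable $\pi$.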

Bio: Shiqian Ma is currently a tenured associate professor in the Department of Mathematics at the University of California, Davis. He received his BS in Mathematics from Peking University in 2003, MS in Computational Mathematics from the Chinese Academy of Sciences in 2006, and PhD in Industrial Engineering and Operations Research from Columbia University in 2011. Shiqian was an NSF postdoctoral fellow at the Institute for Mathematics and its Applications at the University of Minnesota during 2011-2012 and an assistant professor in the Department of Systems Engineering and Engineering Management at the Chinese University of Hong Kong during 2012-2017. His current research interests include theory and algorithms for large-scale optimization and their various applications in machine learning, signal processing, and statistics. Shiqian received the INFORMS Optimization Society Best Student Paper Prize in 2010 and an honorable mention in the INFORMS George Nicholson Student Paper Competition in 2011. He was one of the finalists for the 2011 IBM Herman Goldstine fellowship, and he received the Journal of the Operations Research Society of China Excellent Paper Award in 2016. Shiqian served as an area chair for ICML 2021 and currently serves on the editorial board of the Journal of Scientific Computing.



4:00 - 5:00 pm, Sep 15, 2021 (EST), Richard Tsai, UT Austin

Title: Numerical wave propagation aided by deep learning

Slides Video

Abstract: We propose a deep learning approach for wave propagation in media with multiscale wave speed, using a second-order linear wave equation model. We use neural networks to enhance the accuracy of a given inaccurate coarse solver, which under-resolves a class of multiscale wave media and wave fields of interest. Our approach involves generating training data by applying the given computationally efficient coarse solver and another, sufficiently accurate, solver to a class of wave media (described by their wave speed profiles) and initial wave fields. We find that the trained neural networks can approximate the nonlinear dependence of the propagation on the wave speed, as long as causality is appropriately sampled in the training data. We combine the neural-network-enhanced coarse solver with the parareal algorithm and demonstrate that the coupled approach improves the stability of parareal algorithms for wave propagation and improves the accuracy of the enhanced coarse solvers.
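
To make the coupling with parareal concrete, here is a minimal, generic parareal sketch in Python. The names `coarse` and `fine` stand for the cheap (possibly neural-network-enhanced) propagator and the accurate reference propagator over one time slice; the toy ODE at the bottom is only for illustration and none of this is the talk's actual solver.

import numpy as np

def parareal(u0, coarse, fine, n_slices, n_iters):
    # Initial guess: a pure coarse sweep over all time slices.
    U = [u0]
    for n in range(n_slices):
        U.append(coarse(U[n]))

    for k in range(n_iters):
        # The fine solves on each slice can run in parallel in practice.
        F_old = [fine(U[n]) for n in range(n_slices)]
        G_old = [coarse(U[n]) for n in range(n_slices)]
        U_new = [u0]
        for n in range(n_slices):
            # Parareal correction: coarse prediction + (fine - coarse) from the last iterate.
            U_new.append(coarse(U_new[n]) + F_old[n] - G_old[n])
        U = U_new
    return U

# Toy usage: u' = -u; coarse = one large Euler step, fine = many small steps.
lam, dt = -1.0, 0.1
coarse = lambda u: u + dt * lam * u
def fine(u, m=100):
    for _ in range(m):
        u = u + (dt / m) * lam * u
    return u
U = parareal(np.array([1.0]), coarse, fine, n_slices=10, n_iters=3)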

Bio: Richard Tsai received his BS from National Taiwan University and his PhD in Mathematics from UCLA in 2002. He was a Veblen Instructor at Princeton University and the IAS from 2002 to 2004. He is currently a Professor in the Department of Mathematics and the Oden Institute for Computational Engineering and Sciences at UT Austin, and was previously also a Professor of Numerical Analysis at the Royal Institute of Technology (KTH), Sweden.

Recognition: Sloan Fellow, Simons Fellow, NCTS (National Center for Theoretical Sciences) Scholar, Moncrief Grand Challenge Award, Peter O’Donnell Distinguished Research Award


Research interests:

(recently) machine learning theory and algorithms

multiscale modeling and computation for highly oscillatory systems

numerical methods for differential equations (mainly hyperbolic problems) and interface dynamics

algorithms for games and robotic navigation

4:00 - 5:00 pm, Sep 22, 2021 (EST), Haizhao Yang, Purdue University

Title: Two-Layer Neural Networks for Partial Differential Equations: Optimization and Generalization Theory

Slides Video

Abstract: The problem of solving partial differential equations (PDEs) can be formulated as a least squares minimization problem, where neural networks are used to parametrize PDE solutions. A global minimizer corresponds to a neural network that solves the given PDE. In this talk, we show that the gradient descent method can identify a global minimizer of the least-squares optimization for solving second-order linear PDEs with two-layer neural networks, under the assumption of over-parametrization. We also analyze the generalization error of the least-squares optimization for second-order linear PDEs and two-layer neural networks, when the right-hand-side function of the PDE is in a Barron-type space and the least-squares optimization is regularized with a Barron-type norm, without the over-parametrization assumption.
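
As a rough illustration of the least-squares formulation (not the setting analyzed in the talk), the sketch below fits a two-layer tanh network to a 1D Poisson problem -u'' = f on [0, 1] by minimizing the squared PDE residual plus a boundary penalty; the width, sampling, and optimizer are arbitrary placeholders.

import torch

torch.manual_seed(0)
m = 64                                                  # network width (placeholder)
w = torch.randn(m, 1, requires_grad=True)
b = torch.randn(m, requires_grad=True)
a = (torch.randn(m) / m).requires_grad_()

def u(x):
    # Two-layer network u_theta(x) = sum_k a_k * tanh(w_k x + b_k).
    return torch.tanh(x @ w.T + b) @ a

f = lambda x: (torch.pi ** 2) * torch.sin(torch.pi * x)  # so the exact solution is u*(x) = sin(pi x)

opt = torch.optim.Adam([w, b, a], lr=1e-2)
for step in range(2000):
    x = torch.rand(256, 1, requires_grad=True)            # interior collocation points
    ux = torch.autograd.grad(u(x).sum(), x, create_graph=True)[0]
    uxx = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    xb = torch.tensor([[0.0], [1.0]])                      # Dirichlet boundary points
    loss = ((-uxx - f(x)) ** 2).mean() + (u(xb) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()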


Bio: Haizhao Yang has been an assistant professor of mathematics and data science at Purdue University since 2019. Prior to that, he was an assistant professor at the National University of Singapore from 2017 to 2019 and a visiting assistant professor at Duke University from 2015 to 2017. He received a B.Sc. from Shanghai Jiao Tong University in 2010, an M.Sc. from the University of Texas at Austin in 2012, and a Ph.D. from Stanford University in 2015. His research focuses on machine learning, data science, and applied and computational mathematics.


4:00 - 5:00 pm, Sep 29, 2021 (EST), Felix Ye, SUNY Albany

Title: Nonlinear model reduction for slow-fast stochastic systems near unknown invariant manifolds

Slides Video

Abstract: We introduce a nonlinear stochastic model reduction technique for high-dimensional stochastic dynamical systems that have a low-dimensional invariant effective manifold with slow dynamics and high-dimensional, large fast modes. Given only access to a black-box simulator from which short bursts of simulation can be obtained, we design an algorithm that outputs an estimate of the invariant manifold, a process of the effective stochastic dynamics on it that averages out the fast modes, and a simulator thereof. This simulator is efficient in that it exploits the low dimension of the invariant manifold and takes time steps whose size depends on the regularity of the effective process, and which are therefore typically much larger than those of the original simulator, which had to resolve the fast modes. The algorithm and the estimation can be performed on-the-fly, leading to efficient exploration of the effective state space without losing consistency with the underlying dynamics. This construction enables fast and efficient simulation of paths of the effective dynamics, together with estimation of crucial features and observables of such dynamics, including the stationary distribution, identification of metastable states, and residence times and transition rates between them.
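
As a loose illustration of the "short bursts from a black-box simulator" idea (and only that; the actual algorithm also estimates the invariant manifold and builds a simulator of the effective dynamics), a local effective drift and diffusion at a point can be estimated from burst statistics. The interface `simulate_burst(x0, t)` below is a hypothetical placeholder for the black-box simulator.

import numpy as np

def local_drift_diffusion(simulate_burst, x0, n_bursts=200, t_burst=1e-3):
    # `simulate_burst(x0, t)` is assumed to return the state after time t (placeholder signature).
    X = np.array([simulate_burst(x0, t_burst) for _ in range(n_bursts)])
    disp = X - x0
    drift = disp.mean(axis=0) / t_burst         # E[dX] / dt
    cov = np.cov(disp, rowvar=False) / t_burst  # Cov[dX] / dt, a crude local diffusion estimate
    return drift, cov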


Bio: Felix Ye is currently an assistant professor in the Department of Mathematics & Statistics at the University at Albany. Previously, he was a postdoctoral fellow at Johns Hopkins University. He received his PhD in applied mathematics from the University of Washington. His research interests lie at the intersection of dynamical systems and machine learning.


4:00 - 5:00 pm, Oct 6, 2021 (EST), Jonathan Siegel, Pennsylvania State University

Title: The Approximation Theory of Shallow Neural Networks

Slides Video

Abstract: A shallow neural network is a linear combination of ridge functions whose profile is determined by a fixed activation function. We will introduce spaces of functions which can be efficiently approximated by shallow neural networks for a wide variety of activation functions and analyze their properties. Specifically, we will compute their metric entropy and n-widths, which are fundamental quantities in approximation theory that control the limits of linear and non-linear approximation and statistical estimation for a given class of functions. Consequences of these results include the optimal approximation rates attainable by shallow neural networks, the fact that shallow neural networks dramatically outperform linear methods of approximation, and even that shallow neural networks are optimal among all non-linear methods on these spaces if stability or continuity of the non-linear method is required. Finally, we discuss algorithmic aspects of approximation by shallow networks. Specifically, we analyze a class of greedy algorithms and show that they can attain the theoretically optimal approximation rates.
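
For a flavor of the greedy algorithms mentioned at the end, here is a generic orthogonal-greedy (OMP-style) sketch over a finite random dictionary of ReLU ridge functions. The target function, dictionary size, and number of greedy steps are placeholders, and the talk's algorithms and rates concern the continuously parametrized dictionary rather than this finite surrogate.

import numpy as np

rng = np.random.default_rng(0)
relu = lambda t: np.maximum(t, 0.0)

# Target samples: f(x) on points in [0, 1]^d.
d, n = 2, 500
X = rng.uniform(size=(n, d))
y = np.sin(2 * np.pi * X[:, 0]) * X[:, 1]            # placeholder target function

# Finite surrogate dictionary of ridge atoms relu(w . x + b).
n_dict = 2000
W = rng.standard_normal((n_dict, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
B = rng.uniform(-1.5, 1.5, size=n_dict)
D = relu(X @ W.T + B)                                 # n x n_dict dictionary matrix

# Orthogonal-greedy loop: pick the atom most correlated with the residual,
# then refit all selected coefficients by least squares.
selected, residual = [], y.copy()
for _ in range(20):
    scores = np.abs(D.T @ residual) / (np.linalg.norm(D, axis=0) + 1e-12)
    selected.append(int(np.argmax(scores)))
    A = D[:, selected]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef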


Bio: Jonathan Siegel is currently an assistant research professor at Penn State working with Prof. Jinchao Xu. He attended graduate school at UCLA and received a Ph.D. under the guidance of Prof. Russel Caflisch in 2018. In his dissertation, he studied optimization on manifolds and its applications to electronic structure calculations, for which he won the Pacific Journal of Mathematics Dissertation Prize. Since then, he has been a postdoc at Penn State working on the optimization theory and approximation properties of neural networks.

4:00 - 5:00 pm, Oct. 13, 2021 (EST), Meisam Razaviyayn, University of Southern California

Title: Learning with nonconvex min-max optimization

Meeting Link: https://rensselaer.webex.com/meet/xuy21

Abstract: Recent applications arising in machine learning have spurred significant interest in solving min-max optimization problems. This problem has been extensively studied in the convex-concave regime, for which a globally optimal solution can be computed efficiently. However, in the nonconvex (smooth) regime, most problems cannot be solved to any reasonable notion of stationarity. In this work, we present different classes of smooth nonconvex min-max problems that can be solved efficiently up to first-order stationarity of the corresponding Moreau envelope. In particular, we propose efficient algorithms for finding (first-order) stationary solutions to classes of nonconvex min-max problems in which the inner maximization problem is concave or the diameter of the constraint set for the inner maximization problem is "small". Our results are the first of their kind to find stationary solutions to nonconvex-nonconcave min-max problems without assuming any restriction on the objective function (other than standard smoothness assumptions). We also discuss the validity of our assumptions in various applications and evaluate the performance of our algorithms on different applications, including training adversarially robust neural networks, fair machine learning, data imputation, and training generative adversarial networks.
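
For reference, the stationarity notion referred to above is typically defined through the Moreau envelope of the primal function $\phi(x) = \max_{y \in \mathcal{Y}} f(x, y)$; the following is the standard definition, stated as background rather than as the talk's exact setup:

\[
\phi_{\lambda}(x) \;=\; \min_{z} \Bigl\{ \phi(z) + \tfrac{1}{2\lambda} \lVert z - x \rVert^{2} \Bigr\},
\]

and a point $x$ is called an $\epsilon$-(first-order) stationary point of $\phi$ if $\lVert \nabla \phi_{\lambda}(x) \rVert \le \epsilon$ for a suitable $\lambda > 0$.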


Bio: Meisam Razaviyayn is an assistant professor of Industrial and Systems Engineering, Electrical Engineering, and Computer Science at the University of Southern California. His research interests include the design and analysis of optimization algorithms for modern problems arising in machine learning applications. His contributions to the field of optimization have been recognized through awards such as the Signal Processing Society Young Author Best Paper Award, the ICCM Best Paper Award in Mathematics, the IEEE Data Science Workshop Best Paper Award, and the 3M NTFA award. He was twice a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization, for two different papers.


4:00 - 5:00 pm, Oct. 20, 2021 (EST), Ming Yan, Michigan State University

Title: Data Compression in Distributed Learning

Slides Video

Abstract: Large-scale machine learning models are trained by parallel (stochastic) gradient descent algorithms on distributed systems. The communication required for gradient aggregation and model synchronization becomes the major obstacle to efficient learning as the number of nodes and the model's dimension scale up. In this talk, I will introduce several ways to compress the transferred data and reduce the overall communication so that this obstacle can be greatly mitigated. More specifically, I will introduce methods to reduce or eliminate the compression error without additional communication.
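
One well-known way to reduce the effect of compression error is error feedback, where each worker accumulates what the compressor dropped and re-injects it into the next round. The sketch below illustrates this idea with top-k sparsification; it is a generic example, not necessarily the specific methods discussed in the talk.

import numpy as np

def topk(v, k):
    # Keep the k largest-magnitude entries of v, zero the rest.
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

class ErrorFeedbackCompressor:
    """Top-k compression with local error accumulation (error feedback)."""
    def __init__(self, dim, k):
        self.e = np.zeros(dim)   # residual error kept on the worker
        self.k = k

    def compress(self, grad):
        corrected = grad + self.e          # re-inject previously dropped mass
        sent = topk(corrected, self.k)     # what is actually communicated
        self.e = corrected - sent          # remember what was dropped this round
        return sent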


Bio: Ming Yan is an associate professor in the Department of Computational Mathematics, Science and Engineering (CMSE) and the Department of Mathematics at Michigan State University. His research interests lie in computational optimization and its applications in image processing, machine learning, and other data-science problems. He received his B.S. and M.S. in mathematics from the University of Science and Technology of China in 2005 and 2008, respectively, and his Ph.D. in mathematics from the University of California, Los Angeles in 2012. After completing his PhD, Ming Yan was a Postdoctoral Fellow in the Department of Computational and Applied Mathematics at Rice University from July 2012 to June 2013, and then moved to the University of California, Los Angeles as a Postdoctoral Scholar and an Assistant Adjunct Professor from July 2013 to June 2015. He received a Facebook faculty award in 2020.


4:00 - 5:00 pm, Oct. 27, 2021 (EST), Negin Golrezaei, MIT

Title: Optimal Learning for Structured Bandits

Slides Video

Abstract: We study structured multi-armed bandits, the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker is aware of certain structural information regarding the reward distributions and would like to minimize its regret by exploiting this information, where the regret is its performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical upper confidence bound (UCB) and Thompson sampling algorithms are well known to suffer only minimal regret. As recently pointed out, however, neither algorithm is capable of exploiting structural information that is commonly available in practice. We propose a novel learning algorithm, called DUSA, whose worst-case regret matches the information-theoretic regret lower bound up to a constant factor and which can handle a wide range of structural information. Our algorithm DUSA solves a dual counterpart of the regret lower bound at the empirical reward distribution and follows its suggested play. Our proposed algorithm is the first computationally viable learning policy for structured bandit problems with asymptotically minimal regret.
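
For context, the classical structure-agnostic UCB index policy mentioned above looks roughly as follows (a textbook-style sketch with a placeholder exploration constant; DUSA itself, which solves a dual of the regret lower bound at the empirical distribution, is not reproduced here).

import numpy as np

def ucb(pull, n_arms, horizon, c=2.0):
    """Classical UCB1-style index policy; `pull(a)` returns a stochastic reward for arm a."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                                    # play each arm once first
        else:
            bonus = np.sqrt(c * np.log(t) / counts)      # optimism in the face of uncertainty
            a = int(np.argmax(means + bonus))
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]           # running mean update
    return means, counts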


Bio: Negin Golrezaei is the KDD Career Development Professor in Communications and Technology and an Assistant Professor of Operations Management at the MIT Sloan School of Management. Her current research interests are in the area of machine learning, statistical learning theory, mechanism design, and optimization algorithms with applications to revenue management, pricing, and online markets. Before joining MIT, Negin spent a year as a postdoctoral fellow at Google Research in New York where she worked with the Market Algorithm team to develop, design, and test new mechanisms and algorithms for online marketplaces. She received her PhD (2017) in operations research from USC. Negin is currently serving as an associate editor for Operations Research Letters. She is the recipient of several awards including the 2021 Young Investigator Award at ONR; the 2018 Google Faculty Research Award; 2017 George B. Dantzig Dissertation Award; the 2017 INFORMS Revenue Management and Pricing Section Dissertation Prize; and the 2016 University of Southern California (USC) University Outstanding Teaching Award.


4:00 - 5:00 pm, Nov. 03, 2021 (EST), Ying Sun, Pennsylvania State University

Title: Decentralized Optimization for Statistical Learning over Networks

Slides Video

Abstract: Advances in computation, communication, and data storage techniques in recent decades significantly reduced the cost of data acquisition, leading to an explosion of data generated across different interconnected platforms. Apart from the computational difficulties that arise from nonconvex formulations, the sheer volume and spatial disparity of data also pose challenges to traditional learning procedures, which typically require centralized training sets. Reaping the dividend offered by the data deluge necessitates the development of collaborative learning methods capable of making inferences from data over the network.

This talk discusses some recent developments in decentralized learning algorithms and their computational and statistical guarantees. A quick introduction to optimization-based decentralized learning problems will be given, followed by a novel algorithmic framework, SONATA, and its convergence properties for different classes of learning problems. We then focus on the statistical properties of the algorithm. While statistical-computational tradeoffs have been largely explored in the centralized setting, our understanding in the setting of meshed networks is limited. Decentralized schemes designed from a pure optimization perspective can be either inefficient or even non-convergent when applied to solving statistical learning problems. Through two vignettes from low- and high-dimensional statistical inference, we will go over some designs and new analyses aimed at bringing statistical thinking into decentralized optimization.
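
As background for the kind of update such frameworks build on, here is a generic gradient-tracking sketch over a doubly stochastic mixing matrix. This is not SONATA itself; all names, step sizes, and the averaging at the end are placeholders for illustration.

import numpy as np

def gradient_tracking(grads, W, x0, step=0.05, n_iters=500):
    """Generic decentralized gradient tracking.

    grads : list of per-agent gradient oracles g_i(x)
    W     : doubly stochastic mixing matrix (n_agents x n_agents)
    x0    : (n_agents, dim) initial local iterates
    """
    n, _ = x0.shape
    x = x0.copy()
    y = np.stack([grads[i](x[i]) for i in range(n)])      # gradient trackers
    g_old = y.copy()
    for _ in range(n_iters):
        x = W @ x - step * y                               # consensus + descent
        g_new = np.stack([grads[i](x[i]) for i in range(n)])
        y = W @ y + g_new - g_old                          # track the average gradient
        g_old = g_new
    return x.mean(axis=0)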

Bio: Ying Sun is an assistant professor in the Department of Electrical Engineering at The Pennsylvania State University. She received the B.E. degree in electronic information from the Huazhong University of Science and Technology, Wuhan, China, in 2011, and the Ph.D. degree in electronic and computer engineering from The Hong Kong University of Science and Technology in 2016. She was a postdoctoral researcher with the School of Industrial Engineering, Purdue University from 2016 to 2020. Her research interests include statistical signal processing, optimization algorithms, and machine learning. She is a co-recipient of a student best paper award at the IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) 2017 and a recipient of the 2020 IEEE Signal Processing Society Young Author Best Paper Award.

4:00 - 5:00 pm, Nov 10, 2021 (EST), Guido Montufar, University of California, Los Angeles

Title: Geometry of Linear Convolutional Networks

Slides Video

Abstract: We study the family of functions that are represented by a linear convolutional neural network (LCN). These functions form a semi-algebraic subset of the set of linear maps from input space to output space. In contrast, the families of functions represented by fully-connected linear networks form algebraic sets. We observe that the functions represented by LCNs can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network's architecture on the geometry of the resulting function space. We further study the optimization of an objective function over an LCN, analyzing critical points in function space and in parameter space, and describing dynamical invariants for gradient descent. Overall, our theory predicts that the optimized parameters of an LCN will often correspond to repeated filters across layers, or filters that can be decomposed as repeated filters. We also conduct numerical and symbolic experiments that illustrate our results and present an in-depth analysis of the landscape for small architectures.


This talk is based on joint work with Kathlén Kohn, Thomas Merkh, Matthew Trager: https://arxiv.org/abs/2108.01538.
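
The polynomial picture can be checked numerically in a toy setting: composing two stride-one convolutional layers (with no nonlinearity) yields a single convolution whose filter is the convolution of the two filters, i.e. the product of the associated polynomials. A small sketch with arbitrary filters:

import numpy as np

# Two 1D filters; composing the linear convolutional layers yields one filter
# equal to their convolution, i.e. the product of the associated polynomials.
f1 = np.array([1.0, -2.0, 1.0])
f2 = np.array([0.5, 3.0])
end_to_end = np.convolve(f1, f2)          # filter of the composed (linear) network

x = np.random.randn(50)                   # any input signal
out_two_layers = np.convolve(np.convolve(x, f1), f2)
out_one_layer = np.convolve(x, end_to_end)
assert np.allclose(out_two_layers, out_one_layer)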

4:00 - 5:00 pm, Nov 17, 2021 (EST), James Murphy, Tufts University

Title: Geometric and Statistical Approaches to Shallow and Deep Clustering

Slides Video

Abstract: We propose approaches to unsupervised clustering based on data-dependent distances and dictionary learning. By considering metrics derived from data-driven graphs, robustness to noise and ambient dimensionality is achieved. Connections to geometric analysis, stochastic processes, and deep learning are emphasized. The proposed algorithms enjoy theoretical performance guarantees on flexible data models, in some cases with guarantees of quasilinear scaling in the number of data points. Applications to image processing demonstrate state-of-the-art empirical performance. Extensions to active learning, generative modeling, and computational geometry will be discussed.
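
As a minimal example of a data-dependent metric of the kind referred to above (one simple instance, graph geodesic distances on a k-nearest-neighbor graph; the constructions in the talk are more refined), the data, neighborhood size, and downstream use below are placeholders:

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

X = np.random.randn(300, 10)                               # placeholder data
G = kneighbors_graph(X, n_neighbors=15, mode='distance')   # data-driven graph
D = shortest_path(G, directed=False)                        # graph geodesic distances
# D (which may contain inf for disconnected components) can replace the
# Euclidean metric in any distance-based clustering routine.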


Short bio: James M. Murphy is an assistant professor of mathematics at Tufts University. He completed his undergraduate degree at UChicago in 2011 and his PhD at the University of Maryland, College Park in 2015, both in mathematics. He had post-doctoral stints at Duke and Johns Hopkins before joining Tufts in 2018. His research is at the intersection of harmonic analysis, data science, and applications to image and signal processing. His group is currently focused on unsupervised machine learning, analysis on graphs, optimal transport, and applications to hyperspectral imaging and molecular dynamics.

4:00 - 5:00 pm, Dec 1, 2021 (EST), Sui Tang, UC Santa Barbara

Title: Data-driven discovery of interacting particle system using Gaussian processes

Slides Video

Abstract: Interacting particle or agent systems are widely used to model complicated collective motions of animal groups in the biological sciences, such as the flocking of birds, milling of fish, and swarming of prey. A fundamental goal is to understand the link between individual interaction rules and collective behaviors. We consider second-order interacting agent systems and study an inverse problem: given observed data, can we discover the interaction rule and even the governing equation? For interactions that depend only on pairwise distance, we propose a learning approach based on Gaussian processes that can simultaneously infer the interaction kernel without assuming a parametric form and learn other unknown parameters in the governing equations. We establish an operator-based approach to analyze recoverability via the coercivity of the associated operator and provide an analysis of the finite-sample posterior error. Numerical results on prototype systems, including Cucker-Smale dynamics and fish milling dynamics, show that our approach produces faithful estimators from scarce and noisy trajectory data and makes accurate predictions of collective behaviors. This talk is based on joint work with Jinchao Feng and Yunxiang Ren.
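
To fix ideas, Cucker-Smale dynamics are a second-order system whose alignment force depends only on pairwise distances. The sketch below simulates such trajectories with a placeholder kernel and parameters; trajectories of this kind are the input data from which the Gaussian process approach would infer the kernel nonparametrically.

import numpy as np

def cucker_smale_step(x, v, dt, K=1.0, beta=0.5):
    """One Euler step of Cucker-Smale-type alignment dynamics (second-order system).

    x, v : (N, d) positions and velocities; the interaction kernel
    phi(r) = K / (1 + r^2)^beta depends only on pairwise distance (placeholder choice)."""
    diff_x = x[None, :, :] - x[:, None, :]            # x_j - x_i
    diff_v = v[None, :, :] - v[:, None, :]            # v_j - v_i
    r = np.linalg.norm(diff_x, axis=-1)
    phi = K / (1.0 + r ** 2) ** beta
    dv = (phi[:, :, None] * diff_v).mean(axis=1)      # alignment force on each agent
    return x + dt * v, v + dt * dv

# Generate a short trajectory from random initial positions and velocities.
N, d = 20, 2
x, v = np.random.randn(N, d), np.random.randn(N, d)
for _ in range(100):
    x, v = cucker_smale_step(x, v, dt=0.01)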


Short Bio: Sui Tang is an Assistant Professor in the Department of Mathematics at the University of California, Santa Barbara. She received her PhD in Mathematics from Vanderbilt University in 2016 and worked as an Assistant Research Professor at Johns Hopkins University from 2016 to 2020. Sui is interested in employing and developing techniques from statistical learning, harmonic analysis, approximation theory, and probability to solve problems at the interface of machine learning, inverse problems, signal processing, and dynamical systems. Part of her work is motivated by the need to exploit data from dynamical systems that exhibit collective behavior in order to perform inference with provable performance and to build generalizable and interpretable predictive models.

4:00 - 5:00 pm, Dec 08, 2021 (EST), Usman Khan, Tufts University

Title: Distributed stochastic non-convex optimization: Optimal regimes and tradeoffs

Slides Video