Talks of Spring 2022

4:00 - 5:00 pm, Jan 19, 2022 (EST), Daniel P. Robinson, Lehigh University

Title: Reduced Space Optimization for Problems with Group Sparsity

Slides Video

Abstract: I discuss an optimization framework for solving problems with sparsity-inducing regularization. Such regularizers include the Lasso (L1), the group Lasso, and the latent group Lasso. The framework computes iterates by optimizing over low-dimensional subspaces, thus keeping the cost per iteration relatively low. Theoretical convergence results and numerical tests on various learning problems will be presented.
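
To make the reduced-space idea concrete, here is a minimal sketch (not the speaker's algorithm) of a proximal-gradient step with a group-lasso regularizer: the block soft-thresholding prox zeroes out entire groups, and the surviving groups form the small subspace over which a reduced-space method could then work. The problem data, group definitions, and step size below are illustrative assumptions.

```python
import numpy as np

def prox_group_lasso(x, groups, lam, step):
    """Block soft-thresholding: the prox of step * lam * sum_g ||x_g||_2."""
    out = np.zeros_like(x)
    for g in groups:
        norm_g = np.linalg.norm(x[g])
        if norm_g > lam * step:
            out[g] = (1.0 - lam * step / norm_g) * x[g]
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 12))
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
x_true = np.zeros(12)
x_true[groups[0]] = [3.0, -2.0, 1.5, 2.0]        # only the first group is truly active
b = A @ x_true

lam = 5.0
step = 1.0 / np.linalg.norm(A, 2) ** 2           # 1 / Lipschitz constant of the gradient
x = np.zeros(12)

# Proximal gradient on 0.5 * ||Ax - b||^2 + lam * sum_g ||x_g||_2
for _ in range(300):
    x = prox_group_lasso(x - step * A.T @ (A @ x - b), groups, lam, step)

active = [i for i, g in enumerate(groups) if np.linalg.norm(x[g]) > 0]
print("active groups (a candidate reduced space):", active)   # typically only group 0 survives
```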

Bio: Daniel P. Robinson received his Ph.D. from the University of California at San Diego in 2007. He spent the next three years working with Nicholas I. M. Gould and Jorge Nocedal as a Postdoctoral Researcher in the Mathematical Institute at the University of Oxford and the Department of Industrial Engineering and Management Sciences at Northwestern University. In 2011 he joined the Department of Applied Mathematics and Statistics in the Whiting School of Engineering at Johns Hopkins University, and in 2019 he joined the Department of Industrial and Systems Engineering in the P.C. Rossin College of Engineering & Applied Science at Lehigh University. His primary research area is optimization, with specific interest in the design, analysis, and implementation of efficient algorithms for large-scale convex and nonconvex problems, particularly in applications related to computer vision and medicine/healthcare. He is a member of the Society for Industrial and Applied Mathematics (SIAM), the SIAM Activity Group (SIAG) on Optimization, the Mathematical Optimization Society (MOS), and the Institute for Operations Research and the Management Sciences (INFORMS), and he served as the INFORMS Vice-Chair for Nonlinear Optimization (2014-2016). Daniel served as a cluster chair for the 2016 International Conference on Continuous Optimization (ICCOPT) in Tokyo, Japan, was a Program Committee member for the 2017 AAAI Conference on Artificial Intelligence in San Francisco, California, and is co-organizing the 2022 International Conference on Continuous Optimization at Lehigh University. He is currently an Associate Editor for the following journals: Computational Optimization and Applications, Optimization Methods and Software, Optimization Letters, and Journal of Scientific Computing. Finally, Daniel received the Professor Joel Dean Award for Excellence in Teaching at Johns Hopkins University in 2012 and 2018, and a Best Paper Prize from Optimization Letters for his co-authored 2018 paper titled "Concise Complexity Analyses for Trust-Region Methods".

4:00 - 5:00 pm, Feb 2, 2022 (EST), Robin Walters, Northeastern University

Title: Equivariant Neural Networks for Learning Spatiotemporal Dynamics

Slides Video

Abstract: Applications such as climate science and transportation require learning complex dynamics from large-scale spatiotemporal data. Existing machine learning frameworks are still insufficient to learn spatiotemporal dynamics as they often fail to exploit the underlying physics principles. Representation theory can be used to describe and exploit the symmetry of the dynamical system. We will show how to design neural networks that are equivariant to various symmetries for learning spatiotemporal dynamics. Our methods demonstrate significant improvement in prediction accuracy, generalization, and sample efficiency in forecasting turbulent flows and predicting real-world trajectories. This is joint work with Rose Yu, Rui Wang, and Jinxi Li.
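
As background for the equivariance idea (a generic construction, not the architectures from the talk), the sketch below symmetrizes an arbitrary image-to-image map over the group of 90-degree rotations by group averaging, and then checks the equivariance identity numerically. The map f and all sizes are made-up placeholders.

```python
import numpy as np

def symmetrize_c4(f):
    """Return a C4-equivariant version of f: average rot^{-k}( f( rot^k(x) ) ) over k."""
    def f_sym(x):
        return sum(np.rot90(f(np.rot90(x, k)), -k) for k in range(4)) / 4.0
    return f_sym

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
f = lambda x: np.tanh(W * x)              # an arbitrary, non-equivariant elementwise map
f_eq = symmetrize_c4(f)

x = rng.standard_normal((8, 8))
lhs = f_eq(np.rot90(x))                   # rotate the input, then apply the layer
rhs = np.rot90(f_eq(x))                   # apply the layer, then rotate the output
print("equivariance error:", np.max(np.abs(lhs - rhs)))   # numerically zero
```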


Bio: Robin Walters is a postdoctoral research fellow in the Khoury College of Computer Sciences. He joined Khoury in July 2020 through the Experiential AI program. Formerly, Robin was a Zelevinsky Research Instructor in the Mathematics Department at Northeastern. His research studies the connections between representation theory and differential equations, both theoretically and practically, using equivariant neural networks.

4:00 - 5:00 pm, Feb 09, 2022 (EST), Erfan Yazdandoost Hamedani, University of Arizona

Title: A Stochastic Variance-Reduced Primal-Dual Method for Convex-Concave Saddle Point Problems

Slides Video

Abstract: Recent advances in technology have led researchers to study problems with more complicated structure, such as distributionally robust optimization, distance metric learning, and kernel matrix learning arising in machine learning. Moreover, there has been a pressing need for more powerful, iterative optimization tools that can handle these complicated structures while employing efficient computations in each iteration. This demand has attracted a vast amount of research on developing primal-dual algorithms due to the versatility of the framework. In this talk, I present a primal-dual algorithm for solving convex-concave saddle-point problems with a finite-sum structure and a non-bilinear coupling function. When the number of component functions is massive, periodically computing the full gradient leads to progressive variance reduction. I will discuss the convergence rate of the proposed method and illustrate its performance on a distributionally robust optimization problem.
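
The variance-reduction ingredient can be sketched on a toy finite-sum saddle-point problem (this is a plain SVRG-style gradient descent-ascent loop on made-up data, not the speaker's primal-dual method): a full gradient is recomputed periodically at a snapshot, and each inner step corrects a single sampled component gradient with that snapshot.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5

def spd():
    B = rng.standard_normal((d, d))
    return np.eye(d) + 0.1 * B @ B.T

# L(x, y) = (1/n) * sum_i [ 0.5 x'C_i x + y'A_i x - 0.5 y'D_i y ], saddle point at (0, 0)
C = [spd() for _ in range(n)]
D = [spd() for _ in range(n)]
A = [rng.standard_normal((d, d)) for _ in range(n)]

def grad_i(i, x, y):
    return C[i] @ x + A[i].T @ y, A[i] @ x - D[i] @ y

def full_grad(x, y):
    gx = sum(C[i] @ x + A[i].T @ y for i in range(n)) / n
    gy = sum(A[i] @ x - D[i] @ y for i in range(n)) / n
    return gx, gy

x, y, step = np.ones(d), np.ones(d), 0.05
for epoch in range(30):
    x_snap, y_snap = x.copy(), y.copy()
    fx_snap, fy_snap = full_grad(x_snap, y_snap)       # periodic full gradient
    for _ in range(n):
        i = rng.integers(n)
        gx_i, gy_i = grad_i(i, x, y)
        hx_i, hy_i = grad_i(i, x_snap, y_snap)
        gx, gy = gx_i - hx_i + fx_snap, gy_i - hy_i + fy_snap   # variance-reduced estimates
        x, y = x - step * gx, y + step * gy            # descent in x, ascent in y

print("distance to the saddle point (0, 0):", np.linalg.norm(x) + np.linalg.norm(y))
```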

Bio: Dr. Yazdandoost Hamedani is an Assistant Professor in the Department of Systems and Industrial Engineering at the University of Arizona. He received the B.S. degree in mathematics and applications from the University of Tehran, Tehran, Iran, in 2015 and the Ph.D. degree in industrial engineering and operations research, with a minor in statistics, from Pennsylvania State University in August 2020. His research interests include distributed optimization, large-scale saddle point problems, and bilevel optimization in machine learning.

4:00 - 5:00 pm, Feb 16, 2022 (EST), Yao Li, University of North Carolina at Chapel Hill

Title: On the Robustness of Deep Neural Networks

Slides Video

Abstract: Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent studies have demonstrated the vulnerability of DNNs to adversarial examples, i.e., examples that are carefully crafted to fool a well-trained DNN while being indistinguishable from natural images to humans. This makes it unsafe to apply neural networks in security-critical applications.

We present two algorithms, the Embedding Regularized Classifier (ER-Classifier) and the Bayesian Adversarial Detector (BATer), to train neural networks that are robust against adversarial examples. Inspired by the observations that adversarial examples often lie outside the natural image data manifold and that the intrinsic dimension of image data is much smaller than its pixel space dimension, we propose to embed high-dimensional input images into a low-dimensional space and apply regularization on the embedding space to push the adversarial examples back to the manifold. The second algorithm, BATer, is motivated by the observations that random components can improve the smoothness of predictors and make it easier to simulate the output distribution of a DNN. Experimental results on several benchmark datasets show that our proposed frameworks achieve state-of-the-art performance against strong adversarial attack methods.
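
For readers unfamiliar with adversarial examples, the sketch below is the standard fast gradient sign method (FGSM) applied to a toy logistic-regression "network"; it is background for the abstract, not the speaker's defenses, and the weights, input, and bias offset are placeholders chosen so that a small perturbation (at most 0.05 per pixel) flips a confident prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(784)                      # a toy linear classifier on 28x28 "images"
x = rng.uniform(0, 1, 784)                        # a clean input with pixels in [0, 1]
b = 3.0 - w @ x                                   # offset so the clean input is confidently class 1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

y = 1.0                                           # the (assumed) true label
grad_x = (sigmoid(w @ x + b) - y) * w             # gradient of the cross-entropy loss w.r.t. x
eps = 0.05                                        # L-infinity perturbation budget
x_adv = np.clip(x + eps * np.sign(grad_x), 0, 1)  # FGSM: one signed gradient step on the input

print("clean prediction      :", sigmoid(w @ x + b))
print("adversarial prediction:", sigmoid(w @ x_adv + b))
print("max pixel change      :", np.max(np.abs(x_adv - x)))
```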


Bio: Yao is an assistant professor of statistics at UNC Chapel Hill, and her research is focused on developing efficient and robust machine learning models to solve real-world problems. She has a broad background in statistics, computer science, and economics, with specific training in machine learning and deep learning, especially recommendation systems, factorization machines, and the security of deep neural networks.


4:00 - 5:00 pm, Feb 23, 2022 (EST), Dmitriy Drusvyatskiy, University of Washington

Title: Active strict saddles in nonsmooth optimization

Slides Video

Abstract: We introduce a geometrically transparent strict saddle property for nonsmooth functions. This property guarantees that simple subgradient and proximal algorithms on weakly convex problems converge only to local minimizers, when randomly initialized. We argue that the strict saddle property may be a realistic assumption in applications, since it provably holds for generic semi-algebraic optimization problems.
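
A toy illustration of the phenomenon (my own example, not from the talk): the weakly convex function f(x) = |x^2 - 1| has minimizers at x = +/-1 and a nonsmooth critical point of strict-saddle type at x = 0 (here a local maximum). A Polyak-step subgradient method started from a random point avoids x = 0 and converges to a minimizer.

```python
import numpy as np

def f(x):
    return abs(x ** 2 - 1.0)

def subgrad(x):
    return np.sign(x ** 2 - 1.0) * 2.0 * x        # a subgradient of f at x

rng = np.random.default_rng(0)
x = rng.standard_normal()                          # random initialization
for _ in range(50):
    g = subgrad(x)
    if g != 0.0:
        x -= (f(x) / g ** 2) * g                   # Polyak step (the minimal value of f is 0)

print("limit point:", x, " f:", f(x))              # x is +/-1 up to rounding
```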

Bio: Dmitriy Drusvyatskiy received his PhD from the Operations Research and Information Engineering department at Cornell University in 2013, followed by a postdoctoral appointment in the Combinatorics and Optimization department at the University of Waterloo (2013-2014). He joined the Mathematics department at the University of Washington as an Assistant Professor in 2014 and was promoted to Associate Professor in 2019. Dmitriy's research broadly focuses on designing and analyzing algorithms for large-scale optimization problems, primarily motivated by applications in data science. Dmitriy has received a number of awards, including the Air Force Office of Scientific Research (AFOSR) Young Investigator Program (YIP) Award, an NSF CAREER Award, the 2019 INFORMS Optimization Society Young Researcher Prize, and finalist citations for the 2015 Tucker Prize and the Young Researcher Best Paper Prize at ICCOPT 2019. Dmitriy is currently a co-PI of the NSF-funded Transdisciplinary Research in Principles of Data Science (TRIPODS) institute at the University of Washington.

4:00 - 5:00 pm, March 02, 2022 (EST), Albert S. Berahas, University of Michigan

Title: Algorithms for Deterministically Constrained Stochastic Optimization

Slides Video

Abstract: Stochastic gradient and related methods for solving stochastic optimization problems have been studied extensively in recent years. It has been shown that such algorithms and many of their convergence and complexity guarantees extend in straightforward ways when one considers problems involving simple constraints, such as when one can perform projections onto the feasible region of the problem. However, settings with general nonlinear constraints have received less attention, and many of the approaches that have been proposed for solving such problems resort to using penalty or (augmented) Lagrangian methods, which are often not the most effective strategies. In this work, we propose and analyze stochastic optimization algorithms for deterministically constrained problems based on the sequential quadratic optimization (commonly known as SQP) methodology. We discuss the rationale behind our proposed techniques, convergence in expectation and complexity guarantees for our algorithms, and the results of preliminary numerical experiments that we have performed. This is joint work with Raghu Bollapragada, Frank E. Curtis, Michael O'Neill, Daniel P. Robinson, Jiahao Shi, and Baoyu Zhou.
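
A hedged sketch of the SQP template (identity Hessian model, diminishing step size, a single sampled gradient per iteration; the algorithms in the talk are more sophisticated and come with guarantees): minimize E_xi[ 0.5 * ||x - xi||^2 ] subject to the deterministic constraint ||x||^2 = 1. The data, step-size rule, and constraint are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0, 2.0])          # mean of the noise xi ~ N(mu, I)
x = np.array([1.0, 0.0, 0.0])            # a feasible starting point

def constraint(x):
    """c(x) = ||x||^2 - 1 and its Jacobian."""
    return np.array([x @ x - 1.0]), (2.0 * x)[None, :]

for k in range(2000):
    xi = mu + rng.standard_normal(3)     # one stochastic sample
    g = x - xi                           # stochastic gradient of 0.5 * ||x - xi||^2
    c, J = constraint(x)
    # SQP subproblem with identity Hessian:  min_d 0.5*||d||^2 + g'd  s.t.  c + J d = 0,
    # solved through its KKT (Newton) system.
    K = np.block([[np.eye(3), J.T], [J, np.zeros((1, 1))]])
    d = np.linalg.solve(K, np.concatenate([-g, -c]))[:3]
    x = x + (1.0 / (k + 10)) * d         # diminishing step size

print("final iterate :", x)
print("true solution :", mu / np.linalg.norm(mu))   # projection of the mean onto the sphere
```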


Bio: Albert S. Berahas is an Assistant Professor in the Industrial and Operations Engineering department at the University of Michigan. Before joining the University of Michigan, he was a Postdoctoral Research Fellow in the Industrial and Systems Engineering department at Lehigh University working with Professors Katya Scheinberg, Frank Curtis and Martin Takáč. Prior to that appointment, he was a Postdoctoral Research Fellow in the Industrial Engineering and Management Sciences department at Northwestern University working with Professor Jorge Nocedal. Berahas completed his PhD studies in the Engineering Sciences and Applied Mathematics (ESAM) department at Northwestern University in 2018, advised by Professor Jorge Nocedal. He received his undergraduate degree in Operations Research and Industrial Engineering (ORIE) from Cornell University in 2009, and in 2012 obtained an MS degree in Applied Mathematics from Northwestern University. Berahas’ research broadly focuses on designing, developing and analyzing algorithms for solving large-scale nonlinear optimization problems. Specifically, he is interested in and has explored several sub-fields of nonlinear optimization such as: (i) general nonlinear optimization algorithms, (ii) optimization algorithms for machine learning, (iii) constrained optimization, (iv) stochastic optimization, (v) derivative-free optimization, and (vi) distributed optimization.


4:00 - 5:00 pm, March 16, 2022 (EST), Bao Wang, University of Utah

Title: How Differential Equation Insights Benefit Deep Learning

Slides Video

Abstract: We will present a new class of continuous-depth deep neural networks motivated by the ODE limit of the classical momentum method, named heavy-ball neural ODEs (HBNODEs). HBNODEs enjoy two properties that imply practical advantages over NODEs: (i) the adjoint state of an HBNODE also satisfies an HBNODE, accelerating both forward and backward ODE solvers and thus significantly accelerating learning and improving the utility of the trained models; and (ii) the spectrum of HBNODEs is well structured, enabling effective learning of long-term dependencies from complex sequential data.

Second, we will extend HBNODEs to graph learning by leveraging diffusion on graphs, resulting in new algorithms for deep graph learning. The new algorithms are more accurate than existing deep graph learning algorithms, more scalable to deep architectures, and also suitable for learning in low labeling rate regimes. Moreover, we will present a fast multipole method-based efficient attention mechanism for modeling interactions among graph nodes. Third, if time permits, we will discuss proximal algorithms for accelerating the learning of continuous-depth neural networks.
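
As a minimal illustration of the heavy-ball ODE behind HBNODEs (a forward integration only, with a fixed random vector field and a plain explicit Euler solver; the models in the talk learn the vector field and use adjoint-based training), the second-order dynamics h'' + gamma*h' = f(h, t) are rewritten as the first-order system h' = m, m' = -gamma*m + f(h, t):

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 4, 16
W1 = 0.5 * rng.standard_normal((hidden, d + 1))
W2 = 0.5 * rng.standard_normal((d, hidden))

def f(h, t):
    """A small two-layer 'neural' vector field taking the state and time as input."""
    z = np.concatenate([h, [t]])
    return W2 @ np.tanh(W1 @ z)

def hbnode_forward(h0, gamma=1.0, T=1.0, steps=200):
    h, m = h0.copy(), np.zeros_like(h0)      # position-like and momentum-like states
    dt = T / steps
    for k in range(steps):
        t = k * dt
        h = h + dt * m                       # h' = m
        m = m + dt * (-gamma * m + f(h, t))  # m' = -gamma*m + f(h, t)
    return h

h0 = rng.standard_normal(d)
print("h(T) =", hbnode_forward(h0))
```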


Bio: Bao Wang is an Assistant Professor in the Mathematics Department and affiliated with the Scientific Computing and Imaging Institute at the University of Utah. He received his Ph.D. in Computational Mathematics from Michigan State University in 2016. He started research in deep learning after he joined UCLA as a postdoc. He has published many refereed journal and conference papers and has expertise in adversarial defense for deep learning and other areas of data science, including optimization, privacy and data security, spatio-temporal event modeling, and prediction. He is a recipient of the 2020 Chancellor's Award for postdoctoral research at the University of California.



4:00 - 5:00 pm, March 30, 2022 (EST), Wei Zhu, University of Massachusetts Amherst

Title: Symmetry-preserving machine learning for computer vision, scientific computing, and distribution learning

Slides Video

Abstract: Symmetry is ubiquitous in machine learning and scientific computing. Robust incorporation of symmetry priors into the learning process has been shown to achieve significant model improvement for various learning tasks, especially in the small data regime.

In the first part of the talk, I will explain a principled framework of deformation-robust symmetry-preserving machine learning. The key idea is the spectral regularization of the (group) convolutional filters, which ensures that symmetry is robustly preserved in the model even if the symmetry transformation is “contaminated” by nuisance data deformation.

In the second part of the talk, I will demonstrate how to incorporate additional structural information (such as group symmetry) into generative adversarial networks (GANs) for data-efficient distribution learning. This is accomplished by developing new variational representations for divergences between probability measures with embedded structures. We study, both theoretically and empirically, the effect of structural priors in the two GAN players. The resulting structure-preserving GAN is able to achieve significantly improved sample fidelity and diversity—almost an order of magnitude measured in Fréchet Inception Distance—especially in the limited data regime.
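
To give one concrete (and deliberately simplified) example of controlling a convolutional filter's spectrum, the snippet below computes the exact spectral norm of a single-channel circular convolution via the FFT and turns it into a penalty term; the deformation-robust regularizer in the talk is defined differently, and the kernel, image size, and penalty weight here are placeholders.

```python
import numpy as np

def conv_spectral_norm(kernel, n):
    """Largest singular value of circular convolution with `kernel` on n x n inputs.

    A single-channel circular convolution is diagonalized by the 2D DFT, so its
    singular values are the magnitudes of the zero-padded kernel's FFT.
    """
    padded = np.zeros((n, n))
    k = kernel.shape[0]
    padded[:k, :k] = kernel
    return np.max(np.abs(np.fft.fft2(padded)))

rng = np.random.default_rng(0)
kernel, n, lam = rng.standard_normal((3, 3)), 32, 0.01
sigma = conv_spectral_norm(kernel, n)
penalty = lam * sigma ** 2                 # e.g. added to the training loss as a regularizer
print("spectral norm:", sigma, " penalty:", penalty)
```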


Bio: Wei Zhu is an Assistant Professor at the Department of Mathematics and Statistics, University of Massachusetts Amherst. He received his B.S. in Mathematics from Tsinghua University in 2012, and Ph.D. in Applied Math from UCLA in 2017. Before joining UMass, he worked as a Research Assistant Professor at Duke University from 2017 to 2020. Wei is interested in developing theories and algorithms in statistical learning and applied harmonic analysis to solve problems in machine learning, inverse problems, and scientific computing. His recent research is particularly focused on exploiting and discovering the intrinsic structure and symmetry within the data to improve the interpretability, stability, reliability, and data-efficiency of deep learning models.


4:00 - 5:00 pm, April 6, 2022 (EST), Soledad Villar, Johns Hopkins University

Title: Units-equivariant machine learning

Slides Video

Abstract: We combine ideas from dimensional analysis and from equivariant machine learning to provide an approach for units-equivariant machine learning. Units equivariance is the exact symmetry that follows from the requirement that relationships among measured quantities must obey self-consistent dimensional scalings. Our approach is to construct a dimensionless version of the learning task, using classic results from dimensional analysis, and then perform the learning task in the dimensionless space. This approach can be used to impose units equivariance on almost any contemporary machine-learning methods, including those that are equivariant to rotations and other groups. Units equivariance is expected to be particularly valuable in the contexts of symbolic regression and emulation. We discuss the in-sample and out-of-sample prediction accuracy gains one can obtain if exact units equivariance is imposed; the symmetry is extremely powerful in some contexts. We illustrate these methods with simple numerical examples involving dynamical systems in physics and ecology.
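
The dimensional-analysis ingredient can be made concrete with a tiny Buckingham-Pi computation (my own pendulum example, not taken from the talk): stacking the unit exponents of the measured quantities into a matrix, its null space gives the dimensionless products a units-equivariant model would be allowed to depend on.

```python
import numpy as np

# Columns: period t, length l, gravity g, mass m.
# Rows: exponents of the base units M, L, T for each quantity.
D = np.array([[0.0, 0.0,  0.0, 1.0],   # mass
              [0.0, 1.0,  1.0, 0.0],   # length
              [1.0, 0.0, -2.0, 0.0]])  # time

_, _, Vt = np.linalg.svd(D)
pi = Vt[-1]                 # spans the one-dimensional null space of D
pi = pi / pi[0]             # normalize so the exponent of the period equals 1
print(np.round(pi, 3))      # -> [1, -0.5, 0.5, 0], i.e. the dimensionless group t * sqrt(g / l)
# Mass cannot enter any dimensionless combination, so a units-equivariant model of the
# period would be built from such pi groups only.
```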


Bio: Soledad Villar is an assistant professor of applied mathematics and statistics at Johns Hopkins University. She received her PhD in mathematics in 2017 from UT Austin, and was later a research fellow at UC Berkeley and a Moore-Sloan Research Fellow at NYU. Her honors and awards include delivering the commencement speech representing her graduating PhD class at UT Austin in 2017, a Fulbright Fellowship, and being named a Rising Star in Computational and Data Sciences in 2019. Her research has been funded by NSF, the Simons Foundation, ONR, and EOARD.

4:00 - 5:00 pm, April 13, 2022 (EST), Yuyuan Ouyang, Clemson University

Title: Universal Conditional Gradient Sliding for Convex Optimization

Slides Video

Abstract: We present a first-order projection-free method, namely the universal conditional gradient sliding (UCGS) method, for computing approximate solutions to convex differentiable optimization problems. For objective functions with Lipschitz continuous gradients, we show that UCGS is able to terminate with approximate solutions, and its complexity in terms of gradient evaluations and linear objective subproblems matches the state-of-the-art upper and lower complexity bounds for first-order projection-free methods, while adding features that allow for practical implementation. In the weakly smooth case, when the gradient is Hölder continuous, both the gradient and linear objective complexity results of UCGS improve on the current state-of-the-art upper complexity results. Within the class of sliding-type algorithms, to the best of our knowledge, this is the first time a sliding-type algorithm is able to improve not only the gradient complexity but also the overall complexity for computing an approximate solution.
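
As background for the projection-free template that UCGS builds on (this is the classical conditional gradient / Frank-Wolfe method on a made-up quadratic over the unit simplex, not UCGS itself, which additionally "slides" gradient computations across the linear subproblems):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
Q = rng.standard_normal((d, d))
Q = Q @ Q.T / d + np.eye(d)               # objective: 0.5 x'Qx + b'x (smooth and convex)
b = rng.standard_normal(d)
grad = lambda x: Q @ x + b

def lmo_simplex(g):
    """Linear minimization oracle over the unit simplex: argmin_{v in simplex} <g, v>."""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

x = np.ones(d) / d                        # start at the barycenter of the simplex
for k in range(200):
    v = lmo_simplex(grad(x))              # linear subproblem -- no projection needed
    gamma = 2.0 / (k + 2)                 # standard open-loop step size
    x = (1 - gamma) * x + gamma * v       # convex combination keeps x feasible

fw_gap = grad(x) @ (x - lmo_simplex(grad(x)))   # Frank-Wolfe gap, an optimality certificate
print("Frank-Wolfe gap after 200 iterations:", fw_gap)
```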


Bio: Yuyuan “Lance” Ouyang is an Associate Professor in the School of Mathematical and Statistical Sciences at Clemson University. His research interests are in algorithm design and complexity analysis for solving large-scale nonlinear optimization problems. His research is supported by NSF and ONR.


10:00 - 11:00 am, April 20, 2022 (EST), Ruoyu Sun, UIUC

Title: Global Loss Landscape of Neural Networks: Knowns and Unknowns

Slides Video

Abstract: The recent success of neural networks suggests that their loss landscape is not too bad, but what concrete results do we know and not know about the landscape? In this talk, we present a few recent results on the landscape. First, non-linear neural nets can have sub-optimal local minima under mild assumptions (for arbitrary width, for generic input data, and for most activation functions). Second, wide networks do not have sub-optimal "basins" for any continuous activation, while narrow networks can have sub-optimal basins. These results show that width alone can eliminate bad basins, but cannot eliminate bad local minima. We will present a simple 2D geometrical object that is a basic component of the neural net landscape and can visually explain the above two results. Third, we show that for ReQU and ReLU networks, adding a proper regularizer can eliminate sub-optimal local minima and decreasing paths to infinity. In other words, large width and a regularizer together can eliminate sub-optimal local minima. Finally, we briefly discuss a few future directions on the optimization of neural networks.

Bio: Ruoyu Sun is an assistant professor in the Department of Industrial and Enterprise Systems Engineering (ISE) and affiliated with the Coordinated Science Lab (CSL) and the Department of Electrical and Computer Engineering (ECE) at the University of Illinois at Urbana-Champaign (UIUC). Before joining UIUC, he was a visiting research scientist at Facebook AI Research (FAIR). His current research interests lie in optimization and deep learning, including deep learning theory, generative adversarial networks, adaptive gradient methods, adversarial robustness, and meta-learning. He won second place in the INFORMS George Nicholson Student Paper Competition and an honorable mention in the INFORMS Optimization Society Student Paper Competition. He has been serving as an area chair of machine learning conferences such as NeurIPS, ICLR, ICML, and AISTATS. He was a postdoctoral researcher at Stanford University, obtained his Ph.D. in Electrical Engineering from the University of Minnesota, and obtained a B.S. in mathematics from Peking University.