Past Talks

Wednesday, September 18, 2023

Title. Machine Learning from Weak, Noisy, and Biased Supervision

Speaker: Masashi Sugiyama

University of Tokyo and RIKEN


Abstract. In statistical inference and machine learning, we face a variety of uncertainties such as training data with insufficient information, label noise, and bias.  In this talk, I will give an overview of our research on reliable machine learning, including weakly supervised classification (positive-unlabeled classification, positive-confidence classification, complementary-label classification, etc.), noisy label classification (noise transition estimation, instance-dependent noise, clean sample selection, etc.), and transfer learning (joint importance-predictor estimation for covariate shift adaptation, dynamic importance estimation for full distribution shift, continuous distribution shift, etc.).
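By way of illustration of one of the weakly supervised settings above, here is a minimal sketch (my reconstruction, in the spirit of the non-negative PU risk estimator of Kiryo et al., 2017; the linear scores and hinge-style surrogate are assumptions for the example) of estimating the classification risk from positive and unlabeled data only:

```python
import numpy as np

def nnpu_risk(scores_p, scores_u, pi_p, loss=lambda z: np.maximum(0.0, 1.0 - z)):
    """Non-negative PU risk estimate, in the spirit of Kiryo et al. (2017).

    scores_p: classifier scores on labeled positive examples
    scores_u: classifier scores on unlabeled examples
    pi_p:     class prior P(y = +1), assumed known or estimated separately
    """
    risk_p_pos = loss(scores_p).mean()    # positives classified as +1
    risk_p_neg = loss(-scores_p).mean()   # positives classified as -1
    risk_u_neg = loss(-scores_u).mean()   # unlabeled classified as -1
    # The unbiased estimate of the negative-class risk can dip below zero on
    # finite samples; clipping at zero is the "non-negative" correction.
    risk_n = max(0.0, risk_u_neg - pi_p * risk_p_neg)
    return pi_p * risk_p_pos + risk_n
```

The clipping step is what keeps the estimator usable with flexible models, which would otherwise drive the unbiased risk to minus infinity.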

Wednesday, April 26, 2023

Title. The Hidden Convex Optimization Landscape of Deep Neural Networks

Speaker: Mert Pilanci

Stanford University


Abstract. Since deep neural network training problems are inherently non-convex, their recent dramatic success largely relies on non-convex optimization heuristics and experimental findings. Despite significant advancements, the non-convex nature of neural network training poses two central challenges: first, understanding the underlying mechanisms that contribute to model performance, and second, achieving efficient training with low computational cost and energy consumption. The performance of non-convex models is notably influenced by the selection of optimization methods and hyperparameters, including initialization, mini-batching, and step sizes. Conversely, convex optimization problems are characterized by their robustness to these choices, allowing for the efficient and consistent achievement of globally optimal solutions, irrespective of optimization parameters.


In this talk, we explore a novel perspective by examining multilayer neural networks equipped with ReLU activation functions through the framework of convex optimization. We introduce exact convex optimization formulations of ReLU network training problems. We show that two-layer ReLU networks can be globally trained via convex programs with the number of variables polynomial in the number of training samples, feature dimension, and the number of hidden neurons. We show that our analysis extends to deeper networks and that these convex programs possess an intuitive geometric interpretation. Our results provide an equivalent characterization of neural networks as convex models where a mixture of locally linear models are fitted to the data with sparsity inducing convex regularization. Moreover, we show that standard convolutional neural networks can be globally optimized in fully polynomial time. We discuss extensions to batch normalization, generative adversarial networks and transformers. Finally, we present numerical simulations verifying our claims and illustrating that the proposed convex approach is faster and more reliable than standard local search heuristics such as SGD and variants.
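For the two-layer case, the convex program can be stated compactly. The following is a sketch of the formulation from Pilanci and Ergen's work (notation reconstructed from memory): X is the n-by-d data matrix, y the labels, and the D_i enumerate the finitely many ReLU activation patterns diag(1[Xu >= 0]) as u ranges over directions.

```latex
\min_{\{v_i, w_i\}_{i=1}^{P}}\;
  \frac{1}{2}\Big\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Big\|_2^2
  + \beta \sum_{i=1}^{P} \big( \|v_i\|_2 + \|w_i\|_2 \big)
\quad \text{s.t.}\quad
  (2D_i - I_n) X v_i \ge 0,\; (2D_i - I_n) X w_i \ge 0 .
```

The constraints force each variable block to respect its activation pattern, and the group-l2 penalty is the sparsity-inducing convex regularizer referred to above; the number P of patterns grows only polynomially in n when the data has fixed rank.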

Wednesday, April 26, 2023

Title. Sums of squares: from algebra to analysis

Speaker: Francis Bach

INRIA, ENS, and PSL Paris


Abstract. The representation of non-negative functions as sums of squares has become an important tool in many modeling and optimization tasks. Traditionally applied to polynomial functions, it requires rich tools from algebraic geometry that led to many developments in the last twenty years. In this talk, I will look at this problem from a functional analysis point of view, leading to new applications and new results on the performance of sum-of-squares optimization.
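As a pointer to the functional-analytic viewpoint (my summary of the kernel sum-of-squares line of work by Bach and coauthors), the polynomial square is replaced by a square in a reproducing kernel Hilbert space:

```latex
g(x) = \langle \phi(x),\, A\, \phi(x) \rangle_{\mathcal{H}}, \qquad A \succeq 0,
```

where \phi is the feature map of an RKHS \mathcal{H}. The function g is non-negative by construction, and fitting such models reduces to convex optimization over the positive semidefinite operator A.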

Wednesday, April 12, 2023

Title. Geometry and convergence of natural policy gradient methods

Speaker: Guido F. Montúfar

University of California, Los Angeles

Abstract. We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and co-authors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the penalization strength. This is work with Johannes Müller.
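Schematically (standard notation, not verbatim from the talk), a natural policy gradient flow is an ascent flow on the expected reward R preconditioned by the chosen geometry:

```latex
\dot{\theta}_t = G(\theta_t)^{+}\, \nabla_\theta R(\theta_t),
```

where G(θ) is the Gram matrix of the metric (the Fisher information for Kakade's metric). The observation underlying the results above is that, in state-action space, these flows become gradient flows with respect to Hessian metrics ∇²Φ of convex functions Φ: conditional entropy and entropy for the Kakade and Morimura metrics, respectively, and, e.g., log-barriers for the sublinear-rate cases.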

Wednesday, March 22, 2023

Title. Flat minima generalize for low-rank matrix recovery

Speaker: Dmitriy Drusvyatskiy 

University of Washington

Abstract. Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. We conclude with synthetic experiments that illustrate our findings and discuss the effect of depth on flat solutions.
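The flatness measure used here, the trace of the Hessian, can be estimated without ever forming the Hessian. A minimal sketch (illustrative quadratic example and finite-difference Hessian-vector products, both assumptions of the example) using Hutchinson's estimator:

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_samples=100, rng=None):
    """Estimate tr(H) as E[v^T H v] over Rademacher probe vectors v."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / n_samples

# Toy check on f(x) = 0.5 x^T A x, whose Hessian is A (so tr(H) = 6 here).
A = np.diag([1.0, 2.0, 3.0])
grad = lambda x: A @ x
hvp = lambda v, x0=np.zeros(3), eps=1e-5: (grad(x0 + eps * v) - grad(x0 - eps * v)) / (2 * eps)
print(hutchinson_trace(hvp, dim=3))  # approximately 6.0
```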

Thursday, March 2, 2023

Title. Finding a solution to the spherical perceptron problem

Speaker: Ahmed El Alaoui  

Cornell University

Abstract. We consider the spherical perceptron problem: this is the problem of finding a point on the sphere subject to many random linear inequalities. This problem and its variants appear in combinatorics, in machine learning as models for one-layer neural networks, and as analytically tractable models for the study of critical properties of jamming of hard spheres. The structure of solutions has been studied in statistical physics, but is still poorly understood from a rigorous mathematical point of view.

In this talk I will review the relevant literature and will discuss an algorithmic approach attempting to compute a valid solution up to the satisfiability threshold predicted by statistical physics. Time permitting, I will discuss the case of the binary perceptron, where the desired solution must have binary coordinates.

Wednesday, Feb 22, 2023

Title. Performative Prediction

Celestine Mendler-Dünner, Max Planck Institute for Intelligent Systems

Abstract. When predictions power societal systems they may influence the population they aim to predict. We call such predictions performative; the prediction influences the predicted. In this talk, I will introduce a risk minimization framework, called performative prediction, that conceptualizes this phenomenon by allowing the predictive model to influence the distribution over future data. This inherently dynamic problem formulation brings forth interesting new challenges for optimization and algorithm design. I will cover three emerging results on the dynamics of repeated risk minimization, the nuances of stochastic learning, and the application of techniques from bandit optimization. To conclude the talk I will provide a discussion on power and normative considerations of optimization in performative environments.
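A toy simulation of repeated risk minimization (the model below is my illustration, not from the talk): we predict the mean of a population whose distribution shifts in response to the deployed prediction, and the iterates converge to a performatively stable point.

```python
import numpy as np

# Repeated risk minimization under a performative shift: deploying prediction
# theta shifts the population mean from mu0 to mu0 + eps * theta.
mu0, eps = 1.0, 0.5          # base mean and performativity strength (|eps| < 1)
rng = np.random.default_rng(0)

theta = 0.0
for t in range(20):
    data = rng.normal(mu0 + eps * theta, 1.0, size=10_000)  # sample from D(theta)
    theta = data.mean()       # exact risk minimizer for squared loss = sample mean
print(theta)                  # ≈ mu0 / (1 - eps) = 2.0, the performatively stable point
```

Because the best response is a contraction when |eps| < 1, repeated retraining converges; larger |eps| slows or breaks this convergence, which is one of the dynamics questions the framework studies.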

Wednesday, Feb 1, 2023 at 4pm ET

Title. The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization

Mufan (Bill) Li and Daniel Roy, University of Toronto

Abstract. The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.

Wednesday, Dec 7, 2022 at 4pm ET 

Title: Machine Learning and Strategic Agents: Dynamics and Algorithm Design

Eric Mazumdar, Caltech

Abstract: Machine learning algorithms are increasingly being deployed into environments in which they must contend with other strategic agents with potentially misaligned objectives. The presence of these other agents breaks many of the underlying assumptions on which machine learning algorithms are built and can cause non-stationarity in the environment that can give rise to surprising dynamics and behaviors. In this talk, we will explore the challenges and opportunities available to algorithm designers in such scenarios and show how one can take advantage of the game theoretic interactions in the environment to give performance and convergence guarantees to game theoretically meaningful solutions.

In particular, we will focus on two sets of problems: 1. a dynamic model of strategic classification in which a learner seeks to train a machine learning model in the presence of adaptive agents and 2. matching markets where individual agents attempt to learn their most preferred match while competing with other agents over firms. In strategic classification we will show that the presence of strategic agents means that the learning rate or speed of learning becomes a design choice that can be used to select for different equilibria. In matching markets we will design a family of algorithms that allow agents to optimally learn in structured matching markets while competing with other agents even without full observation of the market. Both of these results suggest that dealing with game theoretic interactions requires re-evaluating long-standing assumptions underlying machine learning, but that if one can leverage the underlying game theoretic structure, one can still give strong performance guarantees.


Bio: Eric Mazumdar is an Assistant Professor in Computing and Mathematical Sciences and Economics at Caltech. He obtained his Ph.D. in Electrical Engineering and Computer Science at UC Berkeley where he was advised by Michael Jordan and Shankar Sastry. Eric was previously a Research Fellow at the Simons Institute for Theory of Computing. Prior to Berkeley, he received a B.S. in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT). His research lies at the intersection of machine learning and economics where he is broadly interested in developing the tools and understanding necessary to confidently deploy machine learning algorithms into societal-scale systems. He applies his work to problems in intelligent infrastructure, online markets, e-commerce, and the delivery of healthcare.

Wednesday, Nov 16, 2022 at 4pm ET 

Title: Equiangular lines and eigenvalue multiplicities 

Yufei Zhao, MIT

Abstract: Equiangular lines are configurations of lines in n-dimensional space, all passing through the origin, that pairwise make the same angle.

We solve the following longstanding problem: for a fixed angle, what is the maximum number of equiangular lines in high dimensions pairwise separated by that angle?

A central step is a new result in spectral graph theory: the adjacency matrix of a connected bounded degree graph has sublinear second eigenvalue multiplicity. This result is proved via counting closed walks, with a key new idea that deleting a net of vertices depresses local spectral radii. It remains an intriguing open problem to determine the maximum possible second eigenvalue multiplicity. 
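For reference, the shape of the resolved answer (my paraphrase of the Jiang-Tidor-Yao-Zhang-Zhao theorem, stated from memory): fix α in (0, 1), set λ = (1 − α)/(2α), and let k be the smallest number of vertices of a graph whose adjacency matrix has spectral radius exactly λ (k = ∞ if none exists). Writing N_α(d) for the maximum number of equiangular lines in R^d with common angle arccos α, for all sufficiently large d,

```latex
N_\alpha(d) = \left\lfloor \frac{k\,(d-1)}{k-1} \right\rfloor \quad (k < \infty),
\qquad
N_\alpha(d) = d + o(d) \quad (k = \infty).
```

For example, α = 1/3 gives λ = 1 and k = 2, recovering the classical answer 2(d − 1).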

Bio: Yufei Zhao is Associate Professor of Mathematics at MIT. His research tackles a broad range of problems in discrete mathematics, including extremal, probabilistic, and additive combinatorics, graph theory, and discrete geometry, as well as applications to computer science. His honors include the SIAM Dénes Kőnig prize (2018), the Sloan Research Fellowship (2019), and the NSF CAREER Award (2021). 

Wednesday, Nov 9, 2022 at 3pm ET 

Title: Leveraging "partial" smoothness for faster convergence in nonsmooth optimization 

Damek Davis, Cornell University


Abstract: First-order methods in nonsmooth optimization are often described as "slow." I will present two (locally) accelerated first-order methods that violate this perception: a superlinearly convergent method for solving nonsmooth equations, and a linearly convergent method for solving "generic" nonsmooth optimization problems. The key insight in both cases is that nonsmooth functions are often "partially" smooth in useful ways. 

Bio: Damek Davis is an Associate Professor of Operations Research at Cornell University. His research focuses on the interplay of optimization, signal processing, statistics, and machine learning. He has received several awards for his work, including a Sloan Research Fellowship in Mathematics (2020), the INFORMS Optimization Society Young Researchers Prize (2019), and an NSF CAREER Award (2021). 

Recording Unavailable. Papers covered in talk:

A superlinearly convergent subgradient method for sharp semismooth problems

Vasileios Charisopoulos, Damek Davis

Manuscript (2021) [code]

A nearly linearly convergent first-order method for nonsmooth functions with quadratic growth

Damek Davis, Liwei Jiang

Manuscript (2022)

Subgradient methods near active manifolds: saddle point avoidance, local convergence, and asymptotic normality

Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

Manuscript (2021)


Wednesday, Oct 26, 2022 at 4pm ET 

Title: The power of adaptivity in representation learning: From meta-learning to federated learning

Aryan Mokhtari, University of Texas at Austin

Abstract: A central problem in machine learning is as follows: How should we train models using data generated from a collection of clients/environments, if we know that these models will be deployed in a new and unseen environment? In the setting of few-shot learning, two prominent approaches are: (a) develop a modeling framework that is “primed” to adapt, such as Model-Agnostic Meta-Learning (MAML), or (b) develop a common model using federated learning (such as FedAvg), and then fine-tune the model for the deployment environment. We study both of these approaches in the multi-task linear representation setting. We show that the reason models trained through either approach generalize to new environments is that the dynamics of training induce the models to evolve toward the common data representation among the clients’ tasks. In both cases, the structure of the bi-level update at each iteration (an inner and outer update with MAML, and a local and global update with FedAvg) holds the key: the diversity among client data distributions is exploited via the inner/local updates, which induce the outer/global updates to bring the representation closer to the ground truth. In both settings, these are the first results that formally establish representation learning and derive exponentially fast convergence to the ground-truth representation. Based on joint work with Liam Collins, Hamed Hassani, Sewoong Oh, and Sanjay Shakkottai. Papers: https://arxiv.org/abs/2202.03483 , https://arxiv.org/abs/2205.13692
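For concreteness, the multi-task linear representation setting can be written as (notation mine):

```latex
y_{i,j} = \langle w_i, B^{\top} x_{i,j} \rangle + \varepsilon_{i,j},
\qquad B \in \mathbb{R}^{d \times k},\; k \ll d,
```

where B is the low-dimensional representation shared across clients/tasks i, w_i is a task-specific head, and (x_{i,j}, y_{i,j}) are that task's samples. The cited results show that the inner/local updates on the w_i allow the outer/global updates to drive the column space of B to the ground-truth subspace exponentially fast.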

Bio: Aryan Mokhtari is an Assistant Professor in the Electrical and Computer Engineering Department of the University of Texas at Austin (UT Austin), where he holds the Jack Kilby/Texas Instruments Endowed Faculty Fellowship. Before joining UT Austin, he was a Postdoctoral Associate in the Laboratory for Information and Decision Systems (LIDS) at MIT.  Prior to that, he was a Research Fellow at the Simons Institute for the program on “Bridging Continuous and Discrete Optimization”. He received his Ph.D. in electrical and systems engineering from the University of Pennsylvania (Penn). He is the recipient of the Army Research Office (ARO) Early Career Program Award, the Simons-Berkeley Research Fellowship, and Penn’s Joseph and Rosaline Wolf Award for Best Doctoral Dissertation.

Wednesday, Oct 12, 2022 at 4pm ET 

Title: Three Facets of Understanding Pre-training: Loss, Inductive Bias, and Implicit Bias 

Tengyu Ma, Stanford

Abstract: AI is undergoing a paradigm shift with the rise of models pre-trained with self-supervisions and then adapted to a wide range of downstream tasks. However, their workings largely remain a mystery; classical learning theory cannot explain why pre-training on an unsupervised task can help many different downstream tasks. This talk will first investigate the role of pre-training losses in extracting meaningful structural information from unlabeled data, especially in the infinite data regime. Concretely, I will show that the contrastive loss can give rise to embeddings whose Euclidean distance captures the manifold distance between raw data (or, more generally, the graph distance of a so-called positive-pair graph). Moreover, directions in the embedding space correspond to relationships between clusters in the positive-pair graph. Then, I will discuss two other elements that seem necessary for a sharp explanation of the behavior of practical pre-trained models:  inductive bias of architectures and implicit bias of optimizers. I will introduce two recent, ongoing projects, where we (1) strengthen the previous theoretical framework by incorporating the inductive bias of architectures and (2) demonstrate the implicit bias of optimizers in pre-training, even with infinite pre-training data, empirically and theoretically.
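The specific contrastive objective behind these results, if I recall the underlying paper (HaoChen et al.) correctly, is the spectral contrastive loss:

```latex
\mathcal{L}(f) = -2\, \mathbb{E}_{(x, x^{+})}\!\big[ f(x)^{\top} f(x^{+}) \big]
  + \mathbb{E}_{x, x'}\!\big[ \big( f(x)^{\top} f(x') \big)^{2} \big],
```

whose minimizers recover spectral embeddings of the positive-pair graph; this is what ties Euclidean distances between embeddings to graph (and hence manifold) distances.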

Bio: Tengyu Ma is an assistant professor of Computer Science and Statistics at Stanford University. He received his Ph.D. from Princeton University and B.E. from Tsinghua University. His research interests include topics in machine learning and algorithms, such as deep learning and its theory, non-convex optimization, deep reinforcement learning, representation learning, and high-dimensional statistics. He is a recipient of the ACM Doctoral Dissertation Award Honorable Mention, the Sloan Fellowship, and the NSF CAREER Award. 

Wednesday, May 11, 2022 at 2pm ET 

Title: Constant Regret in Online Decision-Making

Siddhartha Banerjee, Cornell

Abstract: I will present a class of finite-horizon control problems, where we see a random stream of arrivals, need to select actions in each step, and where the final objective depends only on the aggregate type-action counts; this class includes many widely studied control problems, including online resource allocation, dynamic pricing, generalized assignment, online bin packing, and bandits with knapsacks. For such settings, I will introduce a unified algorithmic paradigm and provide a simple yet general condition under which these algorithms achieve constant regret, i.e., additive loss compared to the hindsight optimal solution that is independent of the horizon and state space. These results stem from an elementary coupling argument, which may prove useful for many other questions in online decision-making. Time permitting, I will illustrate this by showing how we can use this technique to incorporate side information and historical data in these settings, and achieve constant regret with as little as a single data trace.

Bio: Sid Banerjee is an associate professor in the School of Operations Research at Cornell, working on topics at the intersection of data-driven decision-making, network algorithms and market design. His research is supported by grants from the NSF (including an NSF CAREER award), the ARL Network Sciences division, and Engaged Cornell, and has received multiple awards including the INFORMS Applied Probability Society Best Publication award in 2021. He completed his PhD in the ECE Department at UT Austin, and was a postdoctoral researcher in the Social Algorithms Lab at Stanford. He also served as a technical consultant with the research science group at Lyft from 2014-18.

Wednesday, April 27th, 2022 at 2pm ET 

Title: Equilibrium Computation, Deep Multi-Agent Learning, and Multi-Agent Reinforcement Learning 

Constantinos Daskalakis, MIT 

Wednesday, March 30, 2022 at 2pm ET 

Title: Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity 

Nicolas Flammarion, EPFL 

Abstract: Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrized neural networks. In this talk, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous-time version, namely stochastic gradient flow. We explicitly characterize the solution chosen by the stochastic flow and prove that it always enjoys better generalization properties than that of gradient flow. Quite surprisingly, we show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias. Our findings highlight the fact that structured noise can induce better generalization, and they help to explain the better performance of stochastic gradient descent over gradient descent observed in practice.
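For concreteness, a diagonal linear network in this literature parametrizes a linear predictor as (one common convention; the talk's may differ in details):

```latex
f_\theta(x) = \langle \beta, x \rangle, \qquad \beta = u \odot u - v \odot v,
```

so that gradient flow on (u, v) induces a non-trivial implicit bias on β, interpolating between l2-like and sparsity-promoting l1-like interpolators depending on the initialization scale; the result above shows that the noise of the stochastic flow pushes the selected solution further toward the sparse, better-generalizing end.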

Bio: Nicolas Flammarion is a tenure-track assistant professor in computer science at EPFL. Prior to that, he was a postdoctoral fellow at UC Berkeley, hosted by Michael I. Jordan. He received his Ph.D. in 2017 from École Normale Supérieure in Paris, where he was advised by Alexandre d’Aspremont and Francis Bach. His research focuses primarily on learning problems at the interface of machine learning, statistics, and optimization.

Friday, March 18, 2022 at 1pm ET

Title: Sampling Under Symmetry

Nisheeth Vishnoi, Yale University

Abstract: Exponential densities on orbits of Lie groups such as the unitary group are endowed with surprisingly rich mathematical structure and, traditionally, arise in diverse areas of physics, random matrix theory, and statistics. 

In this talk, I will discuss how the symmetries of Lie groups can be leveraged to design efficient algorithms for sampling from such distributions leading to applications in quantum inference and differential privacy. 

This talk will require no background in Lie theory and should be broadly accessible.

Bio: Nisheeth Vishnoi is the A. Bartlett Giamatti Professor of Computer Science and a co-founder of the Computation and Society Initiative at Yale University. He studies the foundations of computation, and his research spans several areas of theoretical computer science, optimization, and machine learning. He is also interested in understanding nature and society from a computational viewpoint. Here, his current focus includes studying entropy, the emergence of intelligent behavior, and ethical problems at the interface of artificial intelligence and society. 

He was the recipient of the Best Paper Award at the IEEE Symposium on Foundations of Computer Science in 2005, the IBM Research Pat Goldberg Memorial Award in 2006, the Indian National Science Academy Young Scientist Award in 2011, the IIT Bombay Young Alumni Achievers Award in 2016, and the Best Paper award at the ACM Conference on Fairness, Accountability, and Transparency in 2019. He was named an ACM Fellow in 2019. His most recent book, “Algorithms for Convex Optimization”, was published by Cambridge University Press.

Wednesday, March 2, 2022 at 2pm ET

Title: Information-directed Exploration in Bandits and Reinforcement Learning

Andreas Krause, ETH Zürich

Abstract: The exploration-exploitation dilemma is a central challenge when making decisions under uncertainty.  Most common approaches explore by favoring actions with uncertain outcomes.  However, aleatoric uncertainty in the outcomes is different from epistemic uncertainty in the estimation task, so the resulting observations may not necessarily be informative.  In this talk, I will present approaches towards efficient information-directed exploration in stochastic multi-armed bandits, Bayesian optimization, reinforcement learning, and a rich family of sequential decision problems called partial monitoring.  These approaches use information measures for guiding exploration, and their submodularity allows us to establish sublinear regret even in non-parametric settings.  I will present the theoretical background, as well as empirical demonstrations on deep reinforcement learning tasks.

Bio: Andreas Krause is a Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. He also serves as Academic Co-Director of the Swiss Data Science Center and Chair of the ETH AI Center, and co-founded the ETH spin-off LatticeFlow. Before that he was an Assistant Professor of Computer Science at Caltech. He received his Ph.D. in Computer Science from Carnegie Mellon University (2008) and his Diplom in Computer Science and Mathematics from the Technical University of Munich, Germany (2004). He is a Max Planck Fellow at the Max Planck Institute for Intelligent Systems, an ELLIS Fellow, a Microsoft Research Faculty Fellow and a Kavli Frontiers Fellow of the US National Academy of Sciences. He received the Rössler Prize, ERC Starting Investigator and ERC Consolidator grants, the German Pattern Recognition Award, an NSF CAREER award as well as the ETH Golden Owl teaching award. His research has received awards at several premier conferences and journals, including the ACM SIGKDD Test of Time award 2019 and the ICML Test of Time award 2020.

November 17th, 2021 at 2pm EST

Title: Machine Learning for Integer Programming: Methods, Challenges, and Prospects

 Elias Khalil, University of Toronto

Abstract: Mixed Integer Programming (MIP) is one of the most widely used frameworks for modeling real-world decision-making tasks. MIP solvers, which are based on a branch-and-bound algorithm, are capable of solving large-scale problems much faster than what worst-case complexity bounds may suggest. This surprising fact has been attributed to algorithmic and engineering enhancements to MIP solvers (rather than just better hardware), accumulated over the last three decades.

Along with the increase in the complexity of MIP algorithms, there is also an increase in the amount of "data" that is generated by the repeated solution of similar instances of the same mathematical optimization problem. This creates an opportunity for leveraging Machine Learning (ML) to "optimize the optimizers".

In this talk, I will give an overview of research developments at the intersection of ML and MIP, with a focus on two recent works on (1) learning to schedule heuristics in a MIP solver [NeurIPS-21, https://openreview.net/forum?id=fEImgFxKU63] and (2) Monte Carlo Tree Search to find backdoor sets, a handful of integer variables such that the MIP instance can be solved to global optimality by branching only on those variables [https://arxiv.org/abs/2110.08423]. 

Bio: Elias B. Khalil [https://ekhalil.com/] is the Scale AI Research Chair in Data-Driven Algorithms for Modern Supply Chains, an Assistant Professor of Industrial Engineering at the University of Toronto, and a Faculty Affiliate of the Vector Institute. Prior to that, he was the IVADO Postdoctoral Scholar at Polytechnique Montréal. Elias obtained his Ph.D. from the College of Computing at Georgia Tech where he was an IBM Ph.D. Fellow in 2016. His research interests are in the integration of machine learning and discrete optimization to enable fast and optimal decision-making for supply chain planning and healthcare resource management. 

November 10th, 2021 at 2pm EST

Title: Tilted Losses in Machine Learning: Theory and Applications

Virginia Smith, Carnegie Mellon University

Abstract: Exponential tilting is a technique commonly used to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this talk, I discuss a simple extension to ERM, tilted empirical risk minimization (TERM), which uses tilting to flexibly tune the impact of individual losses. I make connections between TERM and related approaches, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO), and present batch and stochastic first-order optimization methods for solving TERM at scale. Finally, I show that this baseline can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance, delivering state-of-the-art performance relative to more complex, bespoke solutions for these problems.
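The TERM objective itself is a one-line modification of empirical risk (as in the accompanying paper, with f_i the loss of sample i):

```latex
\widetilde{R}(t; \theta) = \frac{1}{t} \log\!\Big( \frac{1}{N} \sum_{i=1}^{N} e^{\, t f_i(\theta)} \Big),
```

which recovers ERM as t → 0, the max-loss as t → +∞, and the min-loss as t → −∞; positive t emphasizes the worst losses (fairness across subgroups), while negative t de-emphasizes them (robustness to outliers).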

Bio: Virginia Smith is an assistant professor in the Machine Learning Department at Carnegie Mellon University. Her research spans machine learning, optimization, and distributed systems. Virginia’s current work addresses challenges related to optimization, privacy, fairness, and robustness in distributed settings in order to make federated learning safe, efficient, and reliable. Virginia’s work has been recognized by several awards, including an MIT TR35 Innovator Award, Facebook Faculty Award, and Google Research Awards. Prior to CMU, Virginia was a postdoc at Stanford University and received a Ph.D. in Computer Science from UC Berkeley. 

October 13th, 2021 at 2pm EDT 

Title: The implicit bias of perceptron, coordinate descent, gradient descent, deep learning, and the actor-critic algorithm.

Matus Telgarsky, University of Illinois, Urbana-Champaign

Abstract: The talk starts by surveying the implicit bias of standard descent methods, with some emphasis on proof schemes.  The remainder of the talk will focus on an implicit bias, specifically an alignment phenomenon, in deep learning and in actor-critic algorithms. 

Joint work with Yuzheng Hu and Ziwei Ji.

Bio: Matus Telgarsky is an assistant professor at the University of Illinois, Urbana-Champaign, specializing in deep learning theory.  He was fortunate to receive a PhD at UCSD under Sanjoy Dasgupta.  Other highlights include: co-founding, in 2017, the Midwest ML Symposium (MMLS) with Po-Ling Loh; receiving a 2018 NSF CAREER award; co-organizing a Simons Institute program on deep learning theory.

September 29, 2021 at 2pm EDT 

Title: The Role of Complexity Bounds in Optimization

Stephen Wright, University of Wisconsin-Madison

Abstract: Complexity analysis in optimization seeks upper bounds on the amount of work required to find approximate solutions of problems in a given class with a given algorithm, as well as lower bounds, usually established via a worst-case example from a given problem class, on the work required by a particular class of algorithms. The relationship between theoretical complexity bounds and practical performance of algorithms on “typical” problems varies widely across problem and algorithm classes, and relative interest among researchers between these two aspects of algorithm design and analysis has waxed and waned over the years. This talk surveys complexity analysis and its relationship to practice in optimization, with an emphasis on linear programming and convex and nonconvex nonlinear optimization, providing historical (and cultural) perspectives on research in these areas.

Bio: Stephen J. Wright holds the George B. Dantzig Professorship, the Sheldon Lubar Chair, and the Amar and Balinder Sohi Professorship of Computer Sciences at the University of Wisconsin-Madison. His research is in computational optimization and its applications to data science and many other areas of science and engineering. Prior to joining UW-Madison in 2001, Wright held positions at North Carolina State University (1986-1990) and Argonne National Laboratory (1990-2001). He has served as Chair of the Mathematical Optimization Society (2007-2010) and as a Trustee of SIAM for the maximum three terms (2005-2014). He is a Fellow of SIAM. In 2014, he won the W.R.G. Baker Award from IEEE for best paper in an IEEE archival publication during the three years 2009-2011. He was awarded the Khachiyan Prize by the INFORMS Optimization Society in 2020 for lifetime achievements in optimization, and received the NeurIPS Test of Time Award in 2020 for a paper presented at that conference in 2011. 

Prof. Wright is the author / coauthor of widely used text and reference books in optimization including "Primal-Dual Interior-Point Methods" and "Numerical Optimization". He has published widely on optimization theory, algorithms, software, and applications.

Prof. Wright served from 2014-2019 as Editor-in-Chief of the SIAM Journal on Optimization and previously served as Editor-in-Chief of Mathematical Programming Series B. He has also served as Associate Editor of Mathematical Programming Series A, SIAM Review, SIAM Journal on Scientific Computing, and several other journals and book series.

September 15, 2021 at 2pm EDT 

Title: Faster Empirical Risk Minimization

Jelena Diakonikolas, University of Wisconsin–Madison

Abstract: Empirical Risk Minimization (ERM) problems are central to machine learning, and their efficient optimization has been studied from different perspectives, often taking advantage of the finite sum structure present in typical problem formulations. In particular, tight oracle complexity bounds have been obtained under fairly general assumptions about the loss functions. In this talk, I will present a rather surprising and general result that takes advantage of the separability of nonsmooth convex loss functions with efficiently computable proximal operators -- such as the hinge loss and the sum of absolute errors -- to obtain an algorithm that exhibits significantly lower complexity than what is predicted by the lower bounds for general nonsmooth convex losses. Crucial to this result is the use of proximal operators, which goes beyond the simple gradient oracle access to the function. The talk is based on joint work with Chaobing Song and Stephen Wright.
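As a concrete instance of the proximal oracle in question, the hinge loss h(u) = max(0, 1 − u) has a simple closed form that can be applied coordinate-wise thanks to separability:

```latex
\operatorname{prox}_{\lambda h}(x) =
\begin{cases}
x + \lambda, & x < 1 - \lambda,\\
1, & 1 - \lambda \le x \le 1,\\
x, & x > 1,
\end{cases}
```

so each prox evaluation is as cheap as a gradient evaluation while conveying strictly more information about the nonsmooth kink.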

Bio: Jelena Diakonikolas is an assistant professor in the Department of Computer Sciences at UW-Madison. Prior to joining UW-Madison, she held postdoctoral positions at UC Berkeley and Boston University. She received her PhD from Columbia University. Her research interests are in efficient optimization methods for large scale problems and connections between optimization and other areas such as dynamical systems and Markov Chain Monte Carlo methods. Jelena is a recipient of a Simons-Berkeley Research Fellowship, the Morton B. Friedman Prize for Excellence at Columbia Engineering, and a Qualcomm Innovation Fellowship.  

May 5th, 2021 at 11am EDT 

Title: Consensus-based optimization

Massimo Fornasier, Technical University of Munich

Abstract: Consensus-based optimization (CBO) is a multi-agent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. In fact, optimizing agents (particles) move on the optimization domain driven by a drift towards an instantaneous consensus point, which is computed as a convex combination of particle locations, weighted by the cost function according to Laplace’s principle, and represents an approximation to a global minimizer. The dynamics is further perturbed by a random vector field to favor exploration, whose variance is a function of the distance of the particles to the consensus point. In particular, as soon as the consensus is reached the stochastic component vanishes. Based on an experimentally supported intuition that CBO always performs a gradient descent of the squared Euclidean distance to the global minimizer, we show a novel technique for proving the global convergence to the global minimizer in mean-field law for a rich class of objective functions. The result unveils internal mechanisms of CBO that are responsible for the success of the method. In particular, we present the proof that CBO performs a convexification of a very large class of optimization problems as the number of optimizing agents goes to infinity. We further present formulations of CBO over compact hypersurfaces and the proof of convergence to global minimizers for nonconvex nonsmooth optimizations on the hypersphere. We conclude the talk with several numerical experiments, which show that CBO scales well with the dimension and is extremely versatile. To quantify the performances of such a novel approach, we show that CBO is able to perform essentially as well as ad hoc state-of-the-art methods using higher order information in challenging problems in signal processing and machine learning, namely phase retrieval and robust subspace detection.

Keywords: high-dimensional optimization, derivative-free optimization, geometric optimization, consensus-based optimization, Fokker-Planck equations, signal processing and machine learning 

References: 

https://arxiv.org/pdf/2103.15130.pdf
https://arxiv.org/pdf/2001.11994.pdf
https://arxiv.org/pdf/2001.11988.pdf
https://arxiv.org/pdf/2104.00420.pdf
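A minimal simulation of the CBO dynamics described above (an Euler-Maruyama discretization of the isotropic-noise variant; hyperparameters are illustrative, not tuned):

```python
import numpy as np

def cbo(f, dim=2, n_particles=200, steps=2000, lam=1.0, sigma=0.8, alpha=50.0, dt=0.01, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-3, 3, size=(n_particles, dim))   # initial particle positions
    for _ in range(steps):
        fx = f(X)
        w = np.exp(-alpha * (fx - fx.min()))          # Laplace-principle weights (shifted for stability)
        m = (w[:, None] * X).sum(axis=0) / w.sum()    # consensus point
        dist = np.linalg.norm(X - m, axis=1, keepdims=True)
        noise = rng.standard_normal(X.shape)
        # drift toward consensus; noise amplitude vanishes as consensus is reached
        X = X - lam * (X - m) * dt + sigma * dist * np.sqrt(dt) * noise
    return m

# Rastrigin function: highly multimodal, global minimum at the origin.
rastrigin = lambda X: 10 * X.shape[1] + (X**2 - 10 * np.cos(2 * np.pi * X)).sum(axis=1)
print(cbo(rastrigin))  # should land near [0, 0]
```

Large alpha concentrates the consensus point near the currently best particle, which is how Laplace's principle enters the dynamics.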

April 21st, 2021 at 11am EDT 

Title: Beyond open loop thinking: A prelude to learning-based intelligent systems

Lillian J. Ratliff, University of Washington

Abstract: Learning algorithms are increasingly being deployed in a variety of real-world systems. A central tenet of present-day machine learning is that when it is arduous to model a phenomenon, observations thereof are representative samples from some, perhaps unknown, static or otherwise independent distribution. In the context of systems that depend on their use, such as civil infrastructure and other services (e.g., online marketplaces), two central challenges call the integrity of this tenet into question. First, (supervised) algorithms tend to be trained on past data without considering that the output of the algorithm may change the environment, and hence the data distribution. That is, the algorithm may be trained on data that is biased towards the past and not representative of the future environment in which the algorithm is to be deployed. Second, data used either to train the algorithms offline or as an input to an online decision-making algorithm may be generated by strategic data sources such as human users. Indeed, such data depends on how the algorithm impacts a user’s individual objectives or (perceived) quality of service. This again leads to the underlying data distribution being dependent on the output of the algorithm. This begs the question of how learning algorithms can and should be designed to take into consideration this closed-loop interaction with the environment in which they will be deployed. In this talk, I will provide one perspective on designing and analyzing algorithms by modeling the underlying learning task in the language of game theory and control, and using tools from these domains to provide performance guarantees. Recent, promising results in this direction will be highlighted. Time permitting, open questions will be posed towards the end of the talk.

 

Bio: Lillian J. Ratliff is an Assistant Professor in the Department of Electrical Engineering at the University of Washington. She also holds an Adjunct Professor position in the Allen School of Computer Science and Engineering at UW. Prior to joining UW she was a postdoctoral researcher in EECS at UC Berkeley (2015-2016), where she also obtained her PhD (2015) under the advisement of Shankar Sastry. She has an MS (UNLV 2010) and BS (UNLV 2008) in Electrical Engineering as well as a BS (UNLV 2008) in Mathematics. Lillian’s research interests lie at the intersection of game theory, learning, and optimization. She draws on theory from these areas to develop analysis tools for studying algorithmic competition, cooperation, and collusion, and synthesis tools for designing algorithms with performance guarantees. In addition, she is interested in developing new theoretical models of human decision-making in consideration of behavioral factors in societal-scale systems (e.g., intelligent infrastructure, platform-based markets and e-commerce, etc.) and computational schemes to shape the outcome of competitive interactions. Lillian is the recipient of an NSF Graduate Research Fellowship (2009), the NSF CISE Research Initiation Initiative award (2017), an NSF CAREER award (2019), and the ONR Young Investigator award (2020). Lillian was also an invited speaker at the NAE China-America Frontiers of Engineering Symposium (2019) and was recently awarded the Dhanani Endowed Faculty Fellowship (2020).

April 7th, 2021 at 11am EDT 

Title: Stochastic Line Search and Martingales

Katya Scheinberg, Cornell University

Abstract: We will present a general framework which models standard methods such as line search and trust region methods in a stochastic setting via a stochastic process and its stopping time. We will show how this framework models some variants of stochastic line search and how analysing the stopping time gives us a high probability bound (and bound in expectation) on complexity of the line search.  This framework provides strong convergence analysis under weaker conditions than alternative approaches in the literature.
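A minimal sketch of a backtracking (Armijo) line search driven by stochastic function and gradient estimates, in the spirit of the framework discussed (all specifics below, including the noise model and constants, are illustrative):

```python
import numpy as np

def stochastic_armijo_step(x, f_est, g_est, alpha0=1.0, c=1e-4, tau=0.5, max_backtracks=30):
    """One line-search step using (possibly noisy) estimates f_est, g_est."""
    g = g_est(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        # Armijo sufficient-decrease test with freshly sampled estimates
        if f_est(x - alpha * g) <= f_est(x) - c * alpha * (g @ g):
            return x - alpha * g, alpha   # step accepted
        alpha *= tau                      # shrink the step and retry
    return x, 0.0                         # step rejected this round

# Toy usage: noisy quadratic f(x) = 0.5 ||x||^2 observed with additive noise.
rng = np.random.default_rng(0)
f_est = lambda x: 0.5 * (x @ x) + 1e-3 * rng.standard_normal()
g_est = lambda x: x + 1e-3 * rng.standard_normal(x.shape)
x = np.ones(5)
for _ in range(50):
    x, _ = stochastic_armijo_step(x, f_est, g_est)
print(np.linalg.norm(x))  # near 0 (noise-limited)
```

Because the estimates are redrawn at every trial, the accept/reject sequence is itself a stochastic process; the framework in the talk bounds the stopping time of such processes to obtain high-probability complexity guarantees.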

 

Bio: Katya Scheinberg is a Professor and Director of Graduate Studies at the School of Operations Research and Information Engineering at Cornell University. Prior to joining Cornell she was the Harvey E. Wagner Endowed Chair Professor at the Industrial and Systems Engineering Department at Lehigh University. She attended Moscow University for her undergraduate studies and received her PhD degree from Columbia University. She worked at the IBM T.J. Watson Research Center as a research staff member for over a decade before joining Lehigh in 2010. Katya’s main research areas are related to developing practical algorithms (and their theoretical analysis) for various problems in continuous optimization, such as convex optimization, derivative free optimization, machine learning, quadratic programming, etc. In 2015, jointly with Andy Conn and Luis Vicente, she received the Lagrange Prize awarded jointly by SIAM and MOS. In 2019 she was awarded the Farkas Prize by the INFORMS Optimization Society. Katya is currently the editor-in-chief of Mathematics of Operations Research and a co-editor of Mathematical Programming.

March 24th, 2021 at 11am EDT 

Title: Scalability improvements for Semidefinite Programming

Georgina Hall, INSEAD

Abstract: Many intractable problems that appear in machine learning can be approximately solved using semidefinite programs (SDPs). Though SDPs provide good-quality bounds on the initial problem, it is well known that, as a class of problems, they are considerably limited by their scalability. In this talk, I review recent advances that aim to make semidefinite programming more scalable, with a specific focus on approaches that approximate SDPs with cheaper conic programs, such as linear and second-order cone programs. 
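A concrete example of this approach, from Ahmadi and Hall's line of work: replace the positive semidefinite cone with a cheaper inner approximation,

```latex
DD_n \subseteq SDD_n \subseteq \mathcal{S}^{n}_{+}, \qquad
DD_n = \Big\{ X = X^{\top} : x_{ii} \ge \sum_{j \neq i} |x_{ij}| \ \ \forall i \Big\},
```

where optimizing over diagonally dominant matrices (DD_n) yields a linear program, and over scaled diagonally dominant matrices (SDD_n, matrices of the form D M D with D positive diagonal and M in DD_n) yields a second-order cone program. Iterating with a change of basis tightens these approximations toward the original SDP.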

March 10th, 2021 at 11am EST 

Title: A law of robustness for two-layers neural networks

Sébastien Bubeck, Microsoft Research

Abstract: I will present a mathematical conjecture potentially establishing overparametrization as a law of robustness for neural networks. I will tell you some of the things that we already know about this conjecture. Time permitting, I will include a discussion of how to think about various quantities for higher-order tensors (their rank, the relation between spectral norm and nuclear norm, and concentration for random tensors). Joint work with Yuanzhi Li and Dheeraj Nagaraj: https://arxiv.org/abs/2009.14444
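Roughly, the conjecture says (my paraphrase of the linked paper, stated from memory):

```latex
\operatorname{Lip}(f) \gtrsim \sqrt{\frac{n}{k}}
```

for any two-layer network f with k neurons that fits n generic (e.g., noisily labeled) data points, so that robust, O(1)-Lipschitz interpolation would require overparametrization k = Ω(n), far beyond the k ≈ n/d that suffices for mere memorization in dimension d.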

February 24th, 2021 at 11am EST 

Title: Is gradient descent doing gradient DESCENT in deep learning?

Yuanzhi Li, Carnegie Mellon University

Abstract: Gradient descent and its variants are the most widely applied methods to train large-scale neural networks in practice. On the theory side, traditional optimization theory typically attributes the success of gradient descent to the simple "gradient descent lemma": using a proper learning rate (typically smaller than the inverse local smoothness of the objective function), following the gradient direction ensures that the objective decreases monotonically. This work argues that gradient descent in deep learning operates in a completely different regime called "the edge of stability": We empirically show that using standard learning rates, gradient descent on deep learning objectives quickly enters a region where the local smoothness of the loss function hovers right above 2/(learning rate) -- after that, the loss starts to change largely non-monotonically, although it still decreases in the long run. Our empirical observations apply when using gradient descent to train standard convolutional nets (with/without BN) on image datasets, transformer models on language modeling, or even simple deep linear networks.


We also give a proof of this phenomenon in a special case of two-layer neural networks. We will also discuss the edge of stability for stochastic gradient descent and momentum methods.
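The 2/(learning rate) threshold is exactly the classical breakdown point of the descent lemma, which is easy to see on a quadratic (illustrative sketch; deep learning is interesting precisely because it hovers at this boundary rather than diverging):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * h * x^2, where h is the sharpness (Hessian).
# Each step multiplies x by (1 - eta * h), so the loss contracts iff h < 2 / eta.
def gd_losses(h, eta, x0=1.0, steps=20):
    x, losses = x0, []
    for _ in range(steps):
        x -= eta * h * x
        losses.append(0.5 * h * x**2)
    return losses

eta = 0.1
print(gd_losses(h=19.0, eta=eta)[-1])  # h < 2/eta = 20: loss decreases monotonically
print(gd_losses(h=21.0, eta=eta)[-1])  # h > 2/eta: loss explodes
```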



February 10th, 2021 at 11am EST 

Title: Complexity of Distributed and Federated Stochastic Optimization

Nati (Nathan) Srebro, Toyota Technological Institute at Chicago

Abstract: Stochastic convex optimization in a standard, sequential setting has been very well understood for a while now: we know how to formalize the problem, what the optimal (min-max) complexity is, and we have methods, namely accelerated stochastic gradient descent and its variants, which attain this complexity and are therefore “optimal”.  However, understanding the complexity of stochastic optimization, or equivalently learning, in a distributed setting remains open.  In this talk I will discuss different distributed and federated optimization settings, where we stand in understanding their complexity, and what methods might be optimal.  In particular, I will present new results tightly characterizing the min-max complexity, and identifying an optimal method, for the most basic “intermittent communication” model.

Based on joint work with Blake Woodworth and many others.

Bio: Nati (Nathan) Srebro is a professor at the Toyota Technological Institute at Chicago, with  cross-appointments at the University of Chicago Dept. of Computer Science and Committee on Computational and Applied Mathematics.  He obtained his PhD at the Massachusetts Institute of Technology (MIT) in 2004, and previously was a post-doctoral fellow at the University of Toronto, a Visiting Scientist at IBM, and an Associate Professor at the Technion.  Prof. Srebro’s research encompasses methodological, statistical and computational aspects of Machine Learning, as well as related problems in Optimization. Some of Prof. Srebro’s significant contributions include work on learning “wider” Markov networks; introducing the use of the nuclear norm for machine learning and matrix reconstruction; work on fast optimization techniques for machine learning, the optimality of stochastic methods, and on the relationship between learning and optimization more broadly.  His current interests include understanding deep learning through a detailed understanding of optimization; distributed and federated learning; algorithmic fairness and practical adaptive data analysis.  Prof. Srebro is currently on sabbatical visiting EPFL in Lausanne.

December 1st, 2020 at 4pm ET 

Title: Fitting Tractable Convex Sets to Support Function Data

Venkat Chandrasekaran, California Institute of Technology

Abstract: The geometric problem of estimating an unknown compact convex set from evaluations of its support function arises in a range of scientific and engineering applications. Traditional approaches typically rely on estimators that minimize the error over all possible compact convex sets; in particular, these methods do not allow for much incorporation of prior structural information about the underlying set and the resulting estimates become increasingly more complicated to describe as the number of measurements available grows. We address both of these shortcomings by describing a framework for estimating tractably specified convex sets from support function evaluations. Building on the literature in convex optimization, our approach is based on estimators that minimize the error over structured families of convex sets that are specified as linear images of concisely described sets – such as the simplex or the spectraplex – in a higher-dimensional space that is not much larger than the ambient space. Convex sets parametrized in this manner are significant from a computational perspective as one can optimize linear functionals over such sets efficiently; they serve a different purpose in the inferential context of the present topic, namely, that of incorporating regularization in the reconstruction while still offering considerable expressive power. We provide a geometric characterization of the asymptotic behavior of our estimators, with the analysis relying on the property that certain sets which admit semialgebraic descriptions are Vapnik-Chervonenkis (VC) classes. Our numerical experiments highlight the utility of our framework over previous approaches in settings in which the measurements available are noisy or small in number as well as those in which the underlying set to be reconstructed is non-polyhedral.

(Joint work with Yong Sheng Soh.)
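For reference, the support function of a compact convex set K and the structured least-squares estimator described above take the form (my notation, a sketch of the setup):

```latex
h_K(u) = \sup_{x \in K} \langle x, u \rangle, \qquad
\widehat{K} \in \operatorname*{argmin}_{K' = A(C)} \; \sum_{i=1}^{m} \big( y_i - h_{K'}(u_i) \big)^2,
```

where the y_i are noisy support function evaluations in directions u_i, C is a concisely described set such as the simplex or the spectraplex in a moderately higher-dimensional space, and A is a linear map into the ambient space.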

Bio: Venkat Chandrasekaran is a Professor at Caltech in Computing and Mathematical Sciences and in Electrical Engineering. He received a Ph.D. in Electrical Engineering and Computer Science from MIT (2011), and he received undergraduate degrees in Mathematics as well as in Electrical and Computer Engineering from Rice University (2005). He was awarded the Jin-Au Kong Dissertation Prize for the best doctoral thesis in Electrical Engineering at MIT (2012), the Young Researcher Prize in Continuous Optimization (at the 4th International Conference on Continuous Optimization of the Mathematical Optimization Society, 2013), the Sloan Research Fellowship in Mathematics (2016), and the INFORMS Optimization Society Prize for Young Researchers (2016). He is currently serving on the editorial boards of the Annals of Statistics, the SIAM Journal on the Mathematics of Data Science, and the SIAM Journal on Applied Algebra and Geometry. His research interests lie in mathematical optimization and its interface with topics in the information sciences.


November 24th, 2020 at 4pm ET 

Title: Optimality in Optimization

John Duchi, Stanford University

Abstract:  I will discuss what it means for a method to be optimal in optimization and machine learning. When a method matches a lower bound, does that mean the method is good? How can we develop lower bounds and optimality results that are meaningful? Can theoretical results actually direct progress in what we do? Some of this talk will be speculative, some will cover my and others' results, and some will likely be polemic.

Bio: John Duchi is an assistant professor of Statistics and Electrical Engineering and (by courtesy) Computer Science at Stanford University. His work spans statistical learning, optimization, information theory, and computation, with a few driving goals. (1) To discover statistical learning procedures that optimally trade between real-world resources---computation, communication, privacy provided to study participants---while maintaining statistical efficiency. (2) To build efficient large-scale optimization methods that address the spectrum of optimization, machine learning, and data analysis problems we face, allowing us to move beyond bespoke solutions to methods that robustly work. (3) To develop tools to assess and guarantee the validity of---and confidence we should have in---machine-learned systems.

He has won several awards and fellowships. His paper awards include the SIAM SIGEST award for "an outstanding paper of general interest" and best papers at the Neural Information Processing Systems conference, the International Conference on Machine Learning, and an INFORMS Applied Probability Society Best Student Paper Award (as advisor). He has also received the Society for Industrial and Applied Mathematics (SIAM) Early Career Prize in Optimization, an Office of Naval Research (ONR) Young Investigator Award, an NSF CAREER award, a Sloan Fellowship in Mathematics, the Okawa Foundation Award, the Association for Computing Machinery (ACM) Doctoral Dissertation Award (honorable mention), and U.C. Berkeley's C.V. Ramamoorthy Distinguished Research Award.


November 17th, 2020 at 10am ET (note the change in time)

Title: Elephant in the Room: Non-Smooth Non-Convex Optimization

Ohad Shamir, Weizmann Institute of Science

Abstract: It is well-known that finding global minima of non-convex optimization problems is computationally hard in general. However, the problem of finding local minima-like points (at least in terms of gradient and Hessian properties) is tractable, and received much attention in recent years. The resulting literature has been largely motivated by the rising importance of non-convex problems such as deep learning, but in fact, does not quite address them: Nearly all positive results in this area require the objective function to be smooth, which is seldom satisfied in deep learning problems. This highlights the importance of understanding what we can do efficiently on non-convex, non-smooth optimization problems. In this talk, I will describe some results, challenges, and possible approaches to tackle this fundamental question.

Bio: Ohad Shamir is a faculty member at the Department of Computer Science and Applied Mathematics at the Weizmann Institute. He received his PhD in 2010 at the Hebrew University, and between 2010-2013 and 2017-2018 was a researcher at Microsoft Research New England. His work focuses on theoretical machine learning, in areas such as deep learning theory, the intersection of machine learning and optimization, and learning with information and communication constraints. He received several awards, and served as program co-chair of COLT as well as a member of its steering committee.


November 10th, 2020 at 4pm ET

Title: Robustness in Machine Learning and Optimization: A Minmax Approach

Asu Ozdaglar, Massachusetts Institute of Technology 

Abstract: Minmax problems arise in a large number of problems in optimization, including worst-case design, duality theory, and zero-sum games, but have also become popular in machine learning in the context of adversarial robustness and Generative Adversarial Networks (GANs). This talk will review our recent work on solving minmax problems using discrete-time gradient-based optimization algorithms. We focus on Optimistic Gradient Descent Ascent (OGDA) and Extra-gradient (EG) methods, which have attracted much attention in the recent literature because of their superior empirical performance in GAN training.  We show that OGDA and EG can be seen as approximations of the classical proximal point method and use this interpretation to establish convergence rate guarantees for these algorithms. These guarantees are provided for the ergodic (averaged) iterates of the algorithms. We also present convergence rate guarantees for the last iterate of EG for smooth convex-concave saddle point problems. We finally turn to the analysis of generalization properties of gradient-based minmax algorithms using the algorithmic stability framework defined by Bousquet and Elisseeff. Our generalization analysis suggests superiority of gradient descent ascent (GDA) compared to the GDmax algorithm (which involves exact solution of the maximization problem at each iteration) in the nonconvex-concave case, provided that similar learning rates are used in the descent and ascent steps.
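For reference, with z = (x, y) and the operator F(z) = (∇_x f(x, y), −∇_y f(x, y)) for the minmax problem min_x max_y f(x, y), the two methods take the standard forms

```latex
\text{EG:}\quad z_{k+1/2} = z_k - \eta F(z_k), \qquad z_{k+1} = z_k - \eta F(z_{k+1/2});
\qquad
\text{OGDA:}\quad z_{k+1} = z_k - 2\eta F(z_k) + \eta F(z_{k-1}),
```

both of which approximate the implicit proximal point update z_{k+1} = z_k − η F(z_{k+1}), the interpretation used in the talk to derive convergence rates.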

Bio: Asu Ozdaglar received the B.S. degree from the Middle East Technical University in 1996, and the S.M. and Ph.D. degrees from the Massachusetts Institute of Technology (MIT), in 1998 and 2003, respectively. 

She is the MathWorks Professor of Electrical Engineering and Computer Science at MIT, where she serves as Department Head of EECS and Deputy Dean of Academics in the Schwarzman College of Computing. Her research expertise includes optimization theory, distributed optimization and control, and network analysis.

Her awards include a Microsoft fellowship, an NSF CAREER award, the 2008 Donald P. Eckman Award of the American Automatic Control Council, the Class of 1943 Career Development Chair, the inaugural Steven and Renee Innovation Fellowship, and the 2014 Spira Teaching Award.



November 3rd, 2020 at 4pm ET

Title: Heavy-tail Phenomenon in SGD

Mert Gurbuzbalaban, Rutgers University 

Abstract: In the first part of the talk, we focus on the gradient noise (GN) in the stochastic gradient descent (SGD) algorithm for training deep neural networks (DNNs). In this setting, GN is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. We conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and exhibits heavy tails. We further investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. By exploiting connections to Lévy-driven differential equations and their metastability properties, our results open up a different perspective and shed more light on the belief that SGD prefers wide minima. Joint work with Umut Simsekli and Levent Sagun.

In the second part of the talk, we question the origins of heavy tails in SGD iterations further and their link to various notions of capacity and complexity that have been proposed for characterizing the generalization properties of SGD in deep learning. Some of the popular notions that correlate well with the performance on unseen data are (i) the ‘flatness’ of the local minimum found by SGD, which is related to the eigenvalues of the Hessian, (ii) the ratio of the stepsize η to the batch size b, which essentially controls the magnitude of the stochastic gradient noise, and (iii) the ‘tail-index’, which measures the heaviness of the tails of the eigenspectra of the network weights. We argue that these three seemingly unrelated perspectives for generalization are deeply linked to each other. We claim that depending on the structure of the Hessian of the loss at the minimum, and the choices of the algorithm parameters η and b, the SGD iterates will converge to a heavy-tailed stationary distribution. We rigorously prove this claim in the setting of linear regression: we show that even in a simple quadratic optimization problem with independent and identically distributed Gaussian data, the iterates can be heavy-tailed with infinite variance. We further characterize the behavior of the tails with respect to algorithm parameters, the dimension, and the curvature. We then translate our results into insights about the behavior of SGD in deep learning. We finally support our theory with experiments conducted on both synthetic data and neural networks. Joint work with Umut Simsekli and Lingjiong Zhu. 

In the third part of the talk, if time permits, we will talk about a new class of initialization schemes for fully-connected neural networks that can improve the performance of SGD training by inducing a particular form of heavy-tailed behavior in the stochastic gradients. Joint work with Yuanhan Hu.
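A minimal simulation of the linear-regression claim from the second part of the abstract (all parameters below are illustrative assumptions): constant-stepsize SGD on a one-dimensional least-squares problem with fresh Gaussian data, with the heaviness of the stationary tails gauged by a standard Hill estimator. In this one-dimensional Gaussian setting, the second moment of the multiplicative factor (1 − ηx²) exceeds 1 once η > 2/3, so the larger stepsize below should produce a tail index below 2, i.e., infinite variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_iterates(eta, batch=1, n_iters=200_000, sigma=1.0):
    """Constant-stepsize SGD on 1-d least squares with fresh Gaussian data.
    The iterates follow a Kesten-type recursion w <- (1 - eta*H) w + eta*M,
    whose stationary distribution can be heavy-tailed."""
    w = 0.0
    ws = np.empty(n_iters)
    for k in range(n_iters):
        x = rng.standard_normal(batch)
        noise = sigma * rng.standard_normal(batch)
        w -= eta * np.mean(x * (x * w - noise))   # the true weight is 0
        ws[k] = w
    return ws

def hill_tail_index(samples, k=1000):
    """Hill estimator of the tail index from the k largest |samples|
    (a rough gauge here, since SGD iterates are autocorrelated)."""
    a = np.sort(np.abs(samples))[-k:]
    return k / np.sum(np.log(a / a[0]))

for eta in (0.3, 0.9):
    ws = sgd_iterates(eta)[50_000:]               # discard burn-in
    print(f"eta={eta}: tail index ~ {hill_tail_index(ws):.2f}")
# A larger stepsize-to-batch ratio gives a smaller tail index (heavier
# tails); an index below 2 indicates infinite variance.
```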

Bio: Mert Gürbüzbalaban is an assistant professor at Rutgers University. Previously, he was a postdoctoral associate at the Laboratory for Information and Decision Systems (LIDS) at MIT. He is broadly interested in optimization and computational science driven by applications in large-scale information and decision systems. He received his B.Sc. degrees in Electrical Engineering and Mathematics as valedictorian from Boğaziçi University, Istanbul, Turkey, the "Diplôme d'ingénieur" degree from École Polytechnique, France, and the M.S. and Ph.D. degrees in (Applied) Mathematics from the Courant Institute of Mathematical Sciences, New York University.

Dr. Gürbüzbalaban is a co-recipient of the Honorable Mention for the Best Paper Award at the International Conference on Machine Learning (ICML 2019). He also received the Kurt Friedrichs Prize (given by the Courant Institute of New York University for an outstanding thesis) in 2013, the Bronze Medal in the École Polytechnique Scientific Project Competition in 2006, and the Nadir Orhan Bengisu Award (given by the electrical and electronics engineering department of Boğaziçi University to the best graduating undergraduate student) in 2005.


October 20th, 2020 at 11am EDT

Title: Dimensionality reduction techniques for large-scale optimization problems

Coralia Cartis, Oxford University

Abstract: Known by many names, sketching techniques allow random projections of data from high to low dimensions while preserving pairwise distances. This talk explores ways to use sketching to improve the scalability of algorithms for diverse classes of optimization problems and applications, from linear to nonlinear, local to global, derivative-based to derivative-free. Numerical illustrations and results will also be presented.
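As a self-contained illustration of the distance-preserving property underlying sketching (the data dimensions and sketch size below are arbitrary choices, not from the talk), a scaled Gaussian sketch approximately preserves pairwise distances in the Johnson-Lindenstrauss sense:

```python
import numpy as np

rng = np.random.default_rng(0)

# An (n x d) data matrix in high dimension d, sketched down to m dimensions
# with a scaled Gaussian sketch S; pairwise distances are preserved up to a
# small multiplicative distortion with high probability.
n, d, m = 50, 10_000, 400
A = rng.standard_normal((n, d))
S = rng.standard_normal((d, m)) / np.sqrt(m)   # scaled Gaussian sketch
B = A @ S                                      # sketched data, n x m

# Compare a few pairwise distances before and after sketching.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    orig = np.linalg.norm(A[i] - A[j])
    sketched = np.linalg.norm(B[i] - B[j])
    print(f"pair ({i},{j}): distortion = {sketched / orig:.3f}")  # near 1
```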

Bio: Coralia Cartis has been an Associate Professor in Numerical Optimisation at the Mathematical Institute, University of Oxford since 2013, and a Turing Fellow at the Alan Turing Institute for Data Science since 2016; previously, she held academic and research positions at the University of Edinburgh and the Rutherford Appleton Laboratory. She holds a PhD from Cambridge University (supervisor: Prof Mike Powell) and a BSc in Mathematics from Babeș-Bolyai University, Cluj, Romania. Her research interests are in the analysis and implementation of nonlinear optimisation algorithms, especially complexity and global rates of convergence, and in diverse applications of optimisation, from climate modelling to signal processing and machine learning.

October 13th, 2020 at 4pm EDT

Title: Two Facets of Learning Robust Models: Fundamental Limits and Generalization to Natural Out-of-Distribution Inputs 

Hamed Hassani, University of Pennsylvania

Abstract: In this talk, we will focus on the recently emerged field of (adversarially) robust learning. The field began with the observation that modern learning models, despite their breakthrough performance, remain fragile to seemingly innocuous changes in the data, such as small, norm-bounded perturbations of the input. In response, various training methodologies have been developed to enhance robustness. However, it is fair to say that our understanding of this field is still in its infancy, and several key questions remain wide open. We will consider two such questions.

(1) Fundamental limits: It has been repeatedly observed that improving robustness to perturbed inputs (robust accuracy) comes at the cost of decreasing the accuracy on benign inputs (standard accuracy), leading to a fundamental tradeoff between these often competing objectives. Complicating matters further, recent empirical evidence suggests that a variety of other factors (size and quality of training data, model size, etc.) affect this tradeoff in somewhat surprising ways. In the first part of the talk, we will develop a precise and comprehensive understanding of such tradeoffs in the context of the simple yet foundational problem of linear regression (a toy illustration of this tradeoff is sketched below).

(2) Robustness to other types of out-of-distribution inputs: There are other sources of fragility for deep learning that are arguably more common and less studied. Indeed, natural variation such as lighting or weather conditions or device imperfections can significantly degrade the accuracy of trained neural networks, showing that such natural variation presents a significant challenge. To this end, in the second part of the talk we propose a paradigm shift from perturbation-based adversarial robustness toward a new framework called "model-based robust deep learning". Using this framework, we will provide general training algorithms that improve the robustness of neural networks against natural variation in the data. We will show that this framework consistently improves the robustness of modern learning models against many types of natural out-of-distribution inputs and across a variety of commonly used datasets.
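The tradeoff from part (1) can be reproduced in a toy experiment. The sketch below is a minimal version under assumed ℓ2-norm-bounded perturbations of the inputs, for which the worst-case squared loss has the closed form (|y − w·x| + ε‖w‖)²; the data sizes, ε, and optimizer settings are illustrative, and this is not the analysis from the talk. By construction, the standard fit has the lower clean error and the adversarially trained fit has the lower worst-case error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 200, 20, 0.5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def robust_loss(w):
    """Worst-case squared loss under ||delta||_2 <= eps input perturbations:
    max_delta (y - w.(x + delta))^2 = (|y - w.x| + eps*||w||)^2."""
    return np.mean((np.abs(y - X @ w) + eps * np.linalg.norm(w)) ** 2)

# Standard least-squares fit (minimizes clean error only).
w_std = np.linalg.lstsq(X, y, rcond=None)[0]

# Adversarial training: (sub)gradient descent on the convex robust loss.
w_rob, lr = np.zeros(d), 0.01
for _ in range(2000):
    r = y - X @ w_rob
    nw = np.linalg.norm(w_rob) + 1e-12
    m = np.abs(r) + eps * nw
    grad = (2 / n) * (-X.T @ (m * np.sign(r)) + eps * np.sum(m) * w_rob / nw)
    w_rob -= lr * grad

for name, w in [("standard", w_std), ("robust", w_rob)]:
    clean = np.mean((y - X @ w) ** 2)
    print(f"{name}: clean MSE = {clean:.3f}, worst-case MSE = {robust_loss(w):.3f}")
```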

Bio: Hamed Hassani is currently an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania. Prior to that, he was a research fellow at the Simons Institute for the Theory of Computing (UC Berkeley), affiliated with the program on Foundations of Machine Learning, and a postdoctoral researcher in the Institute of Machine Learning at ETH Zurich. He received his Ph.D. in Computer and Communication Sciences from EPFL, Lausanne. He is the recipient of the 2014 IEEE Information Theory Society Thomas M. Cover Dissertation Award, the 2015 IEEE International Symposium on Information Theory Student Paper Award, a 2017 Simons-Berkeley Fellowship, a 2018 NSF CRII Research Initiative Award, a 2020 Air Force Office of Scientific Research (AFOSR) Young Investigator Award, and a 2020 National Science Foundation (NSF) CAREER Award.


October 06th, 2020 at 4pm EDT

Title: Big Data is Low Rank

Madeleine Udell, Cornell University

Abstract: Data scientists are often faced with the challenge of understanding a high dimensional data set organized as a table. These tables may have columns of different (sometimes non-numeric) types, and often have many missing entries. In this talk, we discuss how to use low rank models to analyze these big messy data sets.

Low rank models perform well --- indeed, suspiciously well --- across a wide range of data science applications, including applications in social science, medicine, and machine learning. In this talk, we introduce the mathematics of low rank models, demonstrate a few surprising applications of low rank models in data science, and present a simple mathematical explanation for their effectiveness.
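To make the setup concrete, here is a minimal sketch on synthetic data (the hard-impute-style scheme below is one standard approach to low rank models with missing entries, not necessarily the method from the talk): missing entries of an approximately low-rank table are filled by alternating a truncated SVD with re-imposing the observed entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "messy table": an (n x d) matrix of exact rank r with 30% of
# the entries missing.
n, d, r = 100, 40, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
mask = rng.random((n, d)) < 0.7            # True where an entry is observed

def low_rank_impute(M, mask, rank, n_iters=50):
    """Fill missing entries by alternating a truncated-SVD projection with
    re-imposing the observed entries (a simple hard-impute-style scheme)."""
    X = np.where(mask, M, 0.0)             # initialize missing entries at 0
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-`rank` fit
        X[mask] = M[mask]                  # keep observed entries fixed
    return X

X = low_rank_impute(M, mask, rank=r)
err = np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask])
print(f"relative error on the missing entries: {err:.3f}")
```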

Bio: Madeleine Udell is Assistant Professor of Operations Research and Information Engineering and Richard and Sybil Smith Sesquicentennial Fellow at Cornell University. She studies optimization and machine learning for large scale data analysis and control, with applications in marketing, demographic modeling, medical informatics, engineering system design, and automated machine learning. She has received several awards, including an NSF CAREER award (2020), an Office of Naval Research (ONR) Young Investigator Award (2020), and an INFORMS Optimization Society Best Student Paper Award (as advisor) (2019). Madeleine completed her PhD at Stanford University in Computational & Mathematical Engineering in 2015 under the supervision of Stephen Boyd, and a one-year postdoctoral fellowship at Caltech in the Center for the Mathematics of Information, hosted by Professor Joel Tropp.


September 29th, 2020 at 4pm EDT

Title: Reducing Isotropy to KLS, a faster sampling algorithm 

Yin-Tat Lee, University of Washington

Paper: https://arxiv.org/pdf/2008.02146.pdf

Abstract: Sampling a point from a convex body is a problem more general than convex optimization. In this talk, we will give a new algorithm that takes O~(n^3) time (under the KLS conjecture) and O~(n^3.5) time (unconditionally). This is the first general improvement on the Lovász-Vempala O~(n^4) algorithm from 2003. Joint work with He Jia, Aditi Laddha, and Santosh Vempala.
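For orientation, the sketch below implements the classical hit-and-run walk on a polytope {x : Ax ≤ b}, one of the standard geometric random walks in this literature; it is a baseline for illustration only, not the algorithm of the paper, and the hypercube example and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def hit_and_run(A, b, x0, n_steps=1000):
    """Classical hit-and-run sampler for the polytope {x : A x <= b},
    started from an interior point x0."""
    x = x0.copy()
    for _ in range(n_steps):
        d = rng.standard_normal(len(x))
        d /= np.linalg.norm(d)             # random direction on the sphere
        # Chord through x along d: find [t_lo, t_hi] with A(x + t d) <= b.
        Ad, slack = A @ d, b - A @ x
        t_hi = np.min(slack[Ad > 0] / Ad[Ad > 0])
        t_lo = np.max(slack[Ad < 0] / Ad[Ad < 0])
        x = x + rng.uniform(t_lo, t_hi) * d  # uniform point on the chord
    return x

# Sample from the hypercube [-1, 1]^n written as a polytope.
n = 10
A = np.vstack([np.eye(n), -np.eye(n)])
b = np.ones(2 * n)
print(hit_and_run(A, b, np.zeros(n)))
```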

Bio: Yin Tat Lee is a Paul G. Allen endowed assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. He completed his Ph.D. at Massachusetts Institute of Technology and his undergraduate studies at the Chinese University of Hong Kong. His research interests are primarily in algorithms and they span a wide range of topics such as convex optimization, convex geometry, spectral graph theory, and online algorithms.

Over the past few years, he has combined ideas from continuous and discrete mathematics to substantially advance the state-of-the-art algorithms for solving many fundamental problems in computer science and optimization, such as linear programming and the maximum flow problem. As a result, he has received a variety of awards for his work, including a Best Paper Award and two Best Student Paper Awards at FOCS, a Best Paper Award at SODA, a Best Paper Award at NeurIPS, the Sprowls Award, an NSF CAREER Award, the A.W. Tucker Prize, a Microsoft Research Faculty Fellowship, and a Sloan Research Fellowship.