Fall 2018

Fall 2018: Every Monday 4-5pm in MSB1147 (unless otherwise specified)

Organizer for Fall 2018: Shiqian Ma (Email: sqma@math.ucdavis.edu)

Oct 1 (1:45-5:30pm in MSB1147): Opening event. See more detailed information here.

1:45pm: Reception

2-2:50 Talk by Jorge Nocedal (Northwestern). Title: Nonlinear Optimization Methods for Machine Learning.

3-3:50 Talk by Andrea Montanari (Stanford). Title: A Mean Field View of the Landscape of Two-Layers Neural Networks.

3:50-4:30 coffee/break

4:30-5:30 Talk by Joel Tropp (Caltech). Title: Applied Random Matrix Theory.

Oct 8: Ery Arias-Castro (UC San Diego). Title: On using graph distances to estimate Euclidean and related distances.

Abstract: Graph distances have proven quite useful in machine learning/statistics, particularly in the estimation of Euclidean or geodesic distances. The talk will include a partial review of the literature, and then present more recent developments on the estimation of curvature-constrained distances on a surface, as well as on the estimation of Euclidean distances based on an unweighted and noisy neighborhood graph. (Joint work with Thibaut Le Gouic, Antoine Channarond, Bruno Pelletier, and Nicolas Verzelen.)
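
For background, a minimal sketch (not the speaker's estimators) of the classical idea the abstract builds on: approximating geodesic distances on a sampled manifold by shortest-path distances in a neighborhood graph. The choice of scikit-learn's kneighbors_graph and SciPy's shortest_path, and the circle example, are illustrative assumptions.

```python
# Minimal sketch: approximate geodesic distances on a sampled manifold by
# shortest-path distances in a k-nearest-neighbor graph (Isomap-style).
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)

# Sample points from the unit circle; the true geodesic distance between
# angles a and b is the shorter arc length.
n = 500
theta = np.sort(rng.uniform(0, 2 * np.pi, n))
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Weighted k-NN graph: edge weights are Euclidean lengths of short hops.
G = kneighbors_graph(X, n_neighbors=8, mode="distance")

# Shortest-path (graph) distances approximate geodesic distances.
D_graph = shortest_path(G, method="D", directed=False)

# Compare with the exact arc-length distances on the circle.
diff = np.abs(theta[:, None] - theta[None, :])
D_true = np.minimum(diff, 2 * np.pi - diff)
print("max abs error:", np.abs(D_graph - D_true).max())
```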

Oct 15: Aaron Sidford (Stanford). Title: Perron-Frobenius Theory in Nearly Linear Time.

Abstract: The Perron-Frobenius theorem of O. Perron (1907) and G. Frobenius (1912) is a fundamental linear algebraic result that has had far-reaching implications over the past century. It lies at the heart of numerous computational tasks, including computing the stationary distribution of a Markov chain, solving linear systems in M-matrices, computing personalized PageRank, computing Katz centrality, and evaluating the utility of policies in Markov decision processes. However, despite the ubiquity of these problems and their extensive study, the fastest known running times for all of these problems either depended polynomially on the desired accuracy or the conditioning of the problem, or appealed to generic linear system solving machinery and therefore ran in super-quadratic time.

In this talk I will present recent results showing that these problems can be solved to high precision in nearly linear time. I will present new iterative methods for extracting structure from directed graphs and show how they can be tailored to computing Perron vectors and solving M-matrices in nearly linear time. Moreover, I will discuss connections between this work and recent developments in solving Laplacian systems and emerging trends in combining continuous and combinatorial techniques in the design of optimization algorithms.
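
For background, a minimal sketch of the classical power iteration for the stationary distribution of a Markov chain, i.e. the Perron vector of its transition matrix. This is the textbook method whose accuracy dependence the talk improves upon, not the nearly-linear-time algorithm; the 3-state chain is an arbitrary example.

```python
# Power iteration for the stationary distribution of an irreducible,
# aperiodic Markov chain (the Perron vector of its transition matrix).
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Repeatedly apply pi <- pi P until the iterates stop changing."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)           # start from the uniform distribution
    for _ in range(max_iter):
        pi_next = pi @ P
        pi_next /= pi_next.sum()       # keep it a probability vector
        if np.abs(pi_next - pi).sum() < tol:
            return pi_next
        pi = pi_next
    return pi

# Small example: a 3-state chain (rows sum to one).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
pi = stationary_distribution(P)
print(pi, "residual:", np.abs(pi @ P - pi).max())
```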

Oct 22: Lihong Li (Google). Title: Reinforcement Learning as Saddle-Point Optimization.

Abstract: When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov’s smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding (SBEED), to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm’s sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems. (Joint work with Bo Dai et al., presented at ICML 2018.)
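
For background, a minimal tabular sketch of the Bellman optimality operator the abstract refers to, applied as plain value iteration on a small randomly generated MDP. This is not the SBEED algorithm, and the MDP is purely illustrative; in the tabular case the operator is a contraction, which is exactly the property that can fail under function approximation.

```python
# Value iteration: repeated application of the Bellman optimality operator
#   (T V)(s) = max_a [ r(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
# on a tiny tabular MDP.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # r(s, a)

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * P @ V          # Q[s, a] = r(s, a) + gamma * E[V(s')]
    V_next = Q.max(axis=1)         # Bellman optimality operator
    if np.abs(V_next - V).max() < 1e-10:
        break
    V = V_next
print("optimal values:", V)
```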

Oct 29: Sayan Mukherjee (Duke). Title: How Many Directions Determine a Shape.

Abstract: We study transformations of shapes into representations that allow for analysis using standard statistical tools. The transformations are based on Euler integration and are of interest for their mathematical properties as well as their applications to science and engineering, because they provide a way of summarizing shapes in a topological, yet quantitative, way. By using an inversion theorem, we show that both transforms are injective on the space of shapes---each shape has a unique transform. By making use of a stratified space structure on the sphere, induced by hyperplane divisions, we prove additional uniqueness results in terms of distributions on the space of Euler curves. The main theoretical result provides the first (to our knowledge) finite bound required to specify any shape in certain uncountable families of shapes, bounded below by curvature. This result is perhaps best appreciated in terms of shattering number or the perspective that any point in these particular moduli spaces of shapes is indexed using a tree of finite depth. We also show how these transformations can be used in practice for medical imaging applications as well as for evolutionary morphology questions.
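
As a rough illustration of the kind of summary involved (a simplified sketch, not the speaker's transforms), the following computes an Euler characteristic curve of a small binary image in one fixed direction by sweeping a threshold over pixel projections; the image and the direction are arbitrary choices.

```python
# Euler characteristic curve of a 2D binary image in a fixed direction.
# Each filled pixel is treated as a unit square of a cubical complex; the
# curve records chi = #vertices - #edges + #faces of the sub-image whose
# pixel positions project below a sweeping threshold.
import numpy as np

def euler_characteristic(pixels):
    """chi of the cubical complex spanned by a set of (row, col) pixels."""
    verts, edges = set(), set()
    for (i, j) in pixels:
        corners = [(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)]
        verts.update(corners)
        edges.update({frozenset((corners[0], corners[1])),
                      frozenset((corners[0], corners[2])),
                      frozenset((corners[1], corners[3])),
                      frozenset((corners[2], corners[3]))})
    return len(verts) - len(edges) + len(pixels)

def euler_curve(image, direction, thresholds):
    """Euler characteristic of sublevel sets swept along `direction`."""
    filled = np.argwhere(image)                  # (row, col) of True pixels
    heights = filled @ np.asarray(direction)     # projection of each pixel
    return [euler_characteristic({tuple(p) for p, h in zip(filled, heights)
                                  if h <= t})
            for t in thresholds]

# Example: an L-shaped region, swept from left to right.
img = np.zeros((5, 5), dtype=bool)
img[1:4, 1] = True
img[3, 1:4] = True
print(euler_curve(img, direction=(0.0, 1.0), thresholds=range(5)))
```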

Oct 29 (MSB 2112, 2:10-3pm): Bradley M. Bell (University of Washington). Title: Algorithmic Differentiation in Theory and in Practice. Note: This is not a MADDD seminar, but it should interest some of the audience of MADDD seminar. For more information, see: https://www.math.ucdavis.edu/research/seminars/?talk_id=5359

Nov 5: Madeleine Udell (Cornell). Title: Big Data is Low Rank.

Abstract: Matrices of low rank are pervasive in big data, appearing in recommender systems, movie preferences, topic models, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention on explaining why low rank structure appears in the first place. In this talk, we explain the abundance of low rank matrices in big data by proving that matrices generated by certain latent variable models associated to piecewise analytic functions are of log-rank. Any large matrix from such a latent variable model can be approximated, up to a small error, by a low rank matrix. Armed with this theorem, we show how to use a low rank modeling framework to exploit low rank structure even for datasets that are not numeric, with applications in the social sciences, medicine, retail, and machine learning.
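
A quick numerical illustration of this claim (not a proof, and the latent variable model below is an arbitrary choice): a large matrix of the form X[i, j] = f(u_i, v_j) with a smooth link f admits an accurate approximation of rank on the order of log n.

```python
# Sample a large matrix from a latent variable model and check that a
# rank-O(log n) truncated SVD already captures almost all of its energy.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
u = rng.uniform(-1, 1, size=(n, 3))       # latent features of rows
v = rng.uniform(-1, 1, size=(n, 3))       # latent features of columns
X = 1.0 / (1.0 + np.exp(-(u @ v.T)))      # smooth (analytic) link function

s = np.linalg.svd(X, compute_uv=False)
k = int(np.ceil(np.log(n)))               # target rank ~ log n
rel_err = np.sqrt((s[k:] ** 2).sum()) / np.sqrt((s ** 2).sum())
print(f"rank {k} of {n}: relative Frobenius error {rel_err:.2e}")
```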

Nov 12: No seminar. University Holiday.

Nov 19: CANCELLED. Will be rescheduled. Jelena Diakonikolas (UC Berkeley). Title: Invariance in First-Order Convex Optimization.

Abstract: First-order methods with optimal worst-case iteration complexities for various classes of convex optimization problems have been known for decades. However, their standard analysis bears little intuition, which makes their generalization to new problems arising in modern data analysis applications challenging. In this talk, I will present a novel general and unifying framework for the analysis of first-order methods. The framework relies on the construction of a primal-dual gap and can be likened to Lagrangian mechanics: continuous-time methods and their convergence analysis follow directly from enforcing a certain invariance. The discrete-time methods can then be obtained by using different methods of discretization. This stands in contrast with standard approaches, which can be thought of as being analogous to Newtonian mechanics, where an action (i.e., a predetermined optimization method) is applied to a system and then its behavior (convergence) analyzed. To illustrate the power and generality of our framework, I will show that it recovers a wide range of known first-order methods in a unifying manner, and that it also yields novel methods that outperform existing ones on problems arising in machine learning applications. Based on joint work with Lorenzo Orecchia.
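
As generic background on the continuous-time viewpoint (this is not the speaker's framework, and the quadratic objective is only for illustration): gradient descent can be read as the forward-Euler discretization of the gradient flow x'(t) = -grad f(x(t)).

```python
# Gradient flow x'(t) = -grad f(x(t)) decreases f along its trajectory;
# forward-Euler discretization with step size h gives plain gradient descent.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])    # f(x) = 0.5 * x' A x, A positive definite

def grad_f(x):
    return A @ x

x = np.array([5.0, -3.0])
h = 0.1                                   # step size = discretization step
for _ in range(200):
    x = x - h * grad_f(x)                 # one forward-Euler step of the flow
print("approximate minimizer:", x)        # converges to the origin
```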

Nov 26: CANCELLED. Boaz Nadler (Weizmann Institute of Science)

Dec 3: Anna Ma (UC San Diego). Title: Variants of the Randomized Kaczmarz Algorithm and its Applications.

Abstract: Nowadays, data is exploding at a faster rate than computer architectures can handle. For that reason, mathematical techniques to analyze large-scale data need to be developed. Stochastic iterative algorithms have gained interest due to their low memory footprint and adaptability for large-scale data. In this talk, we will study the Randomized Kaczmarz algorithm for solving extremely large linear systems of the form Ax=y. In the spirit of large-scale data, this talk will act under the assumption that the entire data matrix A cannot be loaded into memory in a single instance. We consider different settings, including when only a factorization of A is available, when x is sparse, and when the model is time-varying. We will also present applications of these Kaczmarz variants to problems in data science.
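
For background, a minimal sketch of the classical randomized Kaczmarz iteration that these variants build on (the variants themselves are not shown, and the test system is synthetic): rows are sampled with probability proportional to their squared norms, and each step projects the iterate onto the hyperplane defined by one row.

```python
# Classical randomized Kaczmarz for a consistent linear system Ax = y,
# touching only one row of A per iteration.
import numpy as np

def randomized_kaczmarz(A, y, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = (A ** 2).sum(axis=1)
    probs = row_norms_sq / row_norms_sq.sum()     # sample rows ~ ||a_i||^2
    x = np.zeros(n)
    for _ in range(n_iter):
        i = rng.choice(m, p=probs)
        x += (y[i] - A[i] @ x) / row_norms_sq[i] * A[i]   # project onto a_i.x = y_i
    return x

# Consistent overdetermined system with a known solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50))
x_true = rng.standard_normal(50)
y = A @ x_true
x_hat = randomized_kaczmarz(A, y)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```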

Dec 10: Paul Grigas (UC Berkeley). Title: Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods.

Abstract: Logistic regression is one of the most popular methods in binary classification, wherein estimation of model parameters is carried out by solving the maximum likelihood (ML) optimization problem, and the ML estimator is defined to be the optimal solution of this problem. It is well known that the ML estimator exists when the data is non-separable, but fails to exist when the data is separable. First-order methods are the algorithms of choice for solving large-scale instances of the logistic regression problem. We introduce a pair of condition numbers that measure the degree of non-separability or separability of a given dataset in the setting of binary classification, and we study how these condition numbers relate to and inform the properties and the convergence guarantees of first-order methods. When the training data is non-separable, we show that the degree of non-separability naturally enters the analysis and informs the properties and convergence guarantees of two standard first-order methods: steepest descent (for any given norm) and stochastic gradient descent. Expanding on the work of Bach, we also show how the degree of non-separability enters into the analysis of linear convergence of steepest descent (without needing strong convexity), as well as the adaptive convergence of stochastic gradient descent. When the training data is separable, first-order methods rather curiously have good empirical success, which is not well understood in theory. In the case of separable data, we demonstrate how the degree of separability enters into the analysis of l_2 steepest descent and stochastic gradient descent for delivering approximate-maximum-margin solutions with associated computational guarantees as well. This suggests that first-order methods can lead to statistically meaningful solutions in the separable case, even though the ML solution does not exist. This is joint work with Robert Freund and Rahul Mazumder.
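
For background, a minimal sketch of the two standard first-order methods discussed in the abstract, applied to the logistic-regression maximum-likelihood objective on synthetic non-separable data; the data, step sizes, and iteration counts are illustrative choices, not the ones analyzed in the talk.

```python
# Gradient descent and stochastic gradient descent on the logistic loss
#   L(w) = (1/n) * sum_i log(1 + exp(-y_i * x_i . w)),   y_i in {-1, +1}.
import numpy as np

def loss_and_grad(w, X, y):
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)
    return loss, grad

rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = np.where(X @ w_star + 0.5 * rng.standard_normal(n) > 0, 1.0, -1.0)  # noisy, non-separable

# Full-batch gradient descent with a fixed step size.
w = np.zeros(d)
for _ in range(500):
    _, grad = loss_and_grad(w, X, y)
    w -= 1.0 * grad
print("gradient descent loss:", loss_and_grad(w, X, y)[0])

# Stochastic gradient descent: one sample per step, decaying step size.
w_sgd = np.zeros(d)
for t in range(20000):
    i = rng.integers(n)
    _, g = loss_and_grad(w_sgd, X[i:i + 1], y[i:i + 1])
    w_sgd -= (1.0 / np.sqrt(t + 1)) * g
print("SGD loss:", loss_and_grad(w_sgd, X, y)[0])
```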