Fall 2019

Organizer for Fall 2019: Shiqian Ma (Email: sqma@ucdavis.edu)

10/01: Yong Jae Lee (UC Davis, Computer Science). Title: Learning to Understand Visual Data with Minimal Human Supervision.

Abstract:

Humans and animals learn to see the world mostly on their own, without supervision, yet today’s state-of-the-art visual recognition systems rely on millions of manually annotated training images. This reliance on labeled data has become one of the key bottlenecks in creating systems that can attain a human-level understanding of the vast concepts and complexities of our visual world. Indeed, while computer vision research has made tremendous progress, most success stories are limited to specific domains in which lots of carefully labeled data can be unambiguously and easily acquired.

In this talk, I will present my research in computer vision and deep learning on creating scalable recognition systems that can learn to understand visual data with minimal human supervision. Given the right constraints, I’ll show that one can design learning algorithms that discover and generate meaningful patterns from the data with little to no human supervision. In particular, I’ll focus on algorithms that can localize relevant image regions given only weak image-level supervision, and hierarchically disentangle and generate fine-grained details of objects. I’ll also discuss our latest work on real-time instance segmentation -- our algorithm obtains ~30 mAP on the challenging MS COCO dataset while running at 33 fps on a single Titan Xp.

10/08: 3:10-4pm in MSB2112 (note the time and room change). Speaker: Facundo Memoli (Ohio State University).

Title: Gromov-Wasserstein distances and distributional invariants of datasets

Abstract: The Gromov-Wasserstein (GW) distance is a generalization of the standard Wasserstein distance between two probability measures on a given ambient metric space. The GW distance allows the two probability measures to live on different ambient spaces and therefore implements an actual comparison of pairs of metric measure spaces. Metric measure spaces are triples $(X, d_X, \mu_X)$, where $(X, d_X)$ is a metric space and $\mu_X$ is a Borel probability measure over $X$; they serve as a model for datasets.
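For orientation, a standard form of the order-$p$ GW distance (following Mémoli's formulation; the factor of $1/2$ and the normalization vary across references) is

$$
GW_p\big((X, d_X, \mu_X), (Y, d_Y, \mu_Y)\big) = \frac{1}{2} \inf_{\pi \in \Pi(\mu_X, \mu_Y)} \left( \iint \big| d_X(x, x') - d_Y(y, y') \big|^p \, d\pi(x, y) \, d\pi(x', y') \right)^{1/p},
$$

where $\Pi(\mu_X, \mu_Y)$ denotes the set of couplings of $\mu_X$ and $\mu_Y$.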

In practical applications, this distance is estimated either directly via gradient-based optimization approaches, or through the computation of lower bounds which arise from distributional invariants of metric measure spaces. One particular such invariant is the so-called ‘global distance distribution’ of pairwise distances.
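As a concrete illustration of this invariant, here is a minimal Python sketch (our own illustration; names and helper choices are not from the talk) that computes the empirical global distance distribution of two point clouds, possibly in different ambient spaces, and compares them via the 1-Wasserstein distance between those distributions, which is the flavor of lower bound referred to above:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import wasserstein_distance

def global_distance_distribution(points):
    """All pairwise distances of a point cloud: the empirical version
    of the 'global distance distribution' invariant."""
    return pdist(points)  # condensed vector of pairwise Euclidean distances

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))        # dataset in R^2
Y = 2.0 * rng.normal(size=(200, 3))  # dataset in a different ambient space, R^3

# 1-Wasserstein distance between the two distance distributions; such
# distribution-based quantities lower-bound the GW distance (up to
# convention-dependent constants).
print(wasserstein_distance(global_distance_distribution(X),
                           global_distance_distribution(Y)))
```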

This talk will overview the construction of the GW distance and the stability of distribution-based invariants, and will discuss some results regarding the injectivity of the global distribution of distances for smooth planar curves, hypersurfaces, and metric trees.

10/10: 2:10-3pm (note the time change) in MSB1147. Speaker: Xin Liu (UC Davis, Computer Science).

Title: Machine Learning Meets Cellular Networks

Abstract: Machine learning has drawn much recent attention for its ability to model, predict, and control network performance and user experience based on network measurement and user performance data. In this talk, we introduce both applied and theoretical aspects of machine learning algorithms in networks. First, we discuss our recent work on learning-based cellular resource management, using the example of collaborative-learning-based cellular tower parameter configuration. Second, we present our proposed opportunistic learning algorithms for multi-armed bandits and episodic reinforcement learning, which are inspired by real-world applications of learning algorithms in networks.
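The abstract does not spell out the algorithms, but for orientation, the multi-armed bandit setting it builds on can be sketched in a few lines of Python. The UCB1 rule below is the textbook baseline, not the speaker's opportunistic variant:

```python
import numpy as np

def ucb1(pull, n_arms, horizon):
    """Textbook UCB1 for stochastic bandits; pull(a) returns a reward in [0, 1]."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            a = t  # play each arm once to initialize
        else:
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)  # optimism bonus
            a = int(np.argmax(means + bonus))
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # running average of rewards
    return means

# Example: three Bernoulli arms with unknown success probabilities.
rng = np.random.default_rng(1)
probs = [0.2, 0.5, 0.8]
print(ucb1(lambda a: rng.binomial(1, probs[a]), len(probs), 5000))
```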

10/15: Meisam Razaviyayn (USC, Industrial and Systems Engineering)

Title: Learning via Non-Convex Min-Max Games

Abstract: Recent applications arising in machine learning have spurred significant interest in solving min-max saddle-point games. This problem has been extensively studied in the convex-concave regime, for which a global equilibrium solution can be computed efficiently. In this talk, we study the problem in the non-convex regime and show that an $\epsilon$--first order stationary point of the game can be computed when one of the players' objectives can be efficiently optimized to global optimality. We discuss applications of the proposed algorithm to defense against adversarial attacks on neural networks, generative adversarial networks, fair learning, and generative adversarial imitation learning.
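In symbols, the problems in question take the form (our rendering of the standard setting, not notation from the paper)

$$
\min_{\theta \in \Theta} \; \max_{\alpha \in \mathcal{A}} \; f(\theta, \alpha),
$$

where $f$ may be non-convex in $\theta$; the assumption above is that the inner maximization can be solved to global optimality, so that an $\epsilon$--first order stationary point of $g(\theta) = \max_{\alpha \in \mathcal{A}} f(\theta, \alpha)$ can be computed.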

10/22: Yuwei Fan (Stanford, Math)

Title: Deep Neural Networks for PDEs

Abstract: Recently, deep neural networks (DNNs) have been increasingly used in the context of scientific computing, particularly for solving PDE-related problems. In this talk, we first construct a series of novel neural network architectures inspired by classical linear algebra algorithms, including hierarchical matrices, hierarchical nested bases, and the BCR nonstandard wavelet form. The new architectures inherit the multiscale structure of these classical algorithms and are thus called multiscale neural networks. We then apply these networks to classical PDE-based inverse problems, for example electrical impedance tomography (EIT) and optical tomography (OT). The key question in these applications is how to represent the inverse map with a neural network.
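As a toy illustration of the hierarchical-matrix idea behind these architectures (our sketch, not the speaker's code): in a one-level hierarchical matrix, the off-diagonal blocks are stored in low-rank factored form, which is exactly the kind of structure the multiscale layers mimic.

```python
import numpy as np

def h_matvec(A11, A22, U12, V12, U21, V21, x):
    """Matrix-vector product with the one-level hierarchical (block low-rank)
    matrix [[A11, U12 @ V12.T], [U21 @ V21.T, A22]], where the off-diagonal
    blocks are kept in rank-r factored form and never formed explicitly."""
    n1 = A11.shape[0]
    x1, x2 = x[:n1], x[n1:]
    y1 = A11 @ x1 + U12 @ (V12.T @ x2)  # O(n*r) off-diagonal work, not O(n^2)
    y2 = U21 @ (V21.T @ x1) + A22 @ x2
    return np.concatenate([y1, y2])

rng = np.random.default_rng(2)
n, r = 64, 4
A11, A22 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
U12, V12, U21, V21 = (rng.normal(size=(n, r)) for _ in range(4))
print(h_matvec(A11, A22, U12, V12, U21, V21, rng.normal(size=2 * n)).shape)
```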

10/29: Yu Zhang (UC Santa Cruz, Electrical and Computer Engineering)

Title: Improving Power Grid Reliability and Efficiency via Advanced Signal Processing and Machine Learning

Abstract: In this talk, we will first focus on the non-convex power flow and power system state estimation problems, which play a central role in the monitoring and operation of electric power grids. The power flow problem aims at obtaining all voltage phasors from a set of noiseless observations, whereas state estimation deals with noisy measurements. Semidefinite programming (SDP) and second-order cone programming (SOCP) relaxations are leveraged to cope with the inherent non-convexity of the two problems. It is shown that both conic relaxations recover the true power flow solution under mild conditions. A penalized SDP is then designed for the state estimation task. We derive an upper bound to quantify the optimal solution of the SDP, which is shown to possess a dominant rank-one component formed by lifting the true voltage vector. In the second part of the talk, we will briefly present recent results on electricity market inference and energy disaggregation using advanced machine learning techniques.
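The rank-one lifting behind these relaxations can be summarized as follows (a standard rendering; the specific penalty from the talk is not reproduced here). Each measurement $z_i$ is quadratic in the complex voltage vector $v$, so introducing $W = v v^*$ makes it linear in $W$:

$$
z_i = v^* M_i v + \eta_i \quad \Longleftrightarrow \quad z_i = \mathrm{tr}(M_i W) + \eta_i, \qquad W \succeq 0, \; \mathrm{rank}(W) = 1,
$$

and the SDP relaxation drops the rank constraint; recovery of the true solution means the optimal $W$ is nevertheless (close to) rank-one, so $v$ can be read off by factorization.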

11/05: Erika Roldan Roa (Ohio State University, Math)

Title: Evolution of the homology and related geometric properties of the Eden Growth Model

Abstract: In this talk, we study the persistent homology and related geometric properties of the evolution in time of a discrete-time stochastic process defined on the 2-dimensional regular square lattice. This process corresponds to a cell growth model called the Eden Growth Model (EGM). It can be described as follows: start with the cell of the 2-dimensional regular square lattice that contains the origin; then grow the cell structure by adding, at each time step, one cell chosen uniformly at random from the perimeter. We give a characterization of the possible changes in the rank of the first homology group of this process (the "number of holes"). Based on this result, we have designed and implemented a new algorithm that computes the persistent homology associated to this stochastic process and that also keeps track of geometric features related to the homology. We also present results of computational experiments performed with this algorithm, and we establish conjectures about the asymptotic behavior of the homology and other related geometric random variables. The EGM can be seen as a First Passage Percolation (FPP) model after a proper time-scaling. This is the first time that tools and techniques from stochastic topology and topological data analysis are used to measure the evolution of the topology of the EGM and, more generally, of FPP models.
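The growth rule above is simple enough to simulate directly; the following minimal Python sketch (our illustration) grows an Eden cluster on the square lattice:

```python
import random

def eden_growth(steps, seed=0):
    """Eden Growth Model: start from the lattice cell containing the origin,
    then repeatedly add one perimeter cell chosen uniformly at random."""
    rng = random.Random(seed)
    cluster = {(0, 0)}
    perimeter = {(1, 0), (-1, 0), (0, 1), (0, -1)}
    for _ in range(steps):
        cell = rng.choice(sorted(perimeter))  # uniform over the perimeter
        perimeter.discard(cell)
        cluster.add(cell)
        x, y = cell
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb not in cluster:
                perimeter.add(nb)
    return cluster

print(len(eden_growth(1000)))  # 1001 cells after 1000 additions
```

Holes appear and disappear as the cluster grows; tracking how the rank of the first homology group changes with each added cell is precisely what the algorithm described in the talk does.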

11/12: Rishidev Chaudhuri (UC Davis, Math and NPB)

Title: Finding and using sparse networks in neural computation

Abstract: Computation in the brain is distributed over very large networks of neurons, yet most of these neurons are not directly connected to each other. This sparse connectivity is efficient, contributing to the brain's remarkably low energy use, and network density is disrupted in a number of disease states. Thus, understanding the principles that underlie computation on these sparse networks is of broad theoretical and practical significance.

In the first part of the talk, I will present work studying how the brain might decide which network connections are important and which are redundant (and hence can be pruned to maintain network sparsity). Noise is ubiquitous in neural systems and often considered an irritant to be overcome, but I will suggest that the brain may use noise to probe the structure of the network and remove redundant connections. I will construct a noise-driven plasticity rule that operates on a dynamical network and prove that for an important class of linear networks it preserves useful properties of the original dynamics. I will also show that this noise-driven pruning strategy has a number of connections to the sparsification of graph Laplacians.
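For context on that last point: spectral sparsification of a graph Laplacian classically keeps edges sampled with probability proportional to weight times effective resistance (Spielman-Srivastava). Below is a minimal dense-matrix sketch of that classical procedure, suitable for small graphs only; it is not the noise-driven plasticity rule itself:

```python
import numpy as np

def sparsify(W, q, seed=0):
    """Spielman-Srivastava-style spectral sparsifier (dense toy version).
    W: symmetric (n, n) nonnegative weight matrix; q: number of edge samples."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W             # graph Laplacian
    Lp = np.linalg.pinv(L)                     # pseudoinverse (fine at toy sizes)
    i, j = np.triu_indices(n, k=1)
    w = W[i, j]
    reff = Lp[i, i] + Lp[j, j] - 2 * Lp[i, j]  # effective resistances
    p = w * reff
    p = p / p.sum()                            # sampling distribution over edges
    H = np.zeros_like(W)
    for e in rng.choice(len(w), size=q, p=p):
        H[i[e], j[e]] += w[e] / (q * p[e])     # reweight so E[H] matches W
    return H + H.T

rng = np.random.default_rng(3)
A = rng.random((30, 30)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
print(np.count_nonzero(sparsify(A, q=100)))    # far fewer entries than in A
```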

Time permitting, in the second part of the talk I will discuss a model for sparse signal recovery by sparse random connections in biological neural networks and suggest an architecture for working memory that utilizes this sparse signal recovery.

11/19: James Sharpnack (UC Davis, Stats)

Title: Weak consistency of the 1-nearest neighbor measure with applications to missing data

Abstract: When data is partially missing at random, imputation and importance weighting are often used to estimate moments of the unobserved population. In this paper, we study 1-nearest neighbor (1NN) importance weighting, which estimates moments by replacing each missing data point with the complete-data point that is its nearest neighbor in the non-missing covariate space. We define an empirical measure, the 1NN measure, and show that it is weakly consistent for the measure of the missing data. The main idea behind this result is that the 1NN measure performs inverse probability weighting in the limit. We study applications to missing data and to mitigating the impact of covariate shift in prediction tasks.
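A minimal sketch of the 1NN idea (our illustration; the estimator in the paper may differ in details): estimate the mean outcome over the full population by letting each missing unit borrow the outcome of its nearest complete-case neighbor in covariate space, which is equivalent to importance-weighting each complete case by the number of units it is nearest to.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def one_nn_mean(X_obs, y_obs, X_miss):
    """Estimate E[y] when y is unobserved for the rows in X_miss: each
    missing unit takes the outcome of its 1-nearest neighbor among the
    complete cases (1NN importance weighting in disguise)."""
    nn = NearestNeighbors(n_neighbors=1).fit(X_obs)
    idx = nn.kneighbors(X_miss, return_distance=False).ravel()
    return np.concatenate([y_obs, y_obs[idx]]).mean()

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=500)
observed = rng.random(500) < 1.0 / (1.0 + np.exp(-X[:, 0]))  # missing at random
print(one_nn_mean(X[observed], y[observed], X[~observed]))   # approx. E[y] = 0
```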

11/26: Javad Lavaei (Berkeley, IEOR)

Title: Computational Techniques for Nonlinear Optimization and Learning Problems

Abstract: The design of efficient computational techniques for nonlinear optimization has been an active area of research in the field of optimization theory for many years, but has recently been shaping the mathematical foundation of data science. In this talk, we investigate different strategies for finding a high-quality solution of a nonlinear optimization or learning problem. To this end, we develop key notions such as penalized semidefinite programs (SDPs), spurious local trajectories, and graphical mutual incoherence, and we design algorithms with near-linear time/memory complexity for sparse SDPs with localized constraints, in order to break down the complexity of SDPs and make them as usable as linear programs. Finally, we offer several case studies on real-world systems such as power grids and transportation networks.

12/03: Nhat Ho (Berkeley, EECS)

Title: Statistical and computational perspectives on mixture and hierarchical models

Abstract: The growth in scope and complexity of modern data sets presents the fields of statistics and data science with numerous inferential and computational challenges, among them how to deal with various forms of heterogeneity. Mixture and hierarchical models provide a principled approach to modeling heterogeneous collections of data. However, it has been observed that statistical estimation methods, such as the maximum likelihood estimator (MLE), and optimization techniques, such as expectation-maximization (EM), can have non-standard statistical rates of convergence. In this talk, we provide new insight into these issues for mixture and hierarchical models.

From the statistical viewpoint, we propose a general framework for studying the convergence rates of parameter estimation in mixture models. Our study makes explicit the links between model singularities, parameter estimation convergence rates, and the algebraic geometry of the parameter space for mixtures of continuous distributions. Building on this framework, we develop a novel post-processing procedure, named the Merge-Truncate-Merge algorithm, to determine the true number of components in mixture models.

From the computational side, we study the EM algorithm in over-specified settings of mixture models, in which the likelihood need not be strongly concave or, equivalently, the Fisher information matrix might be singular. In such settings, it is known that a global maximum of the likelihood based on $n$ samples has a non-standard rate of convergence. Focusing on the simple setting of a two-component mixture fit with equal mixture weights to a multivariate Gaussian distribution, we demonstrate that EM updates converge to a fixed point at Euclidean distance $O((d/n)^{1/4})$ from the true parameter after $O((n/d)^{1/2})$ steps, where $d$ is the dimension. Analysis of this singular case requires the introduction of some novel analysis techniques; in particular, we make use of a careful form of localization in the associated empirical process and develop a recursive argument to progressively sharpen the statistical rate.
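For the symmetric special case described above, the sample-EM iteration has a one-line closed form, which makes the non-standard behavior easy to observe numerically. The sketch below is our code, under the assumption of the fit model $\tfrac{1}{2} N(\theta, I_d) + \tfrac{1}{2} N(-\theta, I_d)$ with data drawn from $N(0, I_d)$; it is not the authors' implementation:

```python
import numpy as np

def em_symmetric_mixture(X, theta0, n_steps):
    """Sample EM for the fit model 0.5 N(theta, I) + 0.5 N(-theta, I):
    the E- and M-steps collapse to theta <- mean_i tanh(<x_i, theta>) x_i."""
    theta = theta0.copy()
    for _ in range(n_steps):
        theta = (np.tanh(X @ theta)[:, None] * X).mean(axis=0)
    return theta

rng = np.random.default_rng(5)
d, n = 10, 10_000
X = rng.normal(size=(n, d))  # over-specified case: the true parameter is 0
theta = em_symmetric_mixture(X, rng.normal(size=d), n_steps=200)
print(np.linalg.norm(theta), (d / n) ** 0.25)  # compare with the (d/n)^{1/4} scale
```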

This talk features joint work with (alphabetically) Raaz Dwivedi, Aritra Guha, Michael Jordan, Koulik Khamaru, Long Nguyen, Ya’acov Ritov, Martin Wainwright, and Bin Yu.