MADDD:

Mathematics of Data and Decisions at Davis

Aims: The MADDD seminar aims to bring together researchers working on cutting-edge algorithmic and mathematical techniques for analyzing and visualizing data and for making optimal decisions. Talks and discussions in the seminar, most often directly motivated by applications, will include (but are not limited to): Optimization, Applied Computational Harmonic Analysis, Topological Data Analysis, Game Theory, Information theory, Theory of Machine Learning Algorithms, Computational Geometry, Inverse problems, Shape optimization problems, Scheduling problems, Packing, Risk assessment models, Control theory, Computer vision problems, Statistical signal and image processing, Pattern Recognition problems, Clustering and classification, Data Compression, Discrete Mathematics, Numerical Linear and Tensor Algebra, Random matrix theory, Convex and Functional analysis, etc.

Below is the MADDD seminar program for Fall 2020:

Organizer for Fall 2020: Naoki Saito (Email: saito@math.ucdavis.edu).

Due to the COVID-19 pandemic, we have decided to move the seminar fully online. The seminar runs every Tuesday 4:10-5:00pm (unless otherwise specified) via a Zoom meeting:

Meeting ID: 960 0033 4551

The password is the same as in Spring 2020. If you do not recall the password, you can obtain it from the organizer by sending a request from your UCD email account. If you are not affiliated with UC Davis, send your request to the organizer from your university or company email address (no Gmail) and include a short note about who you are and why you want to join MADDD.

Note that some of the presentations have been recorded and are available at AggieVideo. In addition, some of the presentation slides are available on the organizer's website.

Previous programs: Opening Event | Fall 2018 | Winter 2019 | Spring 2019 | Fall 2019 | Winter 2020 | Spring 2020

09/29: Stefan Schonsheck (Mathematics, UC Davis)

Title: Coherence and Manifold Structures in Data

Abstract: Both recent and long-term advances in data acquisition and storage technology have led to the genesis of the so-called 'big data era'. The proliferation of social and scientific databases of increasing size has led to a need for algorithms that can efficiently process, analyze, and even generate this data. However, due to the large number of observations (volume) and the number of variables observed (dimension), many classical approaches from traditional signal processing and statistics are not computationally feasible in this regime. The field of Geometric Data Processing has been developed as a way to exploit inherent coherence in data to design new algorithms motivated by both differential and discrete geometry. In this talk, we will explore techniques for exploiting the geometric structure of data to analyze and generate geometric, as well as general, data. First, we will use ideas from parallel transport on manifolds to generalize convolution and convolutional neural networks to deformable manifolds. We will show that this allows us to analyze signals based on their intrinsic dimension and to separate intrinsic from extrinsic information, which can be useful for analysis and for the generation of novel synthetic data. We will conclude by proposing a novel auto-regressive model for capturing the intrinsic geometry and topology of data, which also allows us to develop universal generation theorems for neural networks.


10/06: Abhishek Roy (Statistics, UC Davis)

Title: Escaping Saddle-Point Faster under Interpolation-like Conditions

Abstract: In this work, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle points and converge to local minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of perfectly interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients under over-parametrization, the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$-local-minimizer matches the corresponding deterministic rate of $\epsilon^{-2}$. We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under interpolation-like conditions. The first- and second-order oracle complexities of SCRN to reach an $\epsilon$-local-minimizer turn out to be $\epsilon^{-2.5}$. The obtained complexity is better than the corresponding complexity of either PSGD or SCRN without interpolation-like assumptions. However, it does not match the rate of $\epsilon^{-1.5}$ corresponding to the deterministic Cubic-Regularized Newton method. We also discuss the corresponding improved complexities in the zeroth-order settings.
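As a toy illustration of the perturbation idea behind PSGD (not the paper's analysis, and without its interpolation assumptions), the sketch below injects isotropic noise whenever the gradient is small, so the iterate escapes the strict saddle of a simple one-dimensional objective. The objective, step size, noise level, and tolerance are all hypothetical choices for illustration only:

```python
import numpy as np

def perturbed_sgd(grad, x0, steps=2000, lr=0.05, noise=1e-2, tol=1e-3):
    """Minimal sketch of Perturbed Gradient Descent: when the gradient is
    small (a potential saddle point), inject isotropic noise so the iterate
    can escape along a descent direction; otherwise take a gradient step."""
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < tol:                 # near-stationary: perturb
            x = x + noise * rng.standard_normal(x.shape)
        else:                                       # otherwise: descend
            x = x - lr * g
    return x

# f(x) = (x^2 - 1)^2 has a strict saddle (local max) at x = 0
# and local minimizers at x = +1 and x = -1.
grad = lambda x: 4.0 * x * (x**2 - 1.0)
x_star = perturbed_sgd(grad, np.array([0.0]))
```

Started exactly at the saddle, plain gradient descent would never move; the perturbation pushes the iterate off and it then converges to one of the two minimizers.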


10/13: Chao Wang (ECE/CS, UC Davis)

Title: From telescope to computed tomography via sparse recovery approaches

Abstract: In this presentation, I will talk about two applications: 3D localization of space debris and limited-angle computed tomography reconstruction. These seemingly unrelated topics are connected and addressed by sparse recovery approaches. In the first application, we consider the high-resolution imaging problem of recovering 3-dimensional (3D) point source images from 2-dimensional data using a method based on point spread function (PSF) engineering. A new non-convex regularization method with a data-fitting term based on Kullback--Leibler (KL) divergence is proposed for 3D localization under the Poisson noise model. In addition, we study the 3D localization and material classification of unresolved space debris using a multispectral rotating PSF, proposing a three-stage algorithm scheme. In the second application, we consider minimizing the L1/L2 term on the gradient for a limited-angle scanning problem in computed tomography (CT) reconstruction. We design a specific splitting framework for an unconstrained optimization model so that the alternating direction method of multipliers (ADMM) has guaranteed convergence under certain conditions. Finally, I will talk about ongoing projects at UCD.
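To make the "sparse recovery" theme concrete, here is a generic L1 baseline (ISTA for the LASSO), not the talk's KL-divergence or L1/L2-on-gradient methods; the problem sizes, sparsity level, and regularization weight are arbitrary illustrative choices:

```python
import numpy as np

def ista(A, b, lam=0.01, iters=500):
    """Iterative Shrinkage-Thresholding Algorithm for the LASSO
    min_x 0.5*||Ax - b||^2 + lam*||x||_1 -- a standard sparse-recovery
    baseline (the talk's models differ, but share the sparsity prior)."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz const. of grad
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - (A.T @ (A @ x - b)) / L            # gradient step (smooth part)
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100)) / np.sqrt(50)   # underdetermined system
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]             # 3-sparse ground truth
b = A @ x_true
x_hat = ista(A, b)
```

With only 50 measurements of a 100-dimensional but 3-sparse signal, the L1 penalty recovers the signal up to a small regularization bias.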


10/20: Cynthia Rudin (CS, Duke)

Title: Current Approaches in Interpretable Machine Learning

Abstract: With widespread use of machine learning, there have been serious societal consequences from using black box models for high-stakes decisions, including flawed bail and parole decisions in criminal justice, flawed models in healthcare, and black box loan decisions in finance. Transparency and interpretability of machine learning models are critical in high-stakes decisions. In this talk, I will focus on two of the most fundamental and important problems in the field of interpretable machine learning: optimal sparse decision trees and optimal scoring systems. I will also briefly describe work on interpretable neural networks for computer vision.

Optimal sparse decision trees: We want to find trees that maximize accuracy and minimize the number of leaves in the tree (sparsity). This is an NP-hard optimization problem with no polynomial-time approximation. I will present the first practical algorithm for solving this problem, which uses a highly customized dynamic-programming-with-bounds procedure, computational reuse, specialized data structures, analytical bounds, and bit-vector computations.
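The accuracy-plus-sparsity objective can be illustrated on the smallest possible scale: compare a one-leaf tree against the best single split under the regularized objective (misclassification rate + lambda * number of leaves). This is only a sketch of the objective, not the talk's branch-and-bound algorithm; the data and lambda are hypothetical:

```python
import numpy as np

def best_regularized_stump(X, y, lam=0.05):
    """Minimize (misclassification rate) + lam * (#leaves) over the two
    simplest trees: a constant predictor (1 leaf) and all single-feature
    threshold splits (2 leaves). Binary labels y in {0, 1}."""
    n, d = X.shape
    best_obj = min(np.mean(y == 0), np.mean(y == 1)) + lam * 1  # 1-leaf tree
    best_split = None
    for j in range(d):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            err = (min((left == 0).sum(), (left == 1).sum())
                   + min((right == 0).sum(), (right == 1).sum())) / n
            if err + lam * 2 < best_obj:            # 2 leaves cost 2*lam
                best_obj, best_split = err + lam * 2, (j, t)
    return best_split, best_obj

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = (X[:, 0] > 0).astype(int)                       # feature 0 is decisive
best_split, best_obj = best_regularized_stump(X, y)
```

On this toy data the split on feature 0 separates the classes perfectly, so paying lambda for the extra leaf is worthwhile; with a larger lambda the one-leaf tree would win instead.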

Optimal scoring systems: Scoring systems are sparse linear models with integer coefficients. Traditionally, scoring systems have been designed using manual feature elimination on logistic regression models, with a post-processing step where coefficients have been rounded. However, this process can fail badly to produce optimal (or near optimal) solutions. I will present a novel cutting plane method for producing scoring systems from data. The solutions are globally optimal according to the logistic loss, regularized by the number of terms (sparsity), with coefficients constrained to be integers. Predictive models from our algorithm have been used for many medical and criminal justice applications, including in intensive care units in hospitals.
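The "traditional" pipeline criticized above (fit logistic regression, then scale and round the coefficients to integers) can be sketched as follows; this is the heuristic baseline, not the talk's cutting-plane method, and the data, learning rate, and score range are hypothetical:

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, iters=2000):
    """Plain gradient-descent logistic regression (labels in {0, 1})."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))          # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)            # gradient of logistic loss
    return w

def round_to_scores(w, max_score=5):
    """Traditional post-processing: rescale so the largest coefficient maps
    to max_score, then round to integers -- the step the talk argues can
    land far from the optimal integer-coefficient model."""
    scale = max_score / np.max(np.abs(w))
    return np.round(scale * w).astype(int)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = (X @ w_true + 0.1 * rng.standard_normal(200) > 0).astype(float)
scores = round_to_scores(fit_logreg(X, y))
```

The rounded integer scores preserve the signs and rough magnitudes of the fitted coefficients, but nothing in this pipeline optimizes the integer-constrained loss directly, which is exactly the gap the cutting-plane method closes.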

Interpretable neural networks for computer vision: We have developed a neural network that performs case-based reasoning. It aims to explain its reasoning process in a way that humans can understand, even for complex classification tasks such as bird identification.

Papers:

Jimmy Lin, Chudi Zhong, Diane Hu, Cynthia Rudin, Margo Seltzer: Generalized and Scalable Optimal Sparse Decision Trees. ICML, 2020.

Berk Ustun and Cynthia Rudin: Learning Optimized Risk Scores. JMLR, 2019. Shorter version at KDD 2017.

Struck et al.: Association of an Electroencephalography-Based Risk Score With Seizure Probability in Hospitalized Patients. JAMA Neurology, 2017.

Chaofan Chen, Oscar Li, Chaofan Tao, Alina Barnett, Jonathan Su, Cynthia Rudin: This Looks Like That: Deep Learning for Interpretable Image Recognition. NeurIPS, 2019.


10/27: Patrice Koehl (CS, UC Davis)

Title: Light speed computation of exact solutions to generic and to degenerate assignment problems

Abstract: The linear assignment problem is a fundamental problem in combinatorial optimization with a wide range of applications, from operational research to data sciences. It consists of assigning "agents" to "tasks" on a one-to-one basis, while minimizing the total cost associated with the assignment. While many exact algorithms have been developed to identify such an optimal assignment, most of these methods are computationally prohibitive for large problems. In this talk, I will describe a novel approach to solving the assignment problem using techniques adapted from statistical physics. In particular, I will derive a strongly concave effective free energy function that captures the constraints of the assignment problem at a finite temperature. This free energy decreases monotonically as a function of $\beta$, the inverse of temperature, to the optimal assignment cost, providing a robust framework for temperature annealing. For large enough $\beta$ values the exact solution to the generic assignment problem can be derived using a simple round-off to the nearest integer of the elements of the computed assignment matrix. I will also describe a provably convergent method to handle degenerate assignment problems. Finally, I will describe computer implementations of this framework that are optimized for parallel architectures, one based on CPU, the other based on GPU. These implementations enable solving large assignment problems (on the order of a few tens of thousands of agents) in wall-clock times on the order of minutes.
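The temperature-annealing idea can be sketched with a classic softassign-style scheme: at each inverse temperature beta, Sinkhorn iterations project exp(-beta*C) onto doubly stochastic matrices, and as beta grows the matrix concentrates on a permutation. This is a generic sketch under hypothetical schedule and iteration counts, not the speaker's free-energy formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def anneal_assignment(C, betas=np.geomspace(1.0, 500.0, 30), inner=200):
    """Deterministic-annealing sketch for the assignment problem: Sinkhorn
    iterations at increasing beta, warm-starting the row scalings; the
    final near-permutation matrix is read off by row argmax."""
    n = C.shape[0]
    u = np.ones(n)
    for beta in betas:
        # Subtract row minima before exponentiating: a row rescaling that
        # does not change the Sinkhorn fixed point but avoids underflow.
        K = np.exp(-beta * (C - C.min(axis=1, keepdims=True)))
        for _ in range(inner):
            v = 1.0 / (K.T @ u)
            u = 1.0 / (K @ v)
    P = u[:, None] * K * v[None, :]                 # doubly stochastic
    return P.argmax(axis=1)

rng = np.random.default_rng(0)
C = rng.random((8, 8))
perm = anneal_assignment(C)
rows, cols = linear_sum_assignment(C)               # exact combinatorial solver
```

For a generic (tie-free) cost matrix, the annealed coupling concentrates on the optimal permutation, matching the exact solver's cost.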


11/03: Patrick Flaherty (Math/Stat, UMass)

Title: MAP Clustering under the Gaussian Mixture Model via Mixed Integer Programming

Abstract: In the application of clustering models to real data there is often rich prior information that constrains the relationships among the samples, or the relationships between the samples and the parameters. For example, in biological or clinical experiments, it may be known that two samples are technical replicates and should be assigned to the same cluster, or it may be known that the mean value for control samples is in a certain range. However, standard model-based clustering methods make it difficult to enforce such hard logical constraints and may fail to provide a globally optimal clustering. We present a global optimization approach for solving the maximum a posteriori (MAP) clustering problem under the Gaussian mixture model. Our approach can accommodate side constraints and preserves the combinatorial structure of the MAP clustering problem by its formulation as a mixed-integer nonlinear optimization problem (MINLP). We approximate the MINLP through a mixed-integer quadratic program (MIQP) transformation that improves computational aspects while guaranteeing $\epsilon$-global optimality. An important benefit of our approach is the explicit quantification of the degree of suboptimality, via the optimality gap, en route to finding the globally optimal MAP clustering. Numerical experiments comparing our method to other approaches show that our method finds better optima than standard clustering methods. Finally, we cluster a real breast cancer gene expression data set, incorporating intrinsic subtype information; the induced constraints substantially improve the computational performance and produce more coherent and biologically meaningful clusters.
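What "globally optimal MAP clustering with a hard logical constraint" means can be shown by brute force on a tiny instance: enumerate every assignment, discard those violating a must-link constraint, and keep the one with the best spherical-Gaussian MAP score. The MIQP approach replaces this exhaustive search with a scalable solver; the data and constraint below are hypothetical:

```python
import numpy as np
from itertools import product

def constrained_map_clustering(X, k=2, must_link=(0, 1)):
    """Brute-force MAP clustering under a spherical Gaussian mixture with a
    hard must-link constraint: maximize -sum of within-cluster squared
    distances over all k^n assignments (only viable for tiny n)."""
    n = len(X)
    best, best_z = -np.inf, None
    for z in product(range(k), repeat=n):
        if z[must_link[0]] != z[must_link[1]]:
            continue                                 # hard logical constraint
        ll = 0.0
        for c in range(k):
            pts = X[[i for i in range(n) if z[i] == c]]
            if len(pts):
                mu = pts.mean(axis=0)                # ML mean for this cluster
                ll -= ((pts - mu) ** 2).sum()
        if ll > best:
            best, best_z = ll, z
    return np.array(best_z)

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, -0.1],
              [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])   # two separated groups
z = constrained_map_clustering(X)
```

Here samples 0 and 1 (say, technical replicates) are forced into the same cluster; on well-separated data the constrained global optimum coincides with the natural partition.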


11/10: Zhi Ding (ECE, UC Davis)

Title: Deep Learning: Not a Simple Hammer for Massive MIMO Wireless Communication Systems

Abstract: The proliferation of advanced wireless services, such as virtual reality, autonomous driving, and the Internet of Things, has generated increasingly intense pressure to develop intelligent wireless communication systems that can meet the networking needs posed by extremely high data rates, massive numbers of connected devices, and ultra-low latency. Deep learning (DL) has recently emerged as an exciting design tool for advancing the development of wireless communication systems, with some demonstrated successes. In this talk, we introduce the principles of applying DL to improve wireless network performance by integrating the underlying characteristics of channels in practical massive MIMO deployment. We develop important insights derived from the physical RF channel properties and present a comprehensive overview of the application of DL to accurately estimating the channel state information (CSI) of forward channels with low feedback overhead. We provide examples of successful DL applications in CSI estimation for massive MIMO wireless systems and highlight several promising directions for future research.


11/17: Chelsea Weaver (Amazon)

Title: Natural Language Understanding at Amazon Music

Abstract: In this talk, I’ll discuss what happens when you ask an Alexa device to play music. I’ll focus on the Natural Language Understanding (NLU) component, which deals with categorizing and labeling transcribed requests. In particular, I’ll discuss two projects I’ve worked on designed to improve upon the initial labeling. The first uses a BERT-based language model to “correct” requests that appear to be mislabeled. The second is an online learning model that selects from different NLU interpretations using implicit customer feedback. I’ll conclude the talk with a few tips for the industry job search.

Links: Amazon Jobs Page; Amazon Science Page

Papers:

Personalizing natural-language understanding using multi-armed bandits and implicit feedback – Moerchen et al (2020)

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback - Swaminathan & Joachims (2015)

Analysis of Thompson Sampling for the Multi-Armed Bandit Problem – Agrawal et al (2012)
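The bandit-based selection mechanism referenced in the talk and papers above can be sketched with a classic Bernoulli Thompson sampler; the arm probabilities, reward model, and round count below are hypothetical stand-ins (an "arm" playing the role of a candidate NLU interpretation, the binary reward the role of implicit customer feedback), not Amazon's production system:

```python
import numpy as np

def thompson_bandit(true_probs, rounds=5000, seed=0):
    """Bernoulli Thompson sampling: keep a Beta posterior per arm, sample
    one value from each posterior, and play the arm with the largest
    sample. Returns the number of times each arm was played."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    wins = np.ones(k)                    # Beta(1, 1) uniform priors
    losses = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(rounds):
        arm = int(np.argmax(rng.beta(wins, losses)))   # posterior sampling
        reward = rng.random() < true_probs[arm]        # Bernoulli feedback
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bandit([0.2, 0.5, 0.8])
```

Because the posterior of a consistently rewarded arm concentrates, exploration of the weaker arms dies off and the best arm dominates the play counts.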


11/24: Wolfgang Polonik (Stat, UC Davis)

Title: Multiscale Geometric Feature Extraction

Abstract: A method for extracting multiscale geometric features from a data cloud is presented. Each pair of data points is mapped into a real-valued feature function whose construction, in contrast to the popular kernel trick, is based on geometric considerations. The collection of these feature functions is then used for further data analysis. Applications include classification, anomaly detection, and data visualization. The performance of the methodology is illustrated through applications to real data sets, and some theoretical guarantees supporting the performance of the novel methodology are presented. This is joint work with G. Chandler.


12/01: Guido Montufar (Mathematics, UCLA)

Title: Optimal Transport to Independence Models

Abstract: An independence model for discrete random variables is a Segre-Veronese variety in a probability simplex. Any metric on the set of joint states of the random variables induces a Wasserstein metric on the probability simplex. The unit ball of this polyhedral norm is dual to the Lipschitz polytope. Given any data distribution, we seek to minimize its Wasserstein distance to a fixed independence model. The solution to this optimization problem is a piecewise algebraic function of the data. We compute this function explicitly in small instances, we examine its combinatorial structure and algebraic degrees in the general case, and we present some experimental case studies. This talk is based on joint work with Türkü Özlüm Çelik, Asgar Jamneshan, Bernd Sturmfels, Lorenzo Venturello.

https://arxiv.org/abs/1909.11716

https://arxiv.org/abs/2003.06725
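The talk minimizes Wasserstein distance over the whole independence model; as a sketch of one ingredient, the following computes the Wasserstein-1 distance from a data distribution on two bits to a single point of the model (the product of its marginals), with Hamming ground metric, by solving the coupling linear program directly. The example distribution is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

# Joint states of two binary variables, in order 00, 01, 10, 11.
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
C = np.array([[sum(a != b for a, b in zip(s, t)) for t in states]
              for s in states], dtype=float)      # Hamming ground metric

def wasserstein1(p, q, C):
    """W1(p, q) as a linear program over couplings P:
    minimize <C, P> subject to row sums = p, column sums = q, P >= 0."""
    n = len(p)
    A_eq, b_eq = [], []
    for i in range(n):                  # row-sum constraints
        row = np.zeros((n, n)); row[i, :] = 1
        A_eq.append(row.ravel()); b_eq.append(p[i])
    for j in range(n):                  # column-sum constraints
        col = np.zeros((n, n)); col[:, j] = 1
        A_eq.append(col.ravel()); b_eq.append(q[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.fun

# Perfectly correlated bits vs. the product of their (uniform) marginals.
p = np.array([0.5, 0.0, 0.0, 0.5])
q = np.array([0.25, 0.25, 0.25, 0.25])
W = wasserstein1(p, q, C)
```

Mass 0.25 must move from each of 00 and 11 to a state at Hamming distance 1, so the distance here is 0.5; minimizing this quantity over all product distributions q is the talk's optimization problem.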


12/08: Samir Chowdhury (Psychiatry and Behavioral Sciences, Stanford)

Title: Gromov-Wasserstein Learning in a Riemannian Framework

Abstract: Geometric and topological data analysis methods are increasingly being used to derive insights from data arising in the empirical sciences. We start with a particular case where such techniques are applied to human neuroimaging data to obtain graphs which can then yield insights connecting neurobiology to human task performance. Reproducing such insights across populations requires statistical learning techniques such as averaging and PCA across graphs without known node correspondences. We formulate this problem using the Gromov-Wasserstein (GW) distance and present a recently developed Riemannian framework for GW averaging and tangent PCA. Beyond graph adjacency matrices, this framework permits consuming derived network representations such as distance or kernel matrices, and each choice leads to additional structure on the GW problem that can be exploited for theoretical and/or computational advantages. In particular, we show how replacing the adjacency matrix representation with a spectral representation leads to theoretical guarantees allowing efficient use of the Riemannian framework. Additionally, we present numerics showing how the spectral representation achieves state-of-the-art accuracy and runtime in graph learning tasks such as matching and partitioning on a variety of real and simulated datasets.
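The GW objective that makes node-correspondence-free comparison possible can be written down directly: a coupling P between two spaces with distance matrices C1, C2 is scored by how well it preserves pairwise distances. This sketch only evaluates that objective for a given coupling (the optimization over P is the hard part the talk addresses); the cycle-graph example and relabeling are hypothetical:

```python
import numpy as np

def gw_cost(C1, C2, P):
    """Squared-loss Gromov-Wasserstein discrepancy of a coupling P:
    sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 * P[i,j] * P[k,l]."""
    D = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2   # index (i,j,k,l)
    return float(np.einsum('ijkl,ij,kl->', D, P, P))

# Shortest-path distance matrix of a 4-node cycle graph.
C1 = np.array([[0, 1, 2, 1],
               [1, 0, 1, 2],
               [2, 1, 0, 1],
               [1, 2, 1, 0]], dtype=float)
sigma = np.array([2, 0, 3, 1])            # a node relabeling
C2 = C1[np.ix_(sigma, sigma)]             # the same cycle, relabeled

# Coupling concentrated on the relabeling: node sigma[a] of the first
# graph is matched to node a of the second, each with mass 1/4.
P = np.zeros((4, 4))
P[sigma, np.arange(4)] = 0.25
cost_matched = gw_cost(C1, C2, P)

# A "blind" independent coupling ignores the structure entirely.
cost_uniform = gw_cost(C1, C2, np.full((4, 4), 1.0 / 16))
```

The coupling that recovers the relabeling attains cost zero (the spaces are isometric), while the uniform coupling pays for every mismatched distance pair; this gap is what GW averaging and tangent PCA exploit.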