Abstract: Group synchronization, roughly speaking, asks to recover group elements on graph nodes from their relative measurements on graph edges. It is a critical problem in computer science, statistics and data science, and it has important applications to 3-D reconstruction, graph matching, phase retrieval, ranking and community detection. However, it faces a variety of challenges such as high corruption of edge measurements and computational bottlenecks. In our recent works, we proposed two frameworks, both based on cycle-consistency, to efficiently infer corruption levels of edge measurements. We present several exact recovery guarantees of our frameworks under adversarial and uniform corruption models. Our frameworks of corruption estimation enables various postprocessing methods for robustly solving group elements, and we demonstrate the state-of-the-art performance of our methods in terms of both speed and accuracy.
Abstract: The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. (2021). We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity. Finally, we discuss shortcomings of the linear mode connectivity hypothesis, including a counterexample to the single basin theory.
Abstract: In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-k updates -- or their rank-k approximation -- that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full N-dimensional unitary or orthogonal matrices with a training runtime scaling as O(kN^2). Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting (k=1), projUNN is able to train a model's unitary parameters to reach comparable performances against baseline implementations. By integrating our projUNN algorithm into both recurrent and convolutional neural networks, our models can closely match or exceed benchmarked results from state-of-the-art algorithms.
Abstract: Convex regression is the problem of fitting a convex function to a collection of input-output pairs, and arises naturally in applications such as economics, engineering and computed tomography. We present a new approach to this problem called spectrahedral regression, in which we fit a spectrahedral function to the data, i.e. a function that is the maximum eigenvalue of an affine matrix expression of the input. These models can exhibit both smooth and singular features, unlike existing methods. We first provide bounds on how well spectrahedral functions can approximate arbitrary convex functions via statistical risk analysis. Second, we analyze an alternating minimization algorithm for the non-convex optimization problem of fitting a spectrahedral function to data. Finally, we demonstrate the utility of our approach with experiments on synthetic and real data sets. This talk is based on joint work with Venkat Chandrasekaran.
Abstract: We consider data structured as attributed simplicial complexes, which are generalizations of graphs that include higher-dimensional simplices beyond vertices and edges; this structure allows for greater flexibility in modeling higher-order relationships. With this generalized framework, however, comes more complicated notions of message passing or convolution on the simplices to define simplicial neural network layers. While it is desirable in certain applications to use a more general input space of simplicial complexes, as opposed to graphs, such additional structure imposes high computational cost. To mitigate the increased computational cost as well as to reduce overfitting in the models, we propose a learned coarsening of simplicial complexes to be interleaved between standard simplicial neural network layers. This simplicial pooling layer generates hierarchical representations of simplicial complexes and is a generalized graph coarsening method which collapses information in a learned fashion. In this framework, a soft partition of vertices is learned at each layer and used to coarsen the simplicial complex for input to the subsequent layer. The pooling method extends learned vertex clustering to coarsen the higher dimensional simplices in a deterministic fashion. While in practice, the pooling operations are computed via a series of matrix operations, its topological motivation is based on unions of stars of simplices.
Abstract: Intrinsic image dataset manifolds and their relationship to neural network generalization have recently begun to be studied for the common domain of natural images, but little such research has been attempted for radiological images, which is important due to the difference in relevant visual features for downstream tasks. Here we (1) compare the intrinsic manifold dimensionality (ID) of radiological and natural images and (2) investigate the relationship between ID and network generalization ability (GA). We find that radiological images generally have lower ID than natural images, yet a steeper correlation between GA and ID. However, to our knowledge, a formal model that explains this "domain shift" does not exist, which could be enabled by more rigorous mathematical analysis, leading the way towards more principled development of methods specifically tailored for medical image analysis.
Abstract: Groups and symmetries are at the heart of many problems in statistics and data analysis. I will focus on parameter estimation in statistical models via maximum likelihood estimation. We will see connections between maximum likelihood estimation, linear algebra, and invariant theory. The group or symmetric structure of a statistical model can be used to capture the existence and uniqueness of a maximum likelihood estimate, as well as to suggest suitable algorithms to find it. This talk is based on joint work with Carlos Améndola, Kathlén Kohn, Visu Makam, and Philipp Reichenbach.
Abstract: Persistent homology has emerged as a powerful tool for understanding complex data. However, interpretation and proper use of persistent shape statistics often hinges on a (theoretically) intractable inverse problem: selection of "good" cycles to represent features in homology. In this talk we will recall the basic notions of persistent homology and some of its recent scientific applications. We'll discuss what makes a cycle representatives better or worse in this context, and show experimental evidence that "guessing" reliably produces good representatives. We'll then explore some beautiful and interesting properties of optimal cycles observed in scientific data.
Abstract: I will describe joint work with Jose Perea in which we propose combinatorial notions of vector bundle, study the extent to which these determine true vector bundles, and give algorithms for the stable and consistent computation of low-dimensional characteristic classes directly from these representations. I will also show how this framework can be used to frame a dimensionality reduction algorithm which preserves the global topology as well as the local linear structure of data.
Abstract: Machine learning models have obtained impressive performance on benchmark many-to-one datasets such as CIFAR-10. However, there are many real-world datasets that can be described as many-to-many. That is, a single input can yield different outputs and many different inputs can yield the same output. For example, in an image-to-text task, a single image has many possible captions and conversely many different images can have the same caption. Building off of previous work that used fiber bundles to model many-to-one processes, we present a model for many-to-many processes and discuss some of the challenges of working with real-world data that motivated this work.
Abstract: Although machine learning researchers have introduced a plethora of useful constructions for learning over Euclidean space, numerous types of data in various applications benefit from, if not necessitate, a non-Euclidean treatment. In this talk I cover the need for Riemannian geometric constructs to (1) build more principled generalizations of common Euclidean operations used in geometric machine learning models as well as to (2) enable general manifold density learning in contexts that require it. Said contexts include theoretical physics, robotics, and computational biology. I will cover one of my papers that fits into (1) above, namely the ICML 2020 paper "Differentiating through the Fréchet Mean." I will also cover two of my papers that fit into (2) above, namely the NeurIPS 2020 paper "Neural Manifold ODEs" and the NeurIPS 2021 paper "Equivariant Manifold Flows." Finally, I will briefly discuss directions of relevant ongoing work.
Abstract: A central challenge to building a theoretical foundation for deep learning is that the motion of a network in high-dimensional parameter space undergoes discrete finite steps along complex stochastic gradients. We circumvent this obstacle through a unifying theoretical framework based on intrinsic symmetries embedded in a network’s architecture and a continuous-time description of the discrete numerical trajectory taken by SGD at finite learning rates. We will discuss our recent work “Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics” and two follow-up directions of research where we establish new formulations of deep learning dynamics based on Lagrangian and statistical Mechanics.
Abstract: Machine learning models today attain impressive accuracy on many benchmark tasks. But to what extent do these models generalize to the real world tasks we ultimately care about?
In this talk, we will examine the performance of current ML models from this perspective. We will start with what is perhaps one of the most striking failure modes of these models---their vulnerability to imperceptible input perturbations known as adversarial examples. We take a closer look at this brittleness, and examine how it can, in part, be attributed to the fact that our models often make decisions very differently to humans. Then, we take a step back and revisit the building blocks of the ML pipeline to identify other manifestations of such human-ML misalignments and discuss how we can make progress towards mitigating them.
Abstract: To specify a function determined by a feedforward ReLU neural network, one usually gives a list of parameters (weights and biases). However, multiple different choices of parameters can determine the same function; in other words, the map that assigns functions to parameters is not injective. Furthermore, the degree to which injectivity fails is very inhomogeneous across the space of parameters. We define the "functional dimension" of a point in parameter space -- a measure of the dimension of the space of functions that can be realized by perturbing the parameter. I will discuss functional dimension and some of its properties. This talk is based on ongoing joint work with Eli Grigsby, Rob Meyerhoff and Chenxi Wu.
Abstract: Group actions have implicitly played a role in applications since at least the work of Fourier on solutions of the heat equation. Namely, one can generate the classical Fourier basis as an action of the group T on L^2([0,1]). Hundreds of years later in the 20th century, the two most important transforms in applied harmonic analysis arose, the wavelet transform and the short-time Fourier transform, which were created using projective unitary representations of the affine group and (a group related to the) the Weyl-Heisenberg group, respectively. Even in the modern era of data-driven approaches, group symmetries still play an important role, from generating transformations like diffusion wavelets and the scattering transform to characterizing overtrained networks in neural collapse. The talk will start as a colloquium-style talk on applications of group symmetry and finish with some results on the relationship between group symmetry and optimality.
Abstract: The deep learning (DL) model decision process is famously opaque. However, DL models are not “black boxes:” with modern DL interpretability tools, we can obtain useful explanations for DL models. In this talk, we highlight a high level interpretability approach: intermediate layer probing. We motivate the approach with a couple of case studies from the literature, focusing on concept activation vectors.
Abstract: As the world adopts Artificial Intelligence, the privacy risks are many. AI can improve our lives, but may leak our private data. Private AI is based on Homomorphic Encryption (HE), a new encryption paradigm which allows the cloud to operate on private data in encrypted form, without ever decrypting it, enabling private training and private prediction. Our 2016 ICML CryptoNets paper showed for the first time that it was possible to evaluate neural nets on homomorphically encrypted data, and opened new research directions combining machine learning and cryptography. The security of Homomorphic Encryption is based on hard problems in mathematics involving lattices, a candidate for post-quantum cryptography. This talk will explain Homomorphic Encryption, Private AI, and explain HE in action.
November 18th: Mitchell Wortsman (UW, Computer Science and Engineering)
Title: Learning Neural Network Subspaces
Abstract: Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast we aim to leverage both property (1) and (2) with a single method and in a single training run. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.
Friday, November 12th: Rekha Thomas (UW, Math)
Title: Chirality in Vision
Abstract: In computer vision, chirality refers to the constraint that for a scene to be visible in a camera, it must lie in front of the camera. Ignoring this important physical constraint there is now a mature algebraic theory of image formation in pinhole cameras. In this talk I will discuss new structural results that arise from respecting chirality. Using real and semialgebraic geometry one can explicitly describe the set of all true images in an arrangement of cameras. The converse question is when given tuples of points are indeed the images of world points in front of cameras. I will give a complete answer in the case of two cameras, uncovering an unexpected connection to the classical theory of cubic surfaces with 27 real lines.
Joint work with Sameer Agarwal, Andrew Pryhuber and Rainer Sinn.
Abstract: Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this talk, we compare the variability of topic lengths discovered by several well-known topic modeling methods including latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), as well as its tensor counterparts based on the nonnegative CANDECOMP/PARAFAC tensor decomposition (NCPD and Online NCPD). We demonstrate that only tensor-based methods with the dedicated mode for tracking time evolution successfully detect short-lasting topics. Furthermore, these methods are considerably more accurate in discovering the points in time when topics appeared and disappeared compared to the matrix-based methods such as LDA and NMF. We propose quantitative ways to measure the topic length and demonstrate the ability of NCPD (as well as its online variant), to discover short and long-lasting temporal topics in semi-synthetic and real-world data including news headlines and COVID-19 related tweets.
Abstract: Many modern neural networks rely solely on ReLU activation functions for nonlinearity, motivated by factors including both the computational simplicity of ReLUs and their role in helping prevent vanishing gradients. These networks are consequently piecewise linear functions and this opens the door to studying their behavior using combinatorial and geometric techniques without direct analog for networks employing smooth nonlinearities such as sigmoid activations. In this talk, we will discuss metrics that take advantage of this linear geometry and can be used to classify in and out-of-distribution inputs for ReLU networks.
The famous Dowker complex is the applied combinatorial topologist's go-to tool for studying relational data, but on its own it's rather limited. This talk will outline some richer structure built upon the Dowker complex and its dual category-theoretic and topological nature. The talk will also present a newly-discovered way to estimate distances between relations. These properties can be interpreted back into the context of the original application, giving new ways to identify what it means for files to comply with (or violate) a particular format.
Abstract: Exploiting symmetries in the learning problem has been a guiding principle for designing efficient neural network architectures on new domains, like images, sets, graphs, and point clouds. While these symmetries such as translation, rotation, and permutation require only a few bits of prior knowledge to specify, restricting to equivariant models vastly reduces the search space and improves generalization. Using tools from representation theory, we provide a general algorithm for computing the space of equivariant layers for matrix groups. In addition to recovering well known solutions from prior work, we can construct equivariant networks to new groups and representations. Building on these results, we outline a path towards automating model construction more generally.
Abstract: There is currently an unprecedented demand for efficient, quantitative, and interpretable methods to study large-scale (often multi-modal) data. One key area of interest is that of topic modeling, which seeks to automatically learn latent trends or topics of complex data sets, providing practitioners a view of what is “going on” inside their data. This talk will survey several new tools for topic modeling on matrix and tensor data which allow for use of various forms of supervision and which learn hierarchical structure amongst topics. These tools are of interest across the many fields and industries producing, capturing, and analyzing big data, but are of particular interest in applications where expert supervision is available and often essential (e.g., medicine). We will describe two applications of these methods to medical data; an application to a large-scale patient survey database and an ongoing application to cardiovascular imaging data.