The Pacific Northwest Seminar on Topology, Algebra, and Geometry in Data Science

A virtual seminar hosted out of the University of Washington Math Department

In the Pacific Northwest Seminar on Topology, Algebra, and Geometry in Data Science (TAG-DS) we want to connect mathematicians with critical problems in data science. We host talks that satisfy one of the following three goals:

If you would like to be included on our mailing list, please enter your email in the form:

Or visit the mailing list information page to add yourself directly.

Links for talks held over Zoom are provided through the mailing list. If you need a link at the last minute, please email one of the organizers (listed below).

Spring 2024 Speakers:

Abstract: Interpretability aims to reverse-engineer the mechanisms implemented by neural networks. For the most part, current efforts rely on human labor that may not scale to models with billions of parameters; thus it is necessary to automate interpretability methods. In this talk, I propose to use "meta-models", that is neural networks that take another network's parameters as input, as a candidate method for automated interpretability. I'll present results on using meta-models to 1) recover mechanisms from synthetic model weights (reversing the Tracr compiler), and 2) detecting backdoors / Trojans from weights.

Abstract: During neural network training, the sharpness of the Hessian matrix of the training loss rises until training is on the edge of stability. As a result, even non-stochastic gradient descent does not accurately model the underlying dynamical system defined by the gradient flow of the training loss. We treat neural network training as a system of stiff ordinary differential equations and use an exponential Euler solver to train the network without entering the edge of stability. We demonstrate experimentally that the increase in the sharpness of the Hessian matrix is caused by the layerwise Jacobian matrices of the network becoming aligned, so that a small change in the preactivations at the front of the network can cause a large change in the outputs at the back of the network. We further demonstrate that the degree of layerwise Jacobian alignment scales with the size of the dataset by a power law with a coefficient of determination between 0.74 and 0.97.

Abstract: TBA

Abstract: TBA


Past Talks:

Winter 2024 Speakers:

Abstract: In this talk, I will describe recent work in the application of machine learning to explore questions in algebraic geometry, specifically in the context of the study of Q-Fano varieties. Q-Fano varieties are Fano varieties with mild singularities (called terminal singularities) and they are the key players in the Minimal Model Program. We ask if machine learning can determine if a toric Fano variety has terminal singularities, since there is no global efficient algorithm that does this (except for the weighted projective spaces case). A simple feedforward neural network is able to detect terminal singularities with 95% accuracy for varieties in dimension eight and Picard rank two. Building this high-accuracy model has two consequences. Firstly, it inspires the formulation of a new global, combinatorial criterion to determine if a toric variety of Picard rank two has terminal singularities. Secondly, the machine learning model is used directly to give the first sketch of the landscape of Q-Fano varieties in dimension eight. This is joint work with Tom Coates and Al Kasprzyk.

Abstract: Neural network representations contain structure beyond what was present in the training labels. For instance, representations of images that are visually or semantically similar tend to lie closer to each other than to dissimilar images, regardless of their labels. Clustering these representations can thus provide insights into dataset properties as well as the network internals. Our recent work studies how the many design choices involved in neural network training affect the clusters formed in the hidden representations. To do so, we establish an evaluation setup based on a modified ImageNet hierarchy, for the task of subclass clustering after training models with only superclass information. Our findings isolate the training dataset and architecture as important factors affecting clusterability. Datasets with labeled classes consisting of unrelated subclasses yield much better clusterability than those following a natural hierarchy. In addition, when using pretrained models to cluster representations on downstream datasets, models pretrained on subclass labels provide better clusterability than models pretrained on superclass labels, but only when there is a high degree of domain overlap between the pretraining and downstream data. Architecturally, we find that normalization strategies affect which layers yield the best clustering performance, and, surprisingly, Vision Transformers attain lower subclass clusterability than ResNets.

Abstract: This week's seminar discussion will focus on some intriguing phenomena associated with the weight interpolations of fine-tuned foundation models. Many impressive recent results in the deep learning literature exploit these phenomena—in the past year, work has leveraged weight interpolations to achieve state-of-the-art results in computer vision for the ImageNet dataset, for out-of-distribution robustness, and for adversarial robustness. Despite this success, our understanding of these weight interpolations is limited and not well-grounded in theory. We will informally introduce the topic and provide some key results for the first part of the session. Following this, we will host a group discussion including presentations on interesting and/or related findings in the literature.

Abstract: Estimating the relative orientation of multiple cameras is a classical problem of computer vision, and a key part of modern systems that reconstruct 3D scenes from images. I will briefly survey well-known and commonly-used methods for two perspective cameras that estimate the so-called fundamental or essential matrix. These methods require solving systems of polynomial equations with (very) special structure.

I will then describe recent solutions to two analogous problems involving three or four cameras. Both solutions, previously considered out-of-reach, use nontrivial insights from computational algebraic geometry. 

The first problem (joint work with Hruby, Leykin, and Pajdla) is a relaxation of the notorious “four-point problem” studied by Nister and Schaffalitzky. A general solution to this problem is inherently difficult, as it requires computing the roots of a degree-272 polynomial. To bypass this difficulty, we develop a “pick and solve” scheme combining homotopy continuation and machine learning. This “pick-and-solve” scheme delivers practical performance. In the framework of RANSAC (RAndom Sampling And Consensus), on average we find 1 solution every 60 microseconds.

The second problem (joint work with Hruby, Korotynskiy, Oeding, Pollefeys, Larsson, and Pajdla) concerns “radial cameras” that can be used model a wide variety of optical imaging devices (eg. fisheye lenses.) At first glance, we are faced with computing the roots of a degree-3584 polynomial. However, unlike the previous case, this problem decomposes spectacularly into a sequence of subproblems of algebraic degree 28 or less. This allows us to devise a solution that incorporates well into an existing 3D reconstruction pipeline.

Abstract: Riemannian geometry is a powerful machinery that helps to explore the underlying structure of generative models. Yet, its application is limited by the fact that most generative models are inherently stochastic. A deterministic approximation is needed to overcome this issue. Two approaches are proposed: one involving the average of the metrics, and one involving the average of lengths. This study compare both solutions, which draw from Riemannian and Finslerian geometry, respectively. In high dimensions, we show that both methods are equivalent. 

Abstract: Understanding the fundamental principles behind the massive success of neural networks is one of the most important open questions in deep learning. In this talk, I will share some progress through two recent papers. In the first part, we consider the triplet (data, model, inference algorithm) as an integrated system and discuss their synergies and symmetries. The second part focuses solely on the benefits of the architecture's inductive biases. We explain how the topologies of convolutional networks (locality, hierarchy, model scaling, etc.) can help overcome the curse of dimensionalities in high dimensions.

Abstract: Gradient descent --- one of the simplest optimization algorithms --- has been a workhorse of large-scale machine learning for decades. In particular, the finite-time (i.e., "nonasymptotic") theoretical guarantees on its rates of convergence have been extensively studied on broad families of functions including convex and smooth functions. However, modern machine learning has witnessed the emergence of problems far beyond the aforementioned problem classes. Indeed, tremendous empirical success (such as that of deep learning) has recently been powered by tools like Google's TensorFlow and Facebook's PyTorch that use non-smooth non-convex optimization under the hood. A fundamental theoretical question, therefore, is to obtain nonasymptotic theoretical guarantees for non-smooth non-convex optimization. In this talk, I will present our result for this problem, giving the first non-asymptotic convergence guarantee for non-smooth non-convex optimization under the ``standard first-order oracle model''. This is joint work with Damek Davis, Dmitriy Drusvyatskiy, Yin Tat Lee, and Guanghao Ye and appeared as an Oral Presentation in NeurIPS 2022. 

Abstract:  I will present a perspective from a recent line of work on how "adversarial examples"—input perturbations that dramatically change the neural network's output—can help us understand the (dis)similarities between different neural network classifiers. I will show that existing similarity metrics overestimate neural network similarity and will present an invariance-based method to correct the issues with previous metrics. To conclude, I will describe how we can use the corrected similarity metrics to reveal a surprising (empirical) asymmetrical relationship between non-robust and robust neural networks in which non-robust networks exhibit similarities with robust networks, but not with other non-robust networks.

Abstract: Recent work has constructed neural networks that are equivariant to continuous symmetry groups such as 2D and 3D rotations. This is accomplished using explicit Lie group representations to derive the equivariant kernels and nonlinearities. We present three contributions motivated by frontier applications of equivariance beyond rotations and translations. First, we relax the requirement for explicit Lie group representations with a novel algorithm that finds representations of arbitrary Lie groups given only the structure constants of the associated Lie algebra. Second, we provide a self-contained method and software for building Lie group-equivariant neural networks using these representations. Third, we contribute a novel benchmark dataset for classifying objects from relativistic point clouds, and apply our methods to construct the first object-tracking model equivariant to the Poincaré group.

Abstract: This work is dedicated to the topological analysis of complex transitional networks for dynamic state detection. Transitional networks are formed from time series data and are typically studied using tools from graph theory to reveal information about the underlying dynamical system. However, traditional tools can fail to summarize the often complex topology present in such graphs. In this work we leverage persistent homology from Topological Data Analysis (TDA) to study the structure of these networks. Namely, we study the persistent homology of the Ordinal Partition Network (OPN) and Coarse Grained State Space Network (CGSSN) and analyze their dynamic state detection performance.


Abstract: Group synchronization, roughly speaking, asks to recover group elements on graph nodes from their relative measurements on graph edges. It is a critical problem in computer science, statistics and data science, and it has important applications to 3-D reconstruction, graph matching, phase retrieval, ranking and community detection. However, it faces a variety of challenges such as high corruption of edge measurements and computational bottlenecks. In our recent works, we proposed two frameworks, both based on cycle-consistency, to efficiently infer corruption levels of edge measurements. We present several exact recovery guarantees of our frameworks under adversarial and uniform corruption models. Our frameworks of corruption estimation enables various postprocessing methods for robustly solving group elements, and we demonstrate the state-of-the-art performance of our methods in terms of both speed and accuracy.

Abstract: The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. (2021). We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity. Finally, we discuss shortcomings of the linear mode connectivity hypothesis, including a counterexample to the single basin theory.

Abstract: In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-k updates -- or their rank-k approximation -- that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full N-dimensional unitary or orthogonal matrices with a training runtime scaling as O(kN^2). Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting (k=1), projUNN is able to train a model's unitary parameters to reach comparable performances against baseline implementations. By integrating our projUNN algorithm into both recurrent and convolutional neural networks, our models can closely match or exceed benchmarked results from state-of-the-art algorithms.

Abstract: Convex regression is the problem of fitting a convex function to a collection of input-output pairs, and arises naturally in applications such as economics, engineering and computed tomography. We present a new approach to this problem called spectrahedral regression, in which we fit a spectrahedral function to the data, i.e. a function that is the maximum eigenvalue of an affine matrix expression of the input. These models can exhibit both smooth and singular features, unlike existing methods. We first provide bounds on how well spectrahedral functions can approximate arbitrary convex functions via statistical risk analysis. Second, we analyze an alternating minimization algorithm for the non-convex optimization problem of fitting a spectrahedral function to data. Finally, we demonstrate the utility of our approach with experiments on synthetic and real data sets. This talk is based on joint work with Venkat Chandrasekaran.

Abstract: We consider data structured as attributed simplicial complexes, which are generalizations of graphs that include higher-dimensional simplices beyond vertices and edges; this structure allows for greater flexibility in modeling higher-order relationships. With this generalized framework, however, comes more complicated notions of message passing or convolution on the simplices to define simplicial neural network layers. While it is desirable in certain applications to use a more general input space of simplicial complexes, as opposed to graphs, such additional structure imposes high computational cost. To mitigate the increased computational cost as well as to reduce overfitting in the models, we propose a learned coarsening of simplicial complexes to be interleaved between standard simplicial neural network layers. This simplicial pooling layer generates hierarchical representations of simplicial complexes and is a generalized graph coarsening method which collapses information in a learned fashion. In this framework, a soft partition of vertices is learned at each layer and used to coarsen the simplicial complex for input to the subsequent layer. The pooling method extends learned vertex clustering to coarsen the higher dimensional simplices in a deterministic fashion. While in practice, the pooling operations are computed via a series of matrix operations, its topological motivation is based on unions of stars of simplices.

Abstract: Intrinsic image dataset manifolds and their relationship to neural network generalization have recently begun to be studied for the common domain of natural images, but little such research has been attempted for radiological images, which is important due to the difference in relevant visual features for downstream tasks. Here we (1) compare the intrinsic manifold dimensionality (ID) of radiological and natural images and (2) investigate the relationship between ID and network generalization ability (GA). We find that radiological images generally have lower ID than natural images, yet a steeper correlation between GA and ID. However, to our knowledge, a formal model that explains this "domain shift" does not exist, which could be enabled by more rigorous mathematical analysis, leading the way towards more principled development of methods specifically tailored for medical image analysis.

Abstract: Groups and symmetries are at the heart of many problems in statistics and data analysis. I will focus on parameter estimation in statistical models via maximum likelihood estimation. We will see connections between maximum likelihood estimation, linear algebra, and invariant theory. The group or symmetric structure of a statistical model can be used to capture the existence and uniqueness of a maximum likelihood estimate, as well as to suggest suitable algorithms to find it. This talk is based on joint work with Carlos Améndola, Kathlén Kohn, Visu Makam, and Philipp Reichenbach.

Abstract: Persistent homology has emerged as a powerful tool for understanding complex data. However, interpretation and proper use of persistent shape statistics often hinges on a (theoretically) intractable inverse problem: selection of "good" cycles to represent features in homology. In this talk we will recall the basic notions of persistent homology and some of its recent scientific applications. We'll discuss what makes a cycle representatives better or worse in this context, and show experimental evidence that "guessing" reliably produces good representatives. We'll then explore some beautiful and interesting properties of optimal cycles observed in scientific data.

Abstract: I will describe joint work with Jose Perea in which we propose combinatorial notions of vector bundle, study the extent to which these determine true vector bundles, and give algorithms for the stable and consistent computation of low-dimensional characteristic classes directly from these representations. I will also show how this framework can be used to frame a dimensionality reduction algorithm which preserves the global topology as well as the local linear structure of data.

Abstract: Machine learning models have obtained impressive performance on benchmark many-to-one datasets such as CIFAR-10. However, there are many real-world datasets that can be described as many-to-many. That is, a single input can yield different outputs and many different inputs can yield the same output. For example, in an image-to-text task, a single image has many possible captions and conversely many different images can have the same caption. Building off of previous work that used fiber bundles to model many-to-one processes, we present a model for many-to-many processes and discuss some of the challenges of working with real-world data that motivated this work.

Abstract: Although machine learning researchers have introduced a plethora of useful constructions for learning over Euclidean space, numerous types of data in various applications benefit from, if not necessitate, a non-Euclidean treatment. In this talk I cover the need for Riemannian geometric constructs to (1) build more principled generalizations of common Euclidean operations used in geometric machine learning models as well as to (2) enable general manifold density learning in contexts that require it. Said contexts include theoretical physics, robotics, and computational biology. I will cover one of my papers that fits into (1) above, namely the ICML 2020 paper "Differentiating through the Fréchet Mean." I will also cover two of my papers that fit into (2) above, namely the NeurIPS 2020 paper "Neural Manifold ODEs" and the NeurIPS 2021 paper "Equivariant Manifold Flows." Finally, I will briefly discuss directions of relevant ongoing work.

Abstract: A central challenge to building a theoretical foundation for deep learning is that the motion of a network in high-dimensional parameter space undergoes discrete finite steps along complex stochastic gradients. We circumvent this obstacle through a unifying theoretical framework based on intrinsic symmetries embedded in a network’s architecture and a continuous-time description of the discrete numerical trajectory taken by SGD at finite learning rates. We will discuss our recent work “Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics” and two follow-up directions of research where we establish new formulations of deep learning dynamics based on Lagrangian and statistical Mechanics.

Abstract: Machine learning models today attain impressive accuracy on many benchmark tasks. But to what extent do these models generalize to the real world tasks we ultimately care about?

In this talk, we will examine the performance of current ML models from this perspective. We will start with what is perhaps one of the most striking failure modes of these models---their vulnerability to imperceptible input perturbations known as adversarial examples. We take a closer look at this brittleness, and examine how it can, in part, be attributed to the fact that our models often make decisions very differently to humans. Then, we take a step back and revisit the building blocks of the ML pipeline to identify other manifestations of such human-ML misalignments and discuss how we can make progress towards mitigating them.

Abstract: To specify a function determined by a feedforward ReLU neural network, one usually gives a list of parameters (weights and biases).  However, multiple different choices of parameters can determine the same function; in other words, the map that assigns functions to parameters is not injective.  Furthermore, the degree to which injectivity fails is very inhomogeneous across the space of parameters.  We define the "functional dimension" of a point in parameter space -- a measure of the dimension of the space of functions that can be realized by perturbing the parameter.  I will discuss functional dimension and some of its properties.  This talk is based on ongoing joint work with Eli Grigsby, Rob Meyerhoff and Chenxi Wu.  

Abstract: Group actions have implicitly played a role in applications since at least the work of Fourier on solutions of the heat equation. Namely, one can generate the classical Fourier basis as an action of the group T on L^2([0,1]). Hundreds of years later in the 20th century, the two most important transforms in applied harmonic analysis arose, the wavelet transform and the short-time Fourier transform, which were created using projective unitary representations of the affine group and (a group related to the) the Weyl-Heisenberg group, respectively. Even in the modern era of data-driven approaches, group symmetries still play an important role, from generating transformations like diffusion wavelets and the scattering transform to characterizing overtrained networks in neural collapse. The talk will start as a colloquium-style talk on applications of group symmetry and finish with some results on the relationship between group symmetry and optimality.

Abstract: The deep learning (DL) model decision process is famously opaque. However, DL models are not “black boxes:” with modern DL interpretability tools, we can obtain useful explanations for DL models. In this talk, we highlight a high level interpretability approach: intermediate layer probing. We motivate the approach with a couple of case studies from the literature, focusing on concept activation vectors.


Abstract: As the world adopts Artificial Intelligence, the privacy risks are many. AI can improve our lives, but may leak our private data. Private AI is based on Homomorphic Encryption (HE), a new encryption paradigm which allows the cloud to operate on private data in encrypted form, without ever decrypting it, enabling private training and private prediction. Our 2016 ICML CryptoNets paper showed for the first time that it was possible to evaluate neural nets on homomorphically encrypted data, and opened new research directions combining machine learning and cryptography. The security of Homomorphic Encryption is based on hard problems in mathematics involving lattices, a candidate for post-quantum cryptography. This talk will explain Homomorphic Encryption, Private AI, and explain HE in action.

Abstract: Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast we aim to leverage both property (1) and (2) with a single method and in a single training run. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.

Abstract: In computer vision, chirality refers to the constraint that for a scene to be visible in a camera, it must lie in front of the camera. Ignoring this important physical constraint there is now a mature algebraic theory of image formation in pinhole cameras. In this talk I will discuss new structural results that arise from respecting chirality. Using real and  semialgebraic geometry one can explicitly describe the set of all true images in an arrangement of cameras. The converse question is when given tuples of points are indeed the images of world points in front of cameras. I will give a complete answer in the case of two cameras, uncovering an unexpected connection to the classical theory of cubic surfaces with 27 real lines.

Joint work with Sameer Agarwal, Andrew Pryhuber and Rainer Sinn.

Abstract: Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this talk, we compare the variability of topic lengths discovered by several well-known topic modeling methods including latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), as well as its tensor counterparts based on the nonnegative CANDECOMP/PARAFAC tensor decomposition (NCPD and Online NCPD). We demonstrate that only tensor-based methods with the dedicated mode for tracking time evolution successfully detect short-lasting topics. Furthermore, these methods are considerably more accurate in discovering the points in time when topics appeared and disappeared compared to the matrix-based methods such as LDA and NMF. We propose quantitative ways to measure the topic length and demonstrate the ability of NCPD (as well as its online variant), to discover short and long-lasting temporal topics in semi-synthetic and real-world data including news headlines and COVID-19 related tweets.

Abstract: Many modern neural networks rely solely on ReLU activation functions for nonlinearity, motivated by factors including both the computational simplicity of ReLUs and their role in helping prevent vanishing gradients. These networks are consequently piecewise linear functions and this opens the door to studying their behavior using combinatorial and geometric techniques without direct analog for networks employing smooth nonlinearities such as sigmoid activations. In this talk, we will discuss metrics that take advantage of this linear geometry and can be used to classify in and out-of-distribution inputs for ReLU networks.

Abstract: Binary relations between two sets occur frequently in applications.  For instance, determining which files can be successfully read by which computer programs is not a function, but a relation.  That is, multiple files can be read by a given program, and conversely a given file may be read by many different programs.  If one considers a set of files that are supposed to comply with a particular format, this imposes considerable structure on the file-program relations that can arise.  This is all the more interesting if the file format is ill-specified or loosely interpreted by the authors of the programs.  (Many of the common file formats in use today, such as HTML, JavaScript, and PDF follow consensus specifications; their official specification is not machine-parsable.) 

The famous Dowker complex is the applied combinatorial topologist's go-to tool for studying relational data, but on its own it's rather limited.  This talk will outline some richer structure built upon the Dowker complex and its dual category-theoretic and topological nature.  The talk will also present a newly-discovered way to estimate distances between relations.  These properties can be interpreted back into the context of the original application, giving new ways to identify what it means for files to comply with (or violate) a particular format. 

Abstract: Exploiting symmetries in the learning problem has been a guiding principle for designing efficient neural network architectures on new domains, like images, sets, graphs, and point clouds. While these symmetries such as translation, rotation, and permutation require only a few bits of prior knowledge to specify, restricting to equivariant models vastly reduces the search space and improves generalization. Using tools from representation theory, we provide a general algorithm for computing the space of equivariant layers for matrix groups. In addition to recovering well known solutions from prior work, we can construct equivariant networks to new groups and representations. Building on these results, we outline a path towards automating model construction more generally.

Abstract: There is currently an unprecedented demand for efficient, quantitative, and interpretable methods to study large-scale (often multi-modal) data. One key area of interest is that of topic modeling, which seeks to automatically learn latent trends or topics of complex data sets, providing practitioners a view of what is “going on” inside their data. This talk will survey several new tools for topic modeling on matrix and tensor data which allow for use of various forms of supervision and which learn hierarchical structure amongst topics.  These tools are of interest across the many fields and industries producing, capturing, and analyzing big data, but are of particular interest in applications where expert supervision is available and often essential (e.g., medicine).  We will describe two applications of these methods to medical data; an application to a large-scale patient survey database and an ongoing application to cardiovascular imaging data.