# The Pacific Northwest Seminar on Topology, Algebra, and Geometry in Data Science

### A virtual seminar hosted out of the University of Washington Math Department

## In the Pacific Northwest Seminar on Topology, Algebra, and Geometry in Data Science (TAG-DS) we want to connect mathematicians with critical problems in data science. We host talks that satisfy one of the following three goals:

A venue to showcase research being done at the intersection of mathematics and data science with a focus on methods from algebra, geometry, topology, and combinatorics.

A modality with which to make mathematicians aware of specific problems in data science to which they might contribute.

A tool to facilitate collaborations between mathematicians (particularly graduate students) and researchers from other domains.

## If you would like to be included on our mailing list, please enter your email in the form:

Links for talks held over Zoom are provided through the mailing list. If you need a link at the last minute, please email one of the organizers (listed below).

## Spring 2024 Speakers:

April 2nd: Spring break

April 9th: Spring break

April 16th: Scott Mahan (UCSD/PNNL)

Title: Point Cloud Classification via Deep Set Linearized Optimal Transport

Abstract: We introduce Deep Set Linearized Optimal Transport, an algorithm designed for the efficient simultaneous embedding of point clouds into an $L^2-$space. This embedding preserves specific low-dimensional structures within the Wasserstein space while constructing a classifier to distinguish between various classes of point clouds. Our approach is motivated by the observation that $L^2-$distances between optimal transport maps for distinct point clouds, originating from a shared fixed reference distribution, provide an approximation of the Wasserstein-2 distance between these point clouds, under certain assumptions. To learn approximations of these transport maps, we employ input convex neural networks (ICNNs) and establish that, under specific conditions, Euclidean distances between samples from these ICNNs closely mirror Wasserstein-2 distances between the true distributions. Additionally, we train a discriminator network that attaches weights to these samples and creates a permutation invariant classifier to differentiate between classes of point clouds. We showcase the advantages of our algorithm over the standard deep set approach through experiments on a flow cytometry dataset with a limited number of labeled point clouds.

April 23rd: Lauro Langosco (Cambridge)

Title: Towards Meta-Models for Automated Interpretability

Abstract: Interpretability aims to reverse-engineer the mechanisms implemented by neural networks. For the most part, current efforts rely on human labor that may not scale to models with billions of parameters; thus it is necessary to automate interpretability methods. In this talk, I propose to use "meta-models", that is neural networks that take another network's parameters as input, as a candidate method for automated interpretability. I'll present results on using meta-models to 1) recover mechanisms from synthetic model weights (reversing the Tracr compiler), and 2) detecting backdoors / Trojans from weights.

April 30th: Mark Lowell (NGA)

Title: Training on the Edge of Stability is Caused by Layerwise Jacobian Alignment

Abstract: During neural network training, the sharpness of the Hessian matrix of the training loss rises until training is on the edge of stability. As a result, even non-stochastic gradient descent does not accurately model the underlying dynamical system defined by the gradient flow of the training loss. We treat neural network training as a system of stiff ordinary differential equations and use an exponential Euler solver to train the network without entering the edge of stability. We demonstrate experimentally that the increase in the sharpness of the Hessian matrix is caused by the layerwise Jacobian matrices of the network becoming aligned, so that a small change in the preactivations at the front of the network can cause a large change in the outputs at the back of the network. We further demonstrate that the degree of layerwise Jacobian alignment scales with the size of the dataset by a power law with a coefficient of determination between 0.74 and 0.97.

May 7th: No seminar

May 14th: Daniel Paleka (ETH Zurich)

Title: Reverse engineering part of a production language model

Abstract: Little is known about the internal workings of today's state-of-the-art AI systems. We introduce the first attack that extracts precise, nontrivial information from black-box production models using API calls. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given access to the standard suite of options for API calls to OpenAI and Google models. In this talk, we describe the original attack, its theoretical limits, and logprob-free alternatives. We conclude with the mathematical problem of optimizing the attack after OpenAI mitigated our original attack vector.

May 21st: Peter Hase (UNC-CH)

Title: Controlling and Editing Knowledge in Large Language Models

Abstract: The success of large-scale pretraining has equipped LLMs with extensive world knowledge, but developing methods for controlling and editing model knowledge remains a challenging task. In this talk, I discuss recent work on model editing and model generalization that helps us understand how we can steer and control the knowledge (and beliefs) expressed by LLMs. Specifically, I (1) frame model editing in terms of propagating factual information within model belief graphs and review methods in this area, (2) show how model editing can be used for unlearning sensitive information in LLMs, after formalizing an adversarial threat model for LLM unlearning, and (3) argue that easy-to-hard generalization is an important problem for understanding how we can effectively supervise models in domains where it would be useful for them to be knowledgeable (e.g. STEM domains) even when labeling data in these domains is especially difficult. Collectively, these problems exemplify the important challenge of understanding and controlling the knowledge and beliefs of LLMs.

May 28th:

Title: TBA

Abstract: TBA

June 4th:

Title: TBA

Abstract: TBA

## Organizers

Henry Kvinge (UW Math, Pacific Northwest National Lab)

Davis Brown (Pacific Northwest National Lab)

Grayson Jorgenson (Pacific Northwest National Lab)

Tim Doster (Pacific Northwest National Lab)

Tegan Emerson (Pacific Northwest National Lab)

## Past Talks:

## Winter 2024 Speakers:

January 16th: Sara Veneziale (Imperial College London)

Title: Machine learning, toric Fano varieties, and terminal singularities

Abstract: In this talk, I will describe recent work in the application of machine learning to explore questions in algebraic geometry, specifically in the context of the study of Q-Fano varieties. Q-Fano varieties are Fano varieties with mild singularities (called terminal singularities) and they are the key players in the Minimal Model Program. We ask if machine learning can determine if a toric Fano variety has terminal singularities, since there is no global efficient algorithm that does this (except for the weighted projective spaces case). A simple feedforward neural network is able to detect terminal singularities with 95% accuracy for varieties in dimension eight and Picard rank two. Building this high-accuracy model has two consequences. Firstly, it inspires the formulation of a new global, combinatorial criterion to determine if a toric variety of Picard rank two has terminal singularities. Secondly, the machine learning model is used directly to give the first sketch of the landscape of Q-Fano varieties in dimension eight. This is joint work with Tom Coates and Al Kasprzyk.

January 23rd: No seminar

January 30th: Florentin Guth (NYU)

Title: Generalization and universality in deep learning

Abstract: The generalization performance of deep learning is impressive. In unsupervised learning, recent generative models such as diffusion models are able to generate very high-quality images. In supervised learning, deep networks trained to classify images learn features that transfer to a wide variety of tasks. This raises the question: to what extent is what is learned independent of the training samples and even the training data distribution? I will show that diffusion models transition from memorization to generalization as the number of training samples increases, and that networks trained on different image classification datasets learn common features even at deep layers. These results imply that a large part of what deep learning extracts from an image dataset is universal.

February 6th: Derek Lim (MIT)

Title: Graph Metanetworks for Processing Diverse Neural Architectures

Abstract: Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. In this talk, we discuss new metanetworks - neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to input neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.

February 13th: Aaron Lou (Stanford)

Title: Title: Discrete Diffusion Language Modeling by Estimating the Ratios of the Data Distribution

Abstract: In recent years, modern large language models (LLMs) like ChatGPT have revolutionized the field of machine learning. These systems employ a technique called autoregressive modeling (ie next token prediction), which has been the dominant probabilistic modeling method on discrete data for decades. However, autoregressive modeling has its limitations, necessitating a reexamination of methodology. In this talk, I will discuss our recent work on score entropy discrete diffusion (SEDD) models, an alternative non-autoregressive generative model for discrete data. Theoretically, SEDD estimates the density ratios between neighboring sentences and uses them to parameterize a generative continuous time markov chain. Empirically, SEDD challenges the longstanding dominance of autoregressive modeling: achieving competitive likelihoods (notably beating GPT-2 on zero-shot tasks), generating higher quality text (up to 8x better), allowing one to trade quality for compute (matching quality with 32x fewer network evaluations), and enabling more controllable generation such as prompting from arbitrary positions.

February 27th: Kiho Park (University of Chicago)

Title: The Linear Representation Hypothesis and the Geometry of Large Language Models

Abstract: Informally, the “linear representation hypothesis” is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does “linear representation” actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity and projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of linear representation, one in the output (word) representation space, and one in the input (context) space. We then prove that these connect to linear probing and model steering, respectively. To make sense of geometric notions, we use the formalization to identify a particular (non-Euclidean) inner product that respects language structure in a sense we make precise. Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs. Experiments with LLaMA-2 demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product. This is joint work with Yo Joong Choe and Victor Veitch. Preprint is available at https://arxiv.org/abs/2311.03658.

March 5th: Thao Nguyen (University of Washington)

Title: Probing clustering in neural network representations

Abstract: Neural network representations contain structure beyond what was present in the training labels. For instance, representations of images that are visually or semantically similar tend to lie closer to each other than to dissimilar images, regardless of their labels. Clustering these representations can thus provide insights into dataset properties as well as the network internals. Our recent work studies how the many design choices involved in neural network training affect the clusters formed in the hidden representations. To do so, we establish an evaluation setup based on a modified ImageNet hierarchy, for the task of subclass clustering after training models with only superclass information. Our findings isolate the training dataset and architecture as important factors affecting clusterability. Datasets with labeled classes consisting of unrelated subclasses yield much better clusterability than those following a natural hierarchy. In addition, when using pretrained models to cluster representations on downstream datasets, models pretrained on subclass labels provide better clusterability than models pretrained on superclass labels, but only when there is a high degree of domain overlap between the pretraining and downstream data. Architecturally, we find that normalization strategies affect which layers yield the best clustering performance, and, surprisingly, Vision Transformers attain lower subclass clusterability than ResNets.

March 19th: Sharut Gupta (MIT)

Title: Context is Environment

Abstract: Two lines of work are taking the central stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the hard lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LLMs) have erupted as algorithms able to learn in-context, generalizing on-the-fly to eclectic contextual circumstances that users enforce by means of prompting. In this paper, we argue that context is environment, and posit that in-context learning holds the key to better domain generalization. Via extensive theory and experiments, we show that paying attention to context--unlabeled examples as they arrive--allows our proposed In-Context Risk Minimization (ICRM) algorithm to zoom-in on the test environment risk minimizer, leading to significant out-of-distribution performance improvements. From all of this, two messages are worth taking home. Researchers in domain generalization should consider environment as context, and harness the adaptive power of in-context learning. Researchers in LLMs should consider context as environment, to better structure data towards generalization.

March 9th: TAG organizers, group seminar

Title: Discussion on weight interpolation of foundation models

Abstract: This week's seminar discussion will focus on some intriguing phenomena associated with the weight interpolations of fine-tuned foundation models. Many impressive recent results in the deep learning literature exploit these phenomena—in the past year, work has leveraged weight interpolations to achieve state-of-the-art results in computer vision for the ImageNet dataset, for out-of-distribution robustness, and for adversarial robustness. Despite this success, our understanding of these weight interpolations is limited and not well-grounded in theory. We will informally introduce the topic and provide some key results for the first part of the session. Following this, we will host a group discussion including presentations on interesting and/or related findings in the literature.

March 2nd: Tim Duff (University of Washington)

Title: Geometry of two, three, or four cameras

Abstract: Estimating the relative orientation of multiple cameras is a classical problem of computer vision, and a key part of modern systems that reconstruct 3D scenes from images. I will briefly survey well-known and commonly-used methods for two perspective cameras that estimate the so-called fundamental or essential matrix. These methods require solving systems of polynomial equations with (very) special structure.

I will then describe recent solutions to two analogous problems involving three or four cameras. Both solutions, previously considered out-of-reach, use nontrivial insights from computational algebraic geometry.

The first problem (joint work with Hruby, Leykin, and Pajdla) is a relaxation of the notorious “four-point problem” studied by Nister and Schaffalitzky. A general solution to this problem is inherently difficult, as it requires computing the roots of a degree-272 polynomial. To bypass this difficulty, we develop a “pick and solve” scheme combining homotopy continuation and machine learning. This “pick-and-solve” scheme delivers practical performance. In the framework of RANSAC (RAndom Sampling And Consensus), on average we find 1 solution every 60 microseconds.

The second problem (joint work with Hruby, Korotynskiy, Oeding, Pollefeys, Larsson, and Pajdla) concerns “radial cameras” that can be used model a wide variety of optical imaging devices (eg. fisheye lenses.) At first glance, we are faced with computing the roots of a degree-3584 polynomial. However, unlike the previous case, this problem decomposes spectacularly into a sequence of subproblems of algebraic degree 28 or less. This allows us to devise a solution that incorporates well into an existing 3D reconstruction pipeline.

February 23rd: Alison Pouplin (Technical University of Denmark)

Title: Identifying latent distances with Finslerian geometry

Abstract: Riemannian geometry is a powerful machinery that helps to explore the underlying structure of generative models. Yet, its application is limited by the fact that most generative models are inherently stochastic. A deterministic approximation is needed to overcome this issue. Two approaches are proposed: one involving the average of the metrics, and one involving the average of lengths. This study compare both solutions, which draw from Riemannian and Finslerian geometry, respectively. In high dimensions, we show that both methods are equivalent.

February 16th: Lechao Xiao (Google Brain)

Title: Towards a deeper theoretical understanding of neural networks.

Abstract: Understanding the fundamental principles behind the massive success of neural networks is one of the most important open questions in deep learning. In this talk, I will share some progress through two recent papers. In the first part, we consider the triplet (data, model, inference algorithm) as an integrated system and discuss their synergies and symmetries. The second part focuses solely on the benefits of the architecture's inductive biases. We explain how the topologies of convolutional networks (locality, hierarchy, model scaling, etc.) can help overcome the curse of dimensionalities in high dimensions.

February 9th: Swati Padmanabhan (University of Washington)

Title: A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions

Abstract: Gradient descent --- one of the simplest optimization algorithms --- has been a workhorse of large-scale machine learning for decades. In particular, the finite-time (i.e., "nonasymptotic") theoretical guarantees on its rates of convergence have been extensively studied on broad families of functions including convex and smooth functions. However, modern machine learning has witnessed the emergence of problems far beyond the aforementioned problem classes. Indeed, tremendous empirical success (such as that of deep learning) has recently been powered by tools like Google's TensorFlow and Facebook's PyTorch that use non-smooth non-convex optimization under the hood. A fundamental theoretical question, therefore, is to obtain nonasymptotic theoretical guarantees for non-smooth non-convex optimization. In this talk, I will present our result for this problem, giving the first non-asymptotic convergence guarantee for non-smooth non-convex optimization under the ``standard first-order oracle model''. This is joint work with Damek Davis, Dmitriy Drusvyatskiy, Yin Tat Lee, and Guanghao Ye and appeared as an Oral Presentation in NeurIPS 2022.

February 2nd: Jacob Springer (Carnegie Mellon University)

Title: What can adversarial examples tell us about similarities between neural networks?

Abstract: I will present a perspective from a recent line of work on how "adversarial examples"—input perturbations that dramatically change the neural network's output—can help us understand the (dis)similarities between different neural network classifiers. I will show that existing similarity metrics overestimate neural network similarity and will present an invariance-based method to correct the issues with previous metrics. To conclude, I will describe how we can use the corrected similarity metrics to reveal a surprising (empirical) asymmetrical relationship between non-robust and robust neural networks in which non-robust networks exhibit similarities with robust networks, but not with other non-robust networks.

January 26th: Noah Shutty (Google)

Title: Computing Representations for Lie Algebraic Networks

Abstract: Recent work has constructed neural networks that are equivariant to continuous symmetry groups such as 2D and 3D rotations. This is accomplished using explicit Lie group representations to derive the equivariant kernels and nonlinearities. We present three contributions motivated by frontier applications of equivariance beyond rotations and translations. First, we relax the requirement for explicit Lie group representations with a novel algorithm that finds representations of arbitrary Lie groups given only the structure constants of the associated Lie algebra. Second, we provide a self-contained method and software for building Lie group-equivariant neural networks using these representations. Third, we contribute a novel benchmark dataset for classifying objects from relativistic point clouds, and apply our methods to construct the first object-tracking model equivariant to the Poincaré group.

January 19th: Audun Myers (PNNL)

Title: Persistent Homology of Complex Networks for Dynamic State Detection

Abstract: This work is dedicated to the topological analysis of complex transitional networks for dynamic state detection. Transitional networks are formed from time series data and are typically studied using tools from graph theory to reveal information about the underlying dynamical system. However, traditional tools can fail to summarize the often complex topology present in such graphs. In this work we leverage persistent homology from Topological Data Analysis (TDA) to study the structure of these networks. Namely, we study the persistent homology of the Ordinal Partition Network (OPN) and Coarse Grained State Space Network (CGSSN) and analyze their dynamic state detection performance.

### 2022

November 10th: Yunpeng Shi (Princeton)

Title: Fast and Robust Methods for Solving Group Synchronization

Abstract: Group synchronization, roughly speaking, asks to recover group elements on graph nodes from their relative measurements on graph edges. It is a critical problem in computer science, statistics and data science, and it has important applications to 3-D reconstruction, graph matching, phase retrieval, ranking and community detection. However, it faces a variety of challenges such as high corruption of edge measurements and computational bottlenecks. In our recent works, we proposed two frameworks, both based on cycle-consistency, to efficiently infer corruption levels of edge measurements. We present several exact recovery guarantees of our frameworks under adversarial and uniform corruption models. Our frameworks of corruption estimation enables various postprocessing methods for robustly solving group elements, and we demonstrate the state-of-the-art performance of our methods in terms of both speed and accuracy.

November 3rd: Samuel Ainsworth (University of Washington)

Title: Git Re-Basin: Merging Models modulo Permutation Symmetries

Abstract: The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. (2021). We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity. Finally, we discuss shortcomings of the linear mode connectivity hypothesis, including a counterexample to the single basin theory.

October 27th: Bobak Kiani (MIT)

Title: projUNN: efficient method for training deep networks with unitary matrices

Abstract: In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-k updates -- or their rank-k approximation -- that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full N-dimensional unitary or orthogonal matrices with a training runtime scaling as O(kN^2). Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting (k=1), projUNN is able to train a model's unitary parameters to reach comparable performances against baseline implementations. By integrating our projUNN algorithm into both recurrent and convolutional neural networks, our models can closely match or exceed benchmarked results from state-of-the-art algorithms.

October 20th: Eliza O'Reilly (Caltech)

Title: Spectrahedral Regression

Abstract: Convex regression is the problem of fitting a convex function to a collection of input-output pairs, and arises naturally in applications such as economics, engineering and computed tomography. We present a new approach to this problem called spectrahedral regression, in which we fit a spectrahedral function to the data, i.e. a function that is the maximum eigenvalue of an affine matrix expression of the input. These models can exhibit both smooth and singular features, unlike existing methods. We first provide bounds on how well spectrahedral functions can approximate arbitrary convex functions via statistical risk analysis. Second, we analyze an alternating minimization algorithm for the non-convex optimization problem of fitting a spectrahedral function to data. Finally, we demonstrate the utility of our approach with experiments on synthetic and real data sets. This talk is based on joint work with Venkat Chandrasekaran.

October 13th: Sarah McGuire (Michigan State University)

Title: Neural Networks with Learned Coarsening for Simplicial Complexes

Abstract: We consider data structured as attributed simplicial complexes, which are generalizations of graphs that include higher-dimensional simplices beyond vertices and edges; this structure allows for greater flexibility in modeling higher-order relationships. With this generalized framework, however, comes more complicated notions of message passing or convolution on the simplices to define simplicial neural network layers. While it is desirable in certain applications to use a more general input space of simplicial complexes, as opposed to graphs, such additional structure imposes high computational cost. To mitigate the increased computational cost as well as to reduce overfitting in the models, we propose a learned coarsening of simplicial complexes to be interleaved between standard simplicial neural network layers. This simplicial pooling layer generates hierarchical representations of simplicial complexes and is a generalized graph coarsening method which collapses information in a learned fashion. In this framework, a soft partition of vertices is learned at each layer and used to coarsen the simplicial complex for input to the subsequent layer. The pooling method extends learned vertex clustering to coarsen the higher dimensional simplices in a deterministic fashion. While in practice, the pooling operations are computed via a series of matrix operations, its topological motivation is based on unions of stars of simplices.

October 6th: Nick Konz (Duke)

Title: The Intrinsic Manifolds of Radiological Images and Their Role in Deep Learning

Abstract: Intrinsic image dataset manifolds and their relationship to neural network generalization have recently begun to be studied for the common domain of natural images, but little such research has been attempted for radiological images, which is important due to the difference in relevant visual features for downstream tasks. Here we (1) compare the intrinsic manifold dimensionality (ID) of radiological and natural images and (2) investigate the relationship between ID and network generalization ability (GA). We find that radiological images generally have lower ID than natural images, yet a steeper correlation between GA and ID. However, to our knowledge, a formal model that explains this "domain shift" does not exist, which could be enabled by more rigorous mathematical analysis, leading the way towards more principled development of methods specifically tailored for medical image analysis.

May 26th: Anna Seigal (Harvard)

Title: Groups and Symmetries in Statistical Models

Abstract: Groups and symmetries are at the heart of many problems in statistics and data analysis. I will focus on parameter estimation in statistical models via maximum likelihood estimation. We will see connections between maximum likelihood estimation, linear algebra, and invariant theory. The group or symmetric structure of a statistical model can be used to capture the existence and uniqueness of a maximum likelihood estimate, as well as to suggest suitable algorithms to find it. This talk is based on joint work with Carlos Améndola, Kathlén Kohn, Visu Makam, and Philipp Reichenbach.

May 19th: Gregory Henselman-Petrusek (PNNL)

Title: How to choose informative representations of topological features in persistent homology

Abstract: Persistent homology has emerged as a powerful tool for understanding complex data. However, interpretation and proper use of persistent shape statistics often hinges on a (theoretically) intractable inverse problem: selection of "good" cycles to represent features in homology. In this talk we will recall the basic notions of persistent homology and some of its recent scientific applications. We'll discuss what makes a cycle representatives better or worse in this context, and show experimental evidence that "guessing" reliably produces good representatives. We'll then explore some beautiful and interesting properties of optimal cycles observed in scientific data.

May 5th: Luis Scoccola (Northeastern)

Title: Approximate vector bundles and fiberwise dimensionality reduction

Abstract: I will describe joint work with Jose Perea in which we propose combinatorial notions of vector bundle, study the extent to which these determine true vector bundles, and give algorithms for the stable and consistent computation of low-dimensional characteristic classes directly from these representations. I will also show how this framework can be used to frame a dimensionality reduction algorithm which preserves the global topology as well as the local linear structure of data.

April 28th: Lizzy Coda (PNNL)

Title: Modeling Many-to-Many Maps

Abstract: Machine learning models have obtained impressive performance on benchmark many-to-one datasets such as CIFAR-10. However, there are many real-world datasets that can be described as many-to-many. That is, a single input can yield different outputs and many different inputs can yield the same output. For example, in an image-to-text task, a single image has many possible captions and conversely many different images can have the same caption. Building off of previous work that used fiber bundles to model many-to-one processes, we present a model for many-to-many processes and discuss some of the challenges of working with real-world data that motivated this work.

April 21th: Isay Katsman (Cornell)

Title: Riemannian Geometry in Machine Learning

Abstract: Although machine learning researchers have introduced a plethora of useful constructions for learning over Euclidean space, numerous types of data in various applications benefit from, if not necessitate, a non-Euclidean treatment. In this talk I cover the need for Riemannian geometric constructs to (1) build more principled generalizations of common Euclidean operations used in geometric machine learning models as well as to (2) enable general manifold density learning in contexts that require it. Said contexts include theoretical physics, robotics, and computational biology. I will cover one of my papers that fits into (1) above, namely the ICML 2020 paper "Differentiating through the Fréchet Mean." I will also cover two of my papers that fit into (2) above, namely the NeurIPS 2020 paper "Neural Manifold ODEs" and the NeurIPS 2021 paper "Equivariant Manifold Flows." Finally, I will briefly discuss directions of relevant ongoing work.

April 14th: Daniel Kunin (Stanford) and Hidenori Tanaka (NTT Physics & Informatics Laboratories)

Title: Towards Mechanics of Learning in Neural Networks

Abstract: A central challenge to building a theoretical foundation for deep learning is that the motion of a network in high-dimensional parameter space undergoes discrete finite steps along complex stochastic gradients. We circumvent this obstacle through a unifying theoretical framework based on intrinsic symmetries embedded in a network’s architecture and a continuous-time description of the discrete numerical trajectory taken by SGD at finite learning rates. We will discuss our recent work “Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics” and two follow-up directions of research where we establish new formulations of deep learning dynamics based on Lagrangian and statistical Mechanics.

April 7th: Shibani Santurkar (Stanford)

Title: Machine Learning Beyond Accuracy

Abstract: Machine learning models today attain impressive accuracy on many benchmark tasks. But to what extent do these models generalize to the real world tasks we ultimately care about?

In this talk, we will examine the performance of current ML models from this perspective. We will start with what is perhaps one of the most striking failure modes of these models---their vulnerability to imperceptible input perturbations known as adversarial examples. We take a closer look at this brittleness, and examine how it can, in part, be attributed to the fact that our models often make decisions very differently to humans. Then, we take a step back and revisit the building blocks of the ML pipeline to identify other manifestations of such human-ML misalignments and discuss how we can make progress towards mitigating them.

February 10th: Kathryn Lindsey, (Boston College)

Title: Functional dimension of feedforward ReLU neural networks

Abstract: To specify a function determined by a feedforward ReLU neural network, one usually gives a list of parameters (weights and biases). However, multiple different choices of parameters can determine the same function; in other words, the map that assigns functions to parameters is not injective. Furthermore, the degree to which injectivity fails is very inhomogeneous across the space of parameters. We define the "functional dimension" of a point in parameter space -- a measure of the dimension of the space of functions that can be realized by perturbing the parameter. I will discuss functional dimension and some of its properties. This talk is based on ongoing joint work with Eli Grigsby, Rob Meyerhoff and Chenxi Wu.

February 3rd: Emily King, (Colorado State University)

Title: Applications of Group Symmetry

Abstract: Group actions have implicitly played a role in applications since at least the work of Fourier on solutions of the heat equation. Namely, one can generate the classical Fourier basis as an action of the group T on L^2([0,1]). Hundreds of years later in the 20th century, the two most important transforms in applied harmonic analysis arose, the wavelet transform and the short-time Fourier transform, which were created using projective unitary representations of the affine group and (a group related to the) the Weyl-Heisenberg group, respectively. Even in the modern era of data-driven approaches, group symmetries still play an important role, from generating transformations like diffusion wavelets and the scattering transform to characterizing overtrained networks in neural collapse. The talk will start as a colloquium-style talk on applications of group symmetry and finish with some results on the relationship between group symmetry and optimality.

January 27th: Scott Mahan (University of California, San Diego)

Title: DNA: Dynamic Network Augmentation

Abstract: For image classification tasks, machine learning models perform best when balancing meaningful feature representation with robustness to non-meaningful transformations. When using neural networks for image classification, these geometric or algebraic invariances can be hard-coded or learned from augmented training data. However, data augmentation strategies are usually limited to one fixed policy. In this talk, we motivate the idea of input-dependent data augmentation, where a separate neural network is trained to transform training images conditional on their features. We also evaluate the model on several classification datasets.

January 20th: Davis Brown (Pacific Northwest National Laboratory)

Title: A tool for deep learning model interpretability

Abstract: The deep learning (DL) model decision process is famously opaque. However, DL models are not “black boxes:” with modern DL interpretability tools, we can obtain useful explanations for DL models. In this talk, we highlight a high level interpretability approach: intermediate layer probing. We motivate the approach with a couple of case studies from the literature, focusing on concept activation vectors.

January 13th: Sarah Tymochko (Michigan State University)

Title: From Zigzags to Networks: Topological Tools for Time Series Analysis

Abstract: Topological data analysis (TDA) is a fairly new field with tools to quantify the shape of data in a manner that is concise and robust using concepts from algebraic topology. Persistent homology, one of the most popular tools in TDA, has proven useful in applications to time series data, detecting shape that changes over time and quantifying features like periodicity. In this talk, I will present two applications using tools from TDA to study time series data: the first using zigzag persistence, a generalization of persistent homology, to study bifurcations in dynamical systems and the second, using the shape of weighted, directed networks to distinguish periodic and chaotic behavior in time series data.

### 2021

December 2nd: Kristin Lauter (Facebook AI Research)

Title: Private AI: Machine Learning on Encrypted Data

Abstract: As the world adopts Artificial Intelligence, the privacy risks are many. AI can improve our lives, but may leak our private data. Private AI is based on Homomorphic Encryption (HE), a new encryption paradigm which allows the cloud to operate on private data in encrypted form, without ever decrypting it, enabling private training and private prediction. Our 2016 ICML CryptoNets paper showed for the first time that it was possible to evaluate neural nets on homomorphically encrypted data, and opened new research directions combining machine learning and cryptography. The security of Homomorphic Encryption is based on hard problems in mathematics involving lattices, a candidate for post-quantum cryptography. This talk will explain Homomorphic Encryption, Private AI, and explain HE in action.

November 18th: Mitchell Wortsman (UW, Computer Science and Engineering)

Title: Learning Neural Network Subspaces

Abstract: Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast we aim to leverage both property (1) and (2) with a single method and in a single training run. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.

Friday, November 12th: Rekha Thomas (UW, Math)

Title: Chirality in Vision

Abstract: In computer vision, chirality refers to the constraint that for a scene to be visible in a camera, it must lie in front of the camera. Ignoring this important physical constraint there is now a mature algebraic theory of image formation in pinhole cameras. In this talk I will discuss new structural results that arise from respecting chirality. Using real and semialgebraic geometry one can explicitly describe the set of all true images in an arrangement of cameras. The converse question is when given tuples of points are indeed the images of world points in front of cameras. I will give a complete answer in the case of two cameras, uncovering an unexpected connection to the classical theory of cubic surfaces with 27 real lines.

Joint work with Sameer Agarwal, Andrew Pryhuber and Rainer Sinn.

November 4th: Lara Kassab (Colorado State University)

Title: Detecting Short-lasting Topics Using Nonnegative Tensor Decomposition

Abstract: Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this talk, we compare the variability of topic lengths discovered by several well-known topic modeling methods including latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), as well as its tensor counterparts based on the nonnegative CANDECOMP/PARAFAC tensor decomposition (NCPD and Online NCPD). We demonstrate that only tensor-based methods with the dedicated mode for tracking time evolution successfully detect short-lasting topics. Furthermore, these methods are considerably more accurate in discovering the points in time when topics appeared and disappeared compared to the matrix-based methods such as LDA and NMF. We propose quantitative ways to measure the topic length and demonstrate the ability of NCPD (as well as its online variant), to discover short and long-lasting temporal topics in semi-synthetic and real-world data including news headlines and COVID-19 related tweets.

October 28th: Grayson Jorgenson (PNNL)

Title: Using the linear geometry of ReLU neural networks to detect out-of-distribution inputs

Abstract: Many modern neural networks rely solely on ReLU activation functions for nonlinearity, motivated by factors including both the computational simplicity of ReLUs and their role in helping prevent vanishing gradients. These networks are consequently piecewise linear functions and this opens the door to studying their behavior using combinatorial and geometric techniques without direct analog for networks employing smooth nonlinearities such as sigmoid activations. In this talk, we will discuss metrics that take advantage of this linear geometry and can be used to classify in and out-of-distribution inputs for ReLU networks.

October 21th: Michael Robinson (American University)

Title: Studying relations geometrically and topologically

Abstract: Binary relations between two sets occur frequently in applications. For instance, determining which files can be successfully read by which computer programs is not a function, but a relation. That is, multiple files can be read by a given program, and conversely a given file may be read by many different programs. If one considers a set of files that are supposed to comply with a particular format, this imposes considerable structure on the file-program relations that can arise. This is all the more interesting if the file format is ill-specified or loosely interpreted by the authors of the programs. (Many of the common file formats in use today, such as HTML, JavaScript, and PDF follow consensus specifications; their official specification is not machine-parsable.)

The famous Dowker complex is the applied combinatorial topologist's go-to tool for studying relational data, but on its own it's rather limited. This talk will outline some richer structure built upon the Dowker complex and its dual category-theoretic and topological nature. The talk will also present a newly-discovered way to estimate distances between relations. These properties can be interpreted back into the context of the original application, giving new ways to identify what it means for files to comply with (or violate) a particular format.

October 14th: Marc Finzi (NYU)

Title: A Polynomial Time Algorithm for Constructing Equivariant Neural Networks

Abstract: Exploiting symmetries in the learning problem has been a guiding principle for designing efficient neural network architectures on new domains, like images, sets, graphs, and point clouds. While these symmetries such as translation, rotation, and permutation require only a few bits of prior knowledge to specify, restricting to equivariant models vastly reduces the search space and improves generalization. Using tools from representation theory, we provide a general algorithm for computing the space of equivariant layers for matrix groups. In addition to recovering well known solutions from prior work, we can construct equivariant networks to new groups and representations. Building on these results, we outline a path towards automating model construction more generally.

October 7th: Jamie Haddock (Harvey Mudd)

Title: Topic Models, Methods, and Medicine

Abstract: There is currently an unprecedented demand for efficient, quantitative, and interpretable methods to study large-scale (often multi-modal) data. One key area of interest is that of topic modeling, which seeks to automatically learn latent trends or topics of complex data sets, providing practitioners a view of what is “going on” inside their data. This talk will survey several new tools for topic modeling on matrix and tensor data which allow for use of various forms of supervision and which learn hierarchical structure amongst topics. These tools are of interest across the many fields and industries producing, capturing, and analyzing big data, but are of particular interest in applications where expert supervision is available and often essential (e.g., medicine). We will describe two applications of these methods to medical data; an application to a large-scale patient survey database and an ongoing application to cardiovascular imaging data.