BC Math & Machine Learning Seminar

Spring 2026, Wednesdays 2-3, Maloney 560

1/28/26: Balancedness in deep linear networks, Kathryn Lindsey (Boston College)

Deep linear networks provide a mathematically tractable setting in which to study mechanisms that may extend to more complex nonlinear networks. One such phenomenon is balancedness, the tendency of weight matrices across layers to align their singular subspaces and equalize their singular values during training. I will show that balancedness arises naturally from regularization: for a broad class of regularizers, every critical point of the regularized loss satisfies an exact algebraic balancing condition. The analysis reveals conserved quantities along the gradient flow and leads to a decomposition of the training dynamics into a learning flow, which minimizes the unregularized loss, and a regularizing flow, which enforces balancedness. This talk is based on joint work with G. Menon.

2/11/26: Mathematical exploration and discovery at scale, Javier Gomez Serrano (Brown)

Abstract: Machine learning is transforming mathematical discovery, enabling advances on longstanding open problems. In this talk, I will discuss AlphaEvolve, a general-purpose evolutionary coding agent that uses large language models to autonomously discover old and new mathematical constructions and potentially go beyond them. AlphaEvolve tackles a wide variety of problems across analysis, geometry, combinatorics, and number theory. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof-assistants and reasoning systems provide automated proof generation and further mathematical insights. This illustrates how general-purpose AI systems can systematically successfully explore broad mathematical landscapes at an unprecedented speed, leading us to do mathematics at scale.

2/18/26: Symmetry Classification of Shallow ReLU Networks, Pranav Ramakrishnan (Boston College)

Abstract: The symmetries of a neural network, that is to the say the parameters in parameter space that render the same function, has been a common question in the mathematics of machine learning. In this, the case of ReLU has been the most mysterious. In this talk, I will classify all the possible symmetries of shallow ReLU networks (networks with one hidden layer) and talk a little bit about what that entails for the geometry of the space of ReLU functions of a given architecture.

2/25/26: Expressivity, Verification, and Parameterization of ReLU Neural Networks, Moritz Grillo (MPI Leipzig), VIRTUAL

Abstract: Neural networks have achieved striking empirical success, yet their theoretical foundations remain only partially understood. When equipped with the widely used ReLU activation function, such networks compute continuous piecewise linear (CPWL) functions, which makes it possible to study them using tools from polyhedral geometry, combinatorics, and computational complexity. In this talk, I will outline results and open questions concerning three aspects of ReLU networks: expressivity, verification, and parameterization. I will discuss how architectural choices constrain the class of CPWL functions that can be represented, complexity-theoretic challenges in verifying basic properties of the functions they compute, and how different parameter choices may correspond to the same function.

3/11/26: Manifold Geometry of Task Relationships in Multi-Task Learning, Mohamed Al Begaowe (Boston College)

Abstract: Understanding how different cognitive tasks share or segregate neural resources remains a fundamental question in neuroscience and machine learning. Object manifold theory provides a geometric framework for analyzing neural representations by modeling the variability of responses to a single object as a manifold in neural activity space. Manifold capacity theory relates the linear separability of these representations to geometric properties such as manifold size and dimension. This work examines how manifold geometry evolves across layers of deep neural networks trained on visual tasks and investigates how geometric measures vary across tasks. These geometric relationships are then used to study task interactions in both single-task and multi-task learning architectures, providing insights into how task relatedness influences representation sharing and separability.

4/22/26: Model Design and Scaling in Geometric Deep Learning, Elias Nyholm (Chalmers U. of Technology)

Title:

Abstract: Geometric constraints, in the form of equivariance conditions under some data transformation, are powerful implicit biases when designing and training novel machine learning architectures. I will present two different ways of using equivariance to design new machine learning layers from first principles, and in the process try to build a common framework of equivariant layers with both convolutions and self-attention as special cases. Finally, I will outline why you might want to investigate the scaling behavior of these equivariant models through a Chinchilla-style study, and what aspects of equivariant models actually influence these neural scaling laws in practice.

4/29/26: Algebraic invariants in machine learning, Yulia Alexandr (UCLA)

This talk explores how algebraic geometry offers powerful tools for understanding machine learning models. I will focus mostly on ReLU neural networks, where activation patterns give rise to polynomial invariants in the parameters that control the outputs produced at different data points. These invariants constrain the possible output values at test points and provide a geometric perspective on neural network verification. Time permitting, I will also discuss related invariants for lightning self-attention.

Fall 2025, Wednesdays 2-3, Maloney 560

9/10/25: On Universality of Equivariant Neural Networks, Marco Pacini (University of Trento)

Equivariant neural networks provide a principled way to incorporate symmetry into learning architectures and are studied for both their empirical success and mathematical structure. In this talk, we first discuss their separation power—the ability to distinguish inputs up to symmetry—which is a well-understood and necessary condition for approximation. We then examine their approximation capabilities, which remain less well understood. Focusing on equivariant shallow networks, we show that architectures with the same separation power may nevertheless approximate different classes of functions, demonstrating that separation is a necessary but not sufficient condition for universality.

The talk is based on joint works with Xiaowen Dong, Bruno Lepri, Gabriele Santin, and Shubhendu Trivedi.

9/17/25: Deep learning via truncation maps, Patricia Munoz Ewald (University of Texas, Austin)

Abstract: Understanding the internal structure of deep neural networks remains a central challenge in machine learning. In this talk, I introduce the truncation map framework—a geometric tool for analyzing feedforward neural networks layer by layer. This approach reinterprets neural networks as sequences of transformations on input space, enabling both rigorous and intuitive comparisons of data representations across layers.

For networks with ReLU activation, truncation maps reveal a cone-based structure that facilitates explicit constructions of classifiers for structured data. I will present key theoretical results along with preliminary experiments on synthetic data, which demonstrate the emergence of these cone structures in networks trained via stochastic gradient descent.

9/24/25: No seminar

10/1/25: Topological analysis of neuronal assemblies reveals low-rank structure modulated by cholinergic activity, Nikki Sanderson (Brown)

Abstract: Even in the absence of visual stimuli, assemblies of neurons in the visual system of zebrafish larvae collectively activate. Physically, these assemblies are each localized and together cover the optic tectum along its retinotopic axis. Functionally, less is known about these assemblies, such as how the subnetworks of neurons generate and maintain spontaneous dynamics. To better understand the spontaneous dynamics of neural assemblies in zebrafish larvae optic tectum, we recorded calcium imaging of ~2000 neurons for 1 hour at 15 Hz for 12 fish, and then identified glutamatergic, GABAergic, and cholinergic neurons using immunostaining. We applied a topological data analysis (TDA) to the correlations of assembly neurons during and outside of spontaneous activity. Using Betti curves to summarize the persistent homology, we identified a plausible low-rank structure to the correlations agnostic to spectral analyses. Incorporating TDA into a sliding window analysis allowed us to cluster spontaneous activations into distinct dynamic profiles, ultimately leading us to hypothesize about the role of cholinergic neurons in recruiting and maintaining assembly activity, and more broadly, visual attention.

10/8/25: Edge of Stochastic Stability: Chaotic Behaviors of Training Neural Networks with SGD, Pier Benevanto (MIT)

Abstract: Recent findings by Cohen et al. (2021) demonstrate that when training neural networks with full-batch gradient descent with step size eta, the largest eigenvalue lambda_max of the full-batch Hessian consistently stabilizes at lambda_max =2/eta.

These results have significant implications for convergence and generalization. This, however, is not the case of mini-batch stochastic gradient descent (SGD), limiting the broader applicability of its consequences.

We show that SGD trains in a different regime we term Edge of Stochastic Stability (EoSS). In this regime, what stabilizes at 2/eta is Batch Sharpness: the expected directional curvature of mini-batch Hessians along their corresponding stochastic gradients.

As a consequence, lambda_max---which is generally smaller than Batch Sharpness---is suppressed, aligning with the long-standing empirical observation that smaller batches and larger step sizes favor flatter minima. We will further discuss implications for mathematical modeling of SGD trajectories.

10/15/25: Theoretical insights for optimizing rational activations (virtual talk), Jiayi Li (Max Planck Institute)

Abstract: Neural networks with rational activations can model curvature and asymptotes compactly, but risk poles and unstable gradients. We introduce positivity-safe rational activations, who denominatiors are guaranteed to remain strictly positive over the network's realized pre-activation domain during training. We implement two smooth parametrizations based on the interval sum-of-squares form and the stable-factor product that enforce a tunable margin for the denominator. We provide explicit certificates for computable bounds on the gradient over the activation domain, yielding an explicit L-smoothness constant for the empirical square loss. We will also discuss geometric properties of the optimization landscape. Through constructive examples, we show that spurious valleys are common for fixed-coefficient rational networks, then we prove that allowing the coefficients to train generically introduces a first-order descent direction. On continual-learning benchmarks, positivity-safe trainable rational reduce forgetting and improve average accuracy, while maintaing strict denominator margins with modest computational overhead.

10/22/25: No seminar (Distinguished Lecture Series)

10/29/25: Progressive Sharpening is Caused by Layerwise Jacobian Alignment, Mark Lowell (National Geospatial-Intelligence Agency)

Abstract: During neural network training, the sharpness of the Hessian matrix of the training loss rises until training is on the edge of stability. As a result, even nonstochastic gradient descent does not accurately model the underlying dynamical system defined by the gradient flow of the training loss. We use an exponential Euler solver to train the network without entering the edge of stability, so that we accurately approximate the true gradient descent dynamics. We demonstrate experimentally that the increase in the sharpness of the Hessian matrix is caused by the layerwise Jacobian matrices of the network becoming aligned, so that a small change in the network preactivations near the inputs of the network can cause a large change in the outputs of the network. We further demonstrate that the degree of alignment scales with the size of the dataset by a power law with a coefficient of determination between 0.74 and 0.98.

11/5/25: Parallel synapses and divisive normalization enhance classification capacity, Caitlin Lienkaemper (MIT)

Abstract: Neurons in the brain can be connected by multiple synapses, rather than just one. Using numerical simulations, Song and Benna showed that these parallel connections can increase a neuron’s classification capacity beyond the limit achieved by the perceptron, the classic mathematical model of a linear neuron. This talk describes ongoing work towards deriving this result analytically, with the goal of understanding how parallel synapses improve performance. In our model, a neuron with parallel synapses can apply a monotone nonlinear transformation in each input dimension before performing ``standard" pattern separation, like a perceptron. We have shown that if there exists an entrywise monotone function that maps all patterns to a surface where the total activity across inputs is the same for each pattern, then any labeling of these patterns can be classified. We have also reduced the problem of finding the classification capacity of this nonlinear “dendrite” model to a more familiar perceptron problem, but with correlated inputs. Our geometric characterization already shows that capacity is high when the input patterns are normalized. Since sensory input to the brain is believed to be normalized, this suggests that parallel synapses may boost classification capacity even more than predicted by models that ignore geometric effects.

11/12/25: Activation degree thresholds and expressiveness of polynomial neural networks, Bella Finkel (U Wisconsin, Madison)

Abstract: Polynomial neural networks are implemented in a range of applications and present an advantageous framework for theoretical machine learning. In this talk, we will discuss the expressive power of deep polynomial neural networks through the geometry of their neurovariety. In particular, we introduce the notion of the activation degree threshold of a network architecture to determine when the dimension of a neurovariety achieves its theoretical maximum. We show that activation degree thresholds of polynomial neural networks exist and provide an upper bound, resolving a conjecture of Kileel, Trager and Bruna on the dimension of neurovarieties associated to networks with high activation degree. Certain structured architectures have exceptional activation degree thresholds, making them especially expressive in the sense of their neurovariety dimension. In this direction, we discuss a proof that polynomial neural networks with equi-width architectures are maximally expressive. Along the way, we will see several illustrative examples motivated by applications.

Fall 2023

11/21/23: The question of identifiability for ReLU neural networks, Joachim Bona-Pellissier (University of Toulouse)

The motivations for studying identifiability in the case of neural networks are diverse, ranging from the theoretical to the very practical. One is related to privacy and protection against model inversion attacks, another is related to interpretability, and yet another is related to obtaining theoretical guarantees for the optimization, such as good properties of the objective function and its local minima, or such as the reproducibility of the optimization process. Finally, an important motivation for studying identifiability is to characterize the complexity of the functions implemented by neural networks, a question related to the implicit regularization during the optimization process and to the generalization capabilities of neural networks. Intuitively, the more redundancies there are in the parameters, i.e. the less identifiable a network is, the less rich and complex the space of functions represented by the network is, and the more we can expect good generalization properties.

In this talk, I will present three results from my Ph.D thesis. First, I will present two identifiability results for fully-connected feedforward ReLU neural networks. Then, I will present a result on geometry-induced implicit regularization that derives from this work.

Spring 2023

2/9/23: Symmetries of deep learning models and their internal representations, Charlie Godfrey (Pacific Northwest National Labs)

Abstract: Symmetry has been a fundamental tool in the exploration of a broad range of complex systems. In machine learning, symmetry has been explored in both models and data. We seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family’s internal representation of data. We do this by calculating a set of fundamental symmetry groups, which we call the intertwiner groups of the model. Each of these arises from a particular nonlinear layer of the model, and different nonlinearities result in different symmetry groups. These groups change the weights of a model in such a way that the underlying function that the model represents remains constant but the internal representations of data inside the model may change. We connect intertwiner groups to a model’s internal representations of data through a range of experiments that probe similarities between hidden states across models with the same architecture. Our work suggests that the symmetries of a network are propagated into the symmetries in that network’s representation of data, providing us with a better understanding of how architecture affects the learning and prediction process. Finally, we speculate that for ReLU networks, the intertwiner groups may provide a justification for the common practice of concentrating model interpretability exploration on the activation basis in hidden layers, rather than arbitrary linear combinations thereof. Joint work with Davis Brown, Tegan Emerson and Henry Kvinge.

2/23/23: On the Role of Neural Collapse in Transfer and Few-Shot Learning, Tomer Galanti (MIT)

In a variety of machine learning applications, we have access to a limited amount of data from the task that we would like to solve, as labeled data is oftentimes scarce and/or expensive. In such cases, training directly on the available data is unlikely to produce a model that performs well on new, unseen test samples.

A prominent solution to this problem is to apply transfer learning. This approach suggests to pre-train a model on a large-scale source task, such as ImageNet, and fine-tuning it to fit the available data from the downstream task. Recent studies have shown that a single classifier's learned representations over multiple classes can be easily adapted to new classes with very few samples.

In this talk, we provide an explanation for this behavior based on the recently observed phenomenon of neural collapse. We examine the few-shot error of the learned feature map, which is the classification error of the nearest class-center classifier using centers learned from a small number of random samples from each class. We show that the few-shot error generalizes from the training data to unseen test samples and to new classes. This suggests that pre-trained models can provide feature maps that are transferable to new downstream tasks even with limited data available.

3/2/23: A Mathematical Lens on the Inner Workings of Deep Learning Models, Henry Kvinge (Pacific Northwest National Labs)

As both models and data grow, experiments have played an increasingly important role in driving the field of deep learning (DL) forward. We argue that even in this empirical setting however, mathematics has a lot to offer DL in terms of concepts, analytical tools, and frameworks. We provide two examples of this in this talk. In the first, we describe an approach to directly estimating the equivariance of a DL model to a specific group action. We show that targeted evaluations using such an approach can illuminate important aspects of model robustness and learning. Next, we describe how the notion of a frame can help us glimpse the ways that DL models process data and the manifolds from which they are drawn.

Here are the slides from the talk.

3/9/23: No seminar (spring break)

3/16/23: A Solvable Model of Neural Scaling Laws, Dan Roberts (MIT/Salesforce)

Large language models with a huge number of parameters, when trained on near internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource. To understand this better, we first identify the necessary properties allowing such scaling laws to arise and then propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology. By solving this model in the dual limit of large training set size and large number of parameters, we gain insight into (i) the statistical structure of datasets and tasks that lead to scaling laws, (ii) the way nonlinear feature maps, such as those provided by neural networks, enable scaling laws when trained on these datasets, (iii) the optimality of the equiparameterization scaling of training sets and parameters, and (iv) whether such scaling laws can break down and how they behave when they do. Key findings are the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps and then translated into power-law scalings of the test loss and how the finite extent of the data's spectral power law causes the model's performance to plateau.

Here are the slides from the talk.

3/23/23: Experimental Math & Machine Learning Lab Kick-off event, Moon Duchin (Tufts)

Professor Duchin will give an informal talk & demo to math department faculty, postdocs, and grad students about how to download and work with some of the census data she uses in her work on geometry and gerrymandering. There will also be a mini-intro to google colab/jupyter notebooks and python

5/4/23: Which loss functions are Morse?, Yaim Cooper (Notre Dame)

Abstract: At the heart of many contemporary machine learning systems is a loss function that is minimized by a gradient based algorithm. One basic property of any function is whether it is Morse or not. In this talk, we ask if and when the loss function of a deep neural network is Morse. We focus on the setting of feedforward neural networks with an added regularizer and discuss some cases that are known, and some that are not known.

Fall 2022

11/8/22: The geometry of linear convolutional networks, Kathlén Kohn (KTH, Stockholm)

We discuss linear convolutional neural networks (LCNs) and their critical points. We observe that the function space (i.e., the set of functions represented by LCNs) can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network’s architecture on the geometry of the function space. For instance, for LCNs with one-dimensional convolutions having stride one and arbitrary filter sizes, we provide a full description of the boundary of the function space. We further study the optimization of an objective function over such LCNs: We characterize the relations between critical points in function space and in parameter space and show that there do exist spurious critical points. We compute an upper bound on the number of critical points in function space using Euclidean distance degrees and describe dynamical invariants for gradient descent. This talk is based on joint work with Thomas Merkh, Guido Montúfar, and Matthew Trager.

11/17/22, Thursday 4 PM, NOTE UNUSUAL DAY/TIME: Radial neural networks: universal approximation and model compression, Iordan Ganev (Radboud, Institute for Computing and Information Sciences)

Neural network activations conventionally apply the same function to each coordinate. In this talk, we will discuss alternative activations that rescale feature vectors by a function depending only on the norm. The resulting networks are called radial neural networks, and their parameter spaces exhibit rich orthogonal change-of-basis symmetries. Factoring out these symmetries leads to a practical lossless model compression algorithm; we provide a precise relationship between gradient descent optimization of the original and compressed models. Additionally, we explain a universal approximation theorem for such networks.

11/29/22: Designing Domain-specific Representations for Deep Learning in Connectomics, Donglai Wei (Boston College Computer Science)

Abstract:

The field of connectomics aims to reconstruct the brain's wiring diagram from nanometer-resolution 3D microscopy image volumes to enable new insights into the workings of brains. These new insights could inspire novel artificial intelligence algorithms and benefit the treatment development for neurodegenerative diseases. One primary computer vision task in connectomics is neuron segmentation, grouping raw image pixels into individual neurons or neuron compartments. However, existing deep learning models are ineffective on such tasks due to the limited and costly annotation and complex neuron morphology. In this talk, I will present domain-specific input and target representation designs for deep learning models to achieve state-of-the-art segmentation performances, leveraging knowledge about biological structures. First, for 3D volumetric segmentation, we designed the boundary-to-pixel direction representation (CVPR 2020), multi-view representation (MICCAI 21), and skeleton-based distance transformation (under review). Next, for 3D point cloud segmentation, we applied the Frenet-Serrer formula to twisted tubular structures to make the deep learning model invariant to tube morphology (under review).

Bio: Donglai Wei is an assistant professor in the Computer Science Department at Boston College. His research focuses on developing novel registration and reconstruction algorithms for large-scale (currently petabyte-scale) connectomics datasets to empower neuroscience discoveries. During his Ph.D. at MIT under Prof. William Freeman, he worked on video understanding problems, including arrow of time and Vimeo-90K benchmark. Since his postdoc at Harvard University, he has embarked on the quest to reconstruct the brain's wiring diagram in collaboration with Prof. Hanspeter Pfister, Prof. Jeff Lichtman, and Prof. Ed Boyden.

Summer 2022

7/12/22: Demystify Deep Network Architectures: from Theory to Applications, Wuyang Chen (University of Texas, Austin)

Deep neural networks significantly power the success of machine learning. Over the past decade, the community keeps designing architectures of deep layers and complicated connections. However, the gap between deep learning theory and application is growingly large. This talk will center around this challenge and tries to bridge the gap between the two worlds. By theoretically analyzing a network’s Jacobian, NNGP, and NTK, we find an intrinsic trade-off in network architectures. Given a space of architectures, a network cannot be optimal in its expressivity, trainability, and generalization at the same time, and it has to keep a balance between its depth and width. In other words, separately optimizing expressivity, trainability, and generalization will give us different network architectures. This analysis has further practical implications. Automated machine learning (AutoML) is a powerful tool to address design problems, yet, at the price of heavy computation costs during model training. Our theory serves as an accurate and efficient guidance for the architecture design. We propose to significantly accelerate AutoML with our theory-grounded, training-free metrics. Without any training cost, our TE-NAS framework can automatically design novel and accurate network architectures on ImageNet in only four GPU hours.

Bio: Wuyang Chen is a Ph.D. candidate in Electrical and Computer Engineering at University of Texas at Austin. Wuyang’s research focuses on theoretical understandings of deep network architectures and AutoML applications. Wuyang also worked on domain adaptation/generalization and self-supervised learning. His work is published on ICLR, ICML, CVPR, ICCV, etc. Wuyang completed his research internship in NVIDIA and Google Brain. Wuyang chaired the 4rd and the 5th version of UG2+ workshop and challenge in CVPR 2021 and 2022. Wuyang is also a board member of the One World Seminar Series on the Mathematics of Machine Learning.

Website: https://chenwydj.github.io/

Spring 2022

5/3/22: Exact Combinatorial and Topological Data for ReLU Networks' Linear Regions, Marissa Masden (University of Oregon)

Abstract: One goal at the interface of topological data analysis and machine learning is to “tune” the topology of a network to that of the data. To that end, we report substantial progress on understanding the topology of ReLU networks. We study the canonical polyhedral complex, recently defined by E. Grigsby and K. Lindsey, which encodes a ReLU network's decomposition of input space and thus its decision boundary. We find that while the polyhedral complex is composed of arbitrary combinatorial types, generically its geometric dual is a cubical complex. We call this the sign sequence cubical complex, and establish additional algebraic structure on it, extending similar structure from the theory of hyperplane arrangements. We use this to show that the locations and sign sequences of the vertices of a network's polyhedral complex, which can be computed recursively through network layers, fully determine the complex combinatorially and topologically. Computing the polyhedral complex by taking advantage of this structure is robust to floating point errors which can arise through standard approaches to polyhedral intersection, giving an effective algorithm to fully encode decision boundaries. Running empirics preliminarily indicates that the distribution of topological properties of shallow networks' decision boundaries at initialization is roughly constant as width varies, but those topological properties vary with width for deeper networks.

2/8/22: The Computability of PAC Learning, Julian Asilis (Boston College, Computer Science)

A recent research direction seeks to align learning theory with its computational intentions by considering PAC learning under the restriction that learners be computable (rather than merely measurable). I discuss several works in this area, beginning with a brief treatment of relevant concepts in computability theory. First, I discuss recent advances made toward characterizing the computability of PAC learning over N, including the demonstration of a class of finite VC dimension without any proper computable PAC learners. Subsequently, I present joint work on the computability of learning over more general metric spaces, including the demonstration of a computable learner whose sample functions are all noncomputable. Finally, I consider open questions in the area.

1/25/22: A Neural Network Ensemble Approach to System Identification, Elisa Negrini (Worcester Polytechnic Institute)

We present a new algorithm for learning unknown governing equations from trajectory data, using an ensemble of neural networks. Given samples of solutions x(t) to an unknown dynamical system dx/dt=f(t,x(t)), we approximate the function f using an ensemble of neural networks. We express the equation in integral form and use Euler method to predict the solution at every successive time step using at each iteration a different neural network as a prior for f. This procedure yields M-1 time-independent networks, where M is the number of time steps at which x(t) is observed. Finally, we obtain a single function f(t,x(t)) by neural network interpolation. Unlike our earlier work, where we numerically computed the derivatives of data, and used them as target in a Lipschitz regularized neural network to approximate f, our new method avoids numerical differentiations, which are unstable in presence of noise. We test the new algorithm on multiple examples both with and without noise in the data. We empirically show that generalization and recovery of the governing equation improve by adding a Lipschitz regularization term in our loss function and that this method improves our previous one especially in presence of noise, when numerical differentiation provides low quality target data. Finally, we compare our proposed method with other algorithms for system identification.

Here are the slides.

Fall 2021

12/7/21: Understanding and Accelerating Neural Architecture Search with Theory-Grounded Metrics, Atlas Wang (University of Texas, Austin)

In this talk, I will discuss on how to design a unified, training-free, and DL theory-grounded framework for Neural Architecture Search (NAS), with high performance, very low cost, and interpretation. NAS has been explosively studied to automate the discovery of top-performer neural networks but suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations. Recent NAS works start to explore indicators that can predict a network's performance without training. By rigorous correlation analysis, we present a unified framework to understand and accelerate NAS, by disentangling essential theory-inspired characteristics of searched networks – Trainability, Expressivity, and Generalization (we call “TEG”), all assessed in a training-free manner. Our indicators could be scaled up and integrated with various NAS search methods, including both supernet and single-path approaches. Extensive studies validate the effective and efficient guidance from our framework. Moreover, we visualize search trajectories on the landscapes of those characteristics, which lead to the first interpretable analysis of various NAS algorithms’ behaviors on different benchmarks.

Reference:

[1] https://openreview.net/forum?id=Cnon5ezMHtu

[2] https://arxiv.org/abs/2108.11939

11/16/21: Neural nets in natural language processing, Emily Prud'hommeaux (Boston College, Computer Science)

After many false starts over the past forty years, neural networks have become the dominant approach to machine learning for natural language processing (NLP). In this talk, I will describe and demonstrate a few simple but interesting ways neural nets are used in NLP, from representing lexical semantics to recognizing speech. Interested listeners can follow along with the demos using Colab or Python installed on their own machine.

10/19/21: Convex codes, neural networks, and oriented matroids, Alex Kunin (Baylor College of Medicine/University of Houston, Neuroscience/Mathematics)

Starting with a story about the hippocampus, I'll motivate an algebraic-topological view of neural activity, which mostly boils down to the coincidentally-named nerve theorem. This leads to a generalized sort of "inverse nerve" problem: given some collection of sets (not necessarily a simplicial complex), does it encode the intersections of some convex sets in R^d? I'll share my progress on answering this question and its connections to deep learning, both of which stem from hyperplane arrangements.

9/28/21: Beyond the Bias-Complexity Tradeoff, Jean-Baptiste Tristan (Boston College, Computer Science)

I will present some of the most important results in statistical learning theory (SLT) to give some context to the ongoing efforts to explain deep learning with neural networks. First, I will present the fundamental theorem of statistical learning theory that characterizes the generalizability of learning algorithms using VC theory. Second, I will explain how recalcitrant learning algorithms were analyzed in the framework of structural risk minimization, using concepts such as stability or duality. Finally, I will explain why existing results have failed to provide a convincing explanation to deep learning and review some of the promising approaches in light of SLT's past successes.

This talk will contain no original research.

Summer 2021

8/10/21: Intro to Graph neural networks, Cihan Soylu

8/3/21: Neural differential equations, Kathryn Lindsey

7/13/21: A fractional approach to regularization, Adebo Sijuwade (Washington State University)

Fractional calculus is an effective tool that has recently been used to improve the performance of gradient descent methods, some of the most common methods used to optimize neural networks. Caputo-based gradient methods have been effective over their integer-order equivalents due to their long memory characteristics but limited in that convergence to a local optimum is not guaranteed. To avoid overfitting, it is of interest to consider the role of gradient methods in regularization problems. In this talk, I will discuss a recently proposed gradient method based on a fractional derivative operator with smooth kernel and address its compatibility with L1 regularization.

Slides are here.

7/6/21: The critical locus of overparameterized neural networks (Y. Cooper) , Elisenda Grigsby

I will tell you about Y. Cooper's investigation of the geometry of the loss landscape for overparameterized neural networks with smooth, monotone activation function, following this paper.

Here are the notes from the talk.

6/15/21: Kernel methods in ML and the neural tangent kernel, Elisenda Grigsby

I will talk about kernel methods in machine learning and the neural tangent kernel, following this paper and this blog post.

Here are the notes from the talk.

Spring 2021

4/6/21: Recent advances in the analysis of the implicit bias of gradient descent on deep networks, Matus Telgarsky (UIUC)

The purpose of this talk is to highlight three recent directions in the study of implicit bias --- one of the current promising approaches to trying to develop a tight generalization theory for deep networks, one interwoven with optimization. The first direction is a warm-up with purely linear predictors: here, the implicit bias perspective gives the fastest known hard-margin SVM solver! The second direction is on the early training phase with shallow networks: here, implicit bias leads to good training and testing error, with not just narrow networks but also arbitrarily large ones. The talk concludes with deep networks, providing a variety of structural lemmas which capture foundational aspects of how weights evolve for any width and sufficiently large amounts of training.

Fall 2020

12/1/20: On the topological expressiveness of neural networks, Elisenda Grigsby

I will describe a joint on-going project with K. Lindsey aimed at developing a general framework for understanding how the architecture of a neural network constrains the topological features of its decision regions.

11/24/20: No seminar (Thanksgiving)

11/17/20: Translation and Attention, Dalton Fung

I will start by introducing the language translation task, which is one of the many important tasks in Natural Language Processing (NLP). I'll then go over what the attention mechanism is, how it is invented to solve some flaws of the traditional models, and finally explain why it eventually becomes one of the most important architectures today in NLP tasks.

11/10/20: What are Random Forests? (Answer: neural networks), Adam Saltz

I'll give an overview of how random forests work, then describe why every random forest is a neural network.

11/03/20: Neural Network Initialization Processes, Marissa Masden

Abstract: Before neural networks are trained, their weights need to be assigned initial values. The initial distribution of weights affects properties of the network during training. I will discuss some common methods of random initialization for fully-connected feedforward networks, and some interesting corresponding properties of networks at initialization. I will then introduce a novel geometrically-inspired algorithm for initializing fully-connected networks called Linear Discriminant Sorting, developed together with my advisor, Dev Sinha. The success of this technique brings some intuition toward the geometric properties of neural networks.

10/27/20: On connected sublevel sets in deep learning (Q. Nguyen), Elisenda Grigsby

I'll be talking about this paper, which proves that every sublevel set of the loss function for certain feedforward neural network architectures (assuming activation functions that are homeomorphisms R-->R) is connected. I'll define all the terms I mentioned in the previous sentence, and will spend much of my time putting the result in context.

10/20/20: Explainability Methods for Neural Networks, Cihan Soylu

Abstract: Neural networks became an important machine learning tool for achieving human-level performance for many learning tasks. However, due to the black-box nature of these models, it is difficult to understand which features of a given input is causing the decision of the learned network. This understanding is crucial for tasks such as medical diagnostics. In this talk, we will go over various explainability methods proposed for neural networks and ways to evaluate these methods.

Page updated

Google Sites

Report abuse