Blog


On continuous terminal embeddings of sets of positive reach

August 14, 2024

In modern data science, one of the pillars of dimensionality reduction is the celebrated Johnson-Lindenstrauss (J-L) Lemma. Given a finite dataset X, the J-L Lemma provides simple random linear embeddings of X into a linear space of dimension proportional to log(|X|) that approximately preserve distances between its elements with high probability. Recently, Elkin, Filtser and Neiman remarkably improved this idea by proposing so-called terminal embeddings. These are random, nonlinear embeddings that, with high probability, approximately preserve distances between elements of the dataset X and any arbitrary point of the ambient vector space in which X lives. The benefits of this new paradigm in data science have recently been investigated in the context of compressive classification.
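To fix ideas, here is a minimal numerical sketch of the classical J-L construction via a Gaussian random projection (not the terminal embeddings from the preprint); the dataset, the target distortion eps, and the constant 8 in the embedding dimension are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A finite dataset X: n points in ambient dimension d.
n, d = 100, 500
X = rng.standard_normal((n, d))

# Embedding dimension proportional to log(n) / eps^2 (constant chosen for illustration).
eps = 0.5
m = int(np.ceil(8 * np.log(n) / eps ** 2))

# Gaussian random projection, scaled so squared norms are preserved in expectation.
A = rng.standard_normal((m, d)) / np.sqrt(m)
Y = X @ A.T


def pairwise_distances(Z):
    """All pairwise Euclidean distances between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    i, j = np.triu_indices(Z.shape[0], k=1)
    return np.sqrt((diff ** 2).sum(-1))[i, j]


ratios = pairwise_distances(Y) / pairwise_distances(X)
print(f"embedding dimension m = {m}")
print(f"distance distortion range: [{ratios.min():.3f}, {ratios.max():.3f}]")
```

With high probability, all the ratios fall within roughly 1 ± eps, even though m depends only logarithmically on the number of points.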

In our latest preprint

https://arxiv.org/abs/2408.02812

Rafael Chiclana, Tim Hoheisel, Mark Iwen and I construct terminal embeddings with desirable regularity properties, namely Hölder continuity. Our first result considers finite sets and provides optimal dimensionality reduction through terminal embeddings that are locally 1/2-Hölder continuous almost everywhere. Our second main result holds for any (possibly infinite) set X with positive reach τ > 0 and provides optimal dimensionality reduction through terminal embeddings that are 1/4-Hölder continuous for every point within the reach of X.

A nonconvex set X and its reach τ. Sets with positive reach are central to our analysis.

A new convergence theorem for physics-informed neural networks

June 10, 2024

A new paper in collaboration with Nick Dexter, Samir Karam and Weiqi Wang is out!

https://arxiv.org/abs/2406.01539 

We propose a new convergence analysis for Physics-Informed Neural Networks (PINNs), i.e., deep neural networks trained to solve Partial Differential Equations (PDEs). PINNs have recently gained an impressive amount of attention in the scientific machine learning community. Yet, most of the research activity on the topic focuses on practical and numerical aspects, and a comprehensive convergence theory is still lacking.

Our analysis focuses on the class of high-dimensional, periodic diffusion-reaction equations, motivated by applications in computational finance and quantum chemistry. We show that PINNs can mitigate the curse of dimensionality by proving that it is sufficient to use a training set whose size scales logarithmically or, at worst, linearly with the domain’s dimension.

Our study is based on practical existence theory, a framework recently introduced in the field of function approximation (see this recent review paper, in press for the Handbook of Numerical Analysis). In our work, we considerably extend its scope by applying it to the high-dimensional PDE setting. Moreover, we leverage (and generalize) recent results in compressive Fourier collocation methods for PDEs (see this paper, recently published in the IMA Journal of Numerical Analysis).

A PINN with periodic layer. In order to solve high-dimensional diffusion-reaction equations with periodic boundary conditions, we consider PINNs with periodic layers. (Figure by Weiqi Wang.)
Exact replication of Fourier functions. A key step of our proof is the explicit construction (summarized above) of periodic PINNs with RePU activation able to exactly replicate Fourier basis functions. This construction is then combined with recovery guarantees for compressive Fourier collocation to obtain a convergence result in the form of a practical existence theorem. (Figure by Weiqi Wang.)
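For illustration, here is a minimal PyTorch sketch of a PINN with a periodic first layer, which maps each coordinate x_i to (cos 2πx_i, sin 2πx_i) so that the network is automatically 1-periodic in every variable. The widths and the tanh activation are illustrative choices, not the RePU architecture analyzed in the paper.

```python
import torch
import torch.nn as nn


class PeriodicLayer(nn.Module):
    """Map each coordinate x_i to (cos(2*pi*x_i), sin(2*pi*x_i)),
    enforcing 1-periodicity of the network in every variable."""

    def forward(self, x):
        return torch.cat([torch.cos(2 * torch.pi * x),
                          torch.sin(2 * torch.pi * x)], dim=-1)


class PeriodicPINN(nn.Module):
    def __init__(self, dim, width=64):
        super().__init__()
        self.net = nn.Sequential(
            PeriodicLayer(),
            nn.Linear(2 * dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, x):
        return self.net(x)


# Evaluate on random collocation points in [0, 1]^d.
d = 4
model = PeriodicPINN(d)
x = torch.rand(10, d, requires_grad=True)
u = model(x)
# Derivatives needed for the PDE residual are obtained via autograd, e.g.:
(grad_u,) = torch.autograd.grad(u.sum(), x, create_graph=True)
print(u.shape, grad_u.shape)
```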

A weekend in Saskatoon: Mathematics of Machine Learning and tenure 

June 3, 2024

This weekend was a very eventful one! I had the pleasure of co-organizing a Scientific Session entitled "Mathematics of Machine Learning" at the 2024 Summer Meeting of the Canadian Mathematical Society, which took place in Saskatoon, together with Vakhtang Putkaradze (University of Alberta) and Hamid Usefi (Memorial University of Newfoundland). This session has been part of every CMS Meeting since Winter 2021 and has now officially become a lovely tradition! Our session featured presentations on a wide variety of cutting-edge topics, including spiking neural networks, transformers, physics-informed neural networks, active learning methods, algorithms for large-scale graph-structured data, and cyber security.

I am also proud of my students' contributions to the conference. Samir Karam (PhD student, co-advised with Galia Dafni) gave a talk on his recent results on the convergence analysis of physics-informed neural networks for high-dimensional partial differential equations. Rahul Padmanabhan presented a poster on his Master's thesis project on the approximation of matrix functions with transformers. 

Incidentally, I also officially became a tenured Associate Professor at Concordia on June 1st. I was lucky to celebrate this milestone with some of my students and colleagues in Saskatoon!

Scientific Session "Mathematics of Machine Learning", 2024 CMS Summer Meeting. Here are some of the speakers and participants of the session. Aren't we beautiful?!
Open problems. This blackboard is the result of an open problem discussion that we had with the speakers and participants at the end of our session. It contains research directions for decades to come!

Real-time motion detection using dynamic mode decomposition

May 9, 2024

Dynamic Mode Decomposition (DMD) is a numerical method that seeks to fit time-series data to a linear dynamical system. In doing so, DMD decomposes dynamic data into spatially coherent modes that evolve in time according to exponential growth/decay or with a fixed frequency of oscillation. A prolific application of DMD has been to video, where the high-dimensional pixel space is interpreted as a dynamical state evolving through time as the video plays.

In a newly submitted paper, Marco Mignacca, Jason Bramburger and I propose a simple and interpretable motion detection algorithm for streaming video data rooted in DMD. Our method leverages the fact that there exists a correspondence between the evolution of important video features, such as foreground motion, and the eigenvalues of the matrix which results from applying DMD to segments of video. We apply the method to a database of test videos which emulate security footage under varying realistic conditions. Effectiveness is analyzed using receiver operating characteristic curves, while we use cross-validation to optimize the threshold parameter that identifies movement.
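As a rough illustration of the main ingredient, here is a simplified numpy sketch of exact DMD applied to a window of flattened video frames; the rank truncation, the toy data, and the "mean eigenvalue modulus" indicator below are placeholders for the spectrum-based statistic used in the paper, not its exact implementation.

```python
import numpy as np


def dmd_eigenvalues(frames, rank=10):
    """Exact DMD on a window of flattened video frames.

    frames: array of shape (n_pixels, n_frames), columns are consecutive frames.
    Returns the eigenvalues of the reduced linear operator that (approximately)
    maps each frame to the next one.
    """
    X, Y = frames[:, :-1], frames[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = min(rank, int((s > 1e-10).sum()))
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    # Reduced operator A_tilde ~ U^T A U, where Y ~ A X.
    A_tilde = U.T @ Y @ Vh.T @ np.diag(1.0 / s)
    return np.linalg.eigvals(A_tilde)


# Toy usage: a window of 30 random 64x64 frames, flattened into columns.
frames = np.random.default_rng(0).random((64 * 64, 30))
eigs = dmd_eigenvalues(frames)
print("mean eigenvalue modulus:", np.abs(eigs).mean())
```

In the streaming setting, this computation is repeated over sliding windows of frames, and sudden changes in the DMD spectrum flag the onset of motion.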

Our paper can be downloaded at https://arxiv.org/abs/2405.05057!

Notably, this paper stemmed from Marco Mignacca's undergraduate summer research project, funded by the NSERC USRA program. Marco was also awarded a CMS Poster Award for this research in Fall 2023. 

Illustration of our motion-detection method based on DMD. Sudden spikes in the average of the DMD spectrum represent motion detection. Each spike is accompanied by one or more individuals entering or exiting the frame, as demonstrated by the screenshots. (Figure by Marco Mignacca; see Figure 6 in the paper.)

New review paper: Learning smooth functions in high dimensions

April 9, 2024

Significant advances have been made in the last decade towards efficient approaches for high-dimensional approximation, from sparse polynomial methods to more recent techniques based on Deep Neural Networks (DNNs). A new review paper on the topic, in collaboration with Ben Adcock, Nick Dexter and Sebastian Moraga and submitted as a book chapter for a special volume of the Handbook of Numerical Analysis, is available on the arXiv! 

https://arxiv.org/abs/2404.03761 

In it, we describe the contemporary motivations for smooth high-dimensional approximation problems, relevant function classes, fundamental limits of learnability, and finally, sparse polynomial and DNN methods. For the latter, there is currently a significant gap between approximation theory and practical performance. Aiming to narrow this gap, we develop the topic of practical existence theory, which asserts the existence of dimension-independent DNN architectures and training strategies that achieve provably near-optimal generalization errors in terms of the amount of training data.

For other recent publications directly related to this work, see [ABDM22a, ABDM22b, ABW22].

Best s-term rates: exponential or algebraic? Best s-term polynomial approximation error of a smooth function of d = 4, 8, 16, 32 variables as a function of the sparsity s. This error is compared with theoretical exponential and algebraic decay rates, whose sharpness depends on the size of d and s. For more details, see Figure 1 in the paper. (Numerics by Ben Adcock.)
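To make the notion in the figure concrete, here is a small numpy sketch of the best s-term approximation error of a coefficient sequence: keep the s largest coefficients in absolute value and measure the norm of the rest. The algebraically decaying coefficients below are a synthetic example, not the function from the figure.

```python
import numpy as np


def best_s_term_error(coeffs, s):
    """ell^2 error of the best s-term approximation: keep the s largest
    entries in absolute value and measure the norm of the remainder."""
    c = np.sort(np.abs(coeffs))[::-1]   # coefficients sorted by decreasing magnitude
    return np.sqrt(np.sum(c[s:] ** 2))


# Synthetic coefficients with algebraic decay |c_k| ~ k^{-2}.
k = np.arange(1, 10_001)
coeffs = k ** -2.0
for s in (10, 100, 1000):
    print(f"s = {s}: best s-term error = {best_s_term_error(coeffs, s):.2e}")
```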

Surrogate models for diffusion on graphs: Congrats, Kylian!

March 20, 2024

Kylian Ajavon, a Master's student in my research group at Concordia, has just defended his thesis, entitled "Surrogate Models for Diffusion on Graphs: A High-Dimensional Polynomial Approach".

In his thesis, Kylian considers parametric diffusion problems on undirected graphs with community structure, such as those generated using the stochastic block model (see figure). The thesis shows (both theoretically and numerically) that sparse polynomial approximation techniques, such as least squares and compressed sensing, can accurately approximate parameter-to-solution maps for diffusion on graphs. Up to this point, the literature on parametric models has mostly focused on parametric PDEs. Kylian's work has the potential to pave the way for the development of a new class of methods for surrogate modelling on graphs.
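For a flavour of this type of problem (and not Kylian's exact setup), here is a minimal sketch: generate a stochastic block model graph with networkx and solve a graph diffusion equation du/dt = -ν L u via the matrix exponential, where the scalar diffusion coefficient ν plays the role of the parameter in the parameter-to-solution map.

```python
import networkx as nx
import numpy as np
from scipy.linalg import expm

# Stochastic block model: two communities of 50 nodes each.
sizes = [50, 50]
probs = [[0.20, 0.02],
         [0.02, 0.20]]
G = nx.stochastic_block_model(sizes, probs, seed=1)
L = nx.laplacian_matrix(G).toarray().astype(float)

# Parametric diffusion on the graph: du/dt = -nu * L u, u(0) = u0.
u0 = np.zeros(L.shape[0])
u0[0] = 1.0        # initial "heat" concentrated on a single node
t = 1.0


def solution(nu):
    """Parameter-to-solution map: diffusion coefficient nu -> u(t; nu)."""
    return expm(-nu * t * L) @ u0


for nu in (0.1, 0.5, 1.0):
    u = solution(nu)
    print(f"nu = {nu}: mass remaining in community 1 = {u[:50].sum():.3f}")
```

A surrogate model then replaces the (expensive) map ν ↦ u(t; ν) with a cheap approximation, e.g., a sparse polynomial expansion fitted to a few sampled solutions.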

Kylian's thesis will soon be publicly available on Concordia's Spectrum Research Repository.

The figure illustrates a random graph with community structure generated using the stochastic block model. Kylian's Master's thesis considers parametric diffusion problems on graphs of this type. (Figure by Kylian Ajavon.)

New video on practical existence theorems (BIRS workshop)

February 19, 2024

I've just given a talk at the BIRS workshop Structured Machine Learning and Time–Stepping for Dynamical Systems summarizing some of my recent work on the mathematics of deep learning. 

In my talk, I illustrated the framework of practical existence theorems, a new class of theoretical guarantees for deep learning that improve upon universal approximation theorems. Beyond establishing the existence of neural networks able to approximate functions from a certain class, these theorems also provide conditions on the network architecture, the size of the training set, and the training strategy sufficient to achieve near-optimal approximation rates. I illustrated practical existence theorems in three scientific computing contexts: high-dimensional function approximation, physics-informed neural networks for high-dimensional PDEs, and reduced order modeling based on convolutional autoencoders.

This presentation showcases work in collaboration with B. Adcock, N. Dexter, S. Karam, N.R. Franco, S. Moraga, W. Wang, and C. Webster.

The video is available at this link: https://www.birs.ca/events/2024/5-day-workshops/24w5301/videos/watch/202402191611-Brugiapaglia.html 


Video recording of the talk "Practical existence theorems for deep learning approximation in high dimensions" (top) and group photo with the workshop participants (bottom).

Neural rank collapse

February 6, 2024

A new preprint in collaboration with Emanuele Zangrando, Piero Deidda, Nicola Guglielmi and Francesco Tudisco, a talented and dynamic group from the Gran Sasso Science Institute, is out!

https://arxiv.org/abs/2402.03991 

Recent work in deep learning has shown strong empirical evidence of an implicit low-rank bias: weight matrices in trained deep networks tend to be approximately low-rank. However, most theoretical investigations of this phenomenon deal with oversimplified deep linear networks. In our paper, we consider general networks with nonlinear activations trained with weight decay and show the presence of an intriguing neural rank collapse phenomenon, which connects the low-rank bias with neural collapse properties: as the weight decay parameter grows, the rank of each layer in the network decreases proportionally to the within-class variability of the hidden-space embeddings of the previous layers. Our theoretical findings are supported by numerical illustrations.
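Here is a small PyTorch sketch in the spirit of the experiments described above (not the one from the paper): train the same small fully connected network on Gaussian mixture data for increasing weight decay and report a numerical rank of each weight matrix. The architecture, data, tolerance, and training budget are all illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Gaussian mixture data: two classes with well-separated means.
n, d = 400, 20
means = torch.stack([torch.full((d,), 2.0), torch.full((d,), -2.0)])
y = torch.randint(0, 2, (n,))
X = means[y] + torch.randn(n, d)


def numerical_rank(W, tol=1e-2):
    """Number of singular values above tol times the largest one."""
    s = torch.linalg.svdvals(W)
    return int((s > tol * s[0]).sum())


def train(weight_decay, epochs=500):
    model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return [numerical_rank(layer.weight) for layer in model
            if isinstance(layer, nn.Linear)]


for wd in (0.0, 1e-3, 1e-2, 1e-1):
    print(f"weight decay {wd}: layer ranks {train(wd)}")
```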

This work is the result of two productive academic visits in Spring and Fall 2023, generously supported by the Gran Sasso Science Institute.

Numerical illustration of the neural rank collapse phenomenon. Relative distance of weight matrices from the rank-10 manifold for each layer of a fully connected network as a function of the weight decay parameter λ and the standard deviation σ of the training dataset (composed of Gaussian mixed data). For smaller σ and larger λ, weight matrices have lower rank. (Numerical experiment by Emanuele Zangrando.)

A practical existence theorem for reduced order models based on convolutional autoencoders

February 2, 2024

Nicola Rares Franco (Politecnico di Milano) and I have just submitted a new paper!

https://arxiv.org/abs/2402.00435 

In recent years, deep learning has gained increasing popularity in the fields of PDEs and reduced order modeling, providing domain practitioners with new powerful data-driven techniques. In this context, deep autoencoders based on Convolutional Neural Networks (CNNs) have proven extremely effective, outperforming established techniques, such as the reduced basis method. 

However, despite the empirical success of CNN-based autoencoders, there are only a few theoretical results supporting these architectures, usually stated in the form of universal approximation theorems. Many practical questions remain unanswered, such as the number of snapshots needed for convergence or the neural network training strategy. In this work, we fill some of these gaps by providing a new practical existence theorem (building upon a recent paper by B. Adcock, N. Dexter, S. Moraga and me) for CNN-based autoencoders when the parameter-to-solution map is holomorphic (as is the case, e.g., for parametric diffusion equations).

A 2D convolutional layer acting on a given input. The action of the convolutional layer can be visualized either in terms of a moving filter (A) or using the equivalent matrix representation (B): in both cases, despite mapping from a 9-dimensional to a 4-dimensional space, the layer only comes with 4 learnable parameters, instead of 9 · 4 = 36. (Figure by Nicola R. Franco.)
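The parameter count in the caption is easy to verify in PyTorch: a single-channel 2×2 convolution mapping a 3×3 input to a 2×2 output has 4 learnable weights, whereas a dense layer between the same flattened spaces has 9 · 4 = 36 (layer sizes below are just those of the figure).

```python
import torch
import torch.nn as nn

# A single-channel 2x2 convolution on a 3x3 input produces a 2x2 output.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, bias=False)
x = torch.randn(1, 1, 3, 3)                        # batch of one 3x3 "image"
print(conv(x).shape)                               # torch.Size([1, 1, 2, 2])
print(sum(p.numel() for p in conv.parameters()))   # 4 learnable weights

# The equivalent dense map from the flattened 9-dim input to the 4-dim output
# would instead require 9 * 4 = 36 parameters.
dense = nn.Linear(9, 4, bias=False)
print(sum(p.numel() for p in dense.parameters()))  # 36
```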
