The Square Root LASSO and the "tuning trade off"April 4, 2023
Aaron Berk, Tim Hoheisel and I have recently submitted a new paper on the Square Root LASSO, available on the arXiv at
Building on our recent work LASSO Reloaded, we propose a variational analysis of the Square Root LASSO, showing regularity results for the solution map and studying tuning parameter sensitivity. More specifically, we identify assumptions that guarantee well-posedness and Lipschitz stability, with explicit upper bounds. Our investigation reveals the presence of a "tuning trade off" and suggests that the robustness of the Square Root LASSO's optimal tuning to measurement noise comes at the price of increased sensitivity of the solution map to the tuning parameter itself.
The greedy side of the LASSOMarch 8, 2023
Sina M.-Taheri and I have just submitted a new paper, available on the arXiv at
In it, we propose a class of greedy algorithms for weighted sparse recovery by considering new loss function-based generalizations of Orthogonal Matching Pursuit (OMP). We show that greedy selection rules associated with popular loss functions such as those of the LASSO (Least Absolute Shrnkage and Selection Operator), Square Root LASSO (SR-LASSO) and Least Absolute Deviations LASSO (LAD-LASSO) admit explicitly computable and simple formulas. Moreover, we numerically demonstrate the effectiveness of the proposed algorithms and empirically show that they inherit desirable characteristics from the corresponding loss functions, such as SR-LASSO's noise-blind optimal parameter tuning and LAD-LASSO's fault tolerance. In doing so, our study sheds new light on the connection between greedy sparse recovery and convex relaxation.
Near-optimal learning of Banach-valued, high-dimensional functions via deep neural networksNovember 30, 2022
A new paper in collaboration with B. Adcock, N. Dexter, and S. Moraga is on the arXiv!
The past decade has seen increasing interest in applying Deep Learning (DL) to computational science and engineering. However, DL is not yet well-understood from the standpoint of numerical analysis. Little is known about the efficiency and reliability of DL from the perspectives of stability, robustness, accuracy, and sample complexity.
Motivated by DL-based approaches for parametric PDEs, our paper provides arguments for Deep Neural Network (DNN) approximation of infinite-dimensional smooth functions taking values in an infinite-dimensional Banach space (such as typical PDE solution spaces) that overcome the curse of dimensionality. We establish practical existence theorems that describe classes of DNNs with dimension-independent architecture size and training procedures that achieve near-optimal algebraic rates of convergence. These results involve key extensions of compressed sensing for Banach-valued recovery and polynomial emulation with DNNs. When approximating solutions of parametric PDEs, our results account for all sources of error, i.e., sampling, optimization, approximation and physical discretization, and allow for training high-fidelity DNN approximations from coarse-grained sample data.
One world seminar on the mathematics of machine learningNovember 18, 2022
On November 16, I had the pleasure to give a talk at the One World Seminar Series on the Mathematics of Machine Learning, entitled The mathematical foundations of deep learning: from rating impossibility to practical existence theorems. Its recording has just been published on YouTube. Check it out!
Inspired by Smale's 18th problem, I presented my recent work on the mathematical foundations of deep learning by illustrating two case studies.
First, motivated by applications in cognitive science, I presented rating impossibility theorems. These theorems identify frameworks where neural networks are provably unable to generalize outside the training set when performing the seemingly simple task of learning identity effects, i.e. classifying pairs of objects as identical or not. (See this paper).
Second, motivated by applications in scientific computing, I illustrated practical existence theorems. These theorems combine universal approximation results for deep neural networks with compressed sensing and high-dimensional polynomial approximation theory. As a result, they yield sufficient conditions on the network architecture, the training strategy, and the number of samples able to guarantee accurate approximation of smooth functions of many variables. (See this paper and the sparse-hd-book).
I would like to thank Boumediene Hamzi for the kind invitation to speak.
The mathematics of machine learning at the CMS Winter Meeting 2022November 10, 2022
I am proud to announce an exciting session on the mathematics of machine learning I co-organized with Ben Adcock (SFU), Giang Tran (Waterloo) and Hamid Usefi (Memorial). This event is part of the 2022 Winter Meeting of the Canadian Mathematical Society (CMS) that will take place in Toronto on December 3-5 2022.
Our session will mark the third in a series of sessions at CMS meetings on this theme. Its aim is to bring together a diverse group of leading experts in mathematics of machine learning and it will feature presentations from 18 researchers from Canada and the US. It will be a forum for discussing and exploring emerging ideas in this fast-growing and exciting field. Check it out!
Is Monte Carlo a bad sampling strategy?September 1st, 2022
A new preprint in collaboration with Ben Adcock (Simon Fraser University) is out!
Polynomial approximation of smooth functions of many variables from pointwise samples (subject of our recently published book) is a widely-used tool in scientific computing, not least in parametric modelling and uncertainty quantification. In this context, Monte Carlo sampling is an established strategy able to alleviate the so-called curse of dimensionality. Yet, it is also know to be theoretically suboptimal. This gap has led in recent years to a focused research effort on the development of improved sampling strategies, some of which are provably near-optimal from the approximation-theoretic perspective.
Many of these approaches offer significant improvements in low dimensions. However, in higher dimensions, it has been well-documented that the performance gap between theoretically-suboptimal Monte Carlo sampling and improved and/or near-optimal sampling strategies dramatically lessens. The purpose of our work is to resolve this seeming paradox. In this paper we demonstrate (both numerically and theoretically) that, in reality, Monte Carlo is an eminently suitable sampling strategy in sufficiently high dimensions.
New preprint: a coherence parameter characterizing generative compressed sensing with Fourier measurementsJuly 28, 2022
The framework of generative compressed sensing proposed by Bora et al. in 2017 had a significant impact and has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. In our new preprint
Aaron Berk (McGill), Babhru Joshi, Yaniv Plan, Matthew Scott, Özgür Yilmaz (all based at UBC), and I move beyond the subgaussian assumption, to measurement matrices that are derived by sampling uniformly at random rows of a unitary matrix (e.g. subsampled Fourier matrices). The "keystone" of our theoretical analysis is the coherence, a new parameter, which measures the interplay between the range of the generative network and the measurement matrix. In addition, we propose a regularization strategy for training generative neural networks to have favourable coherence with the measurement operator and provide compelling numerical simulations that support this regularized training strategy.
Compressive Fourier collocation for high-dimensional PDEsJune 7, 2022
Another exciting preprint prepared in collaboration with Weiqi Wang (Concordia) is on the arXiv!
We propose a new method for the numerical solution of high-dimensional PDEs with periodic boundary conditions called compressive Fourier collocation. Combining techniques from sparse high-dimensional approximation and Monte Carlo sampling, our method requires a number of collocation points that scales mildly (i.e., logarithmically) with respect to the dimension of the PDE domain. Focusing on the case of high-dimensional diffusion equations, we provide recovery guarantees based on the recently proposed framework of sparse recovery in subsampled bounded Riesz systems and carry out numerical experiments up to dimension 20.
LASSO reloadedMay 17, 2022
I am quite excited to announce a new arXiv preprint written in collaboration with Aaron Berk (McGill) and Tim Hoheisel (McGill):
We provide a variational analysis of the unconstrained formulation of the LASSO problem, ubiquitous in statistical learning, signal processing, and inverse problems. In particular, we establish smoothness results for the optimal value as well as Lipschitz properties of the optimal solution as functions of the measurement vector and the tuning parameter. Moreover, we show how to apply the proposed variational analysis in the context of compressed sensing with subgaussian measurements. Our theoretical findings are validated by numerical experiments.
Interview for Concordia News: How to make AI ‘more intelligent’April 5, 2022
I had the pleasure to be interviewed by Elizabeth Faure for Concordia News. In the interview, I talk about the importance of developing new theoretical guarantees for deep learning in order to deploy safer and more reliable AI technologies. In particular, I present my recent work in collaboration with Ben Adcock, Nick Dexter, and Sebastian Moraga on practical existence theorems for deep neural networks to appear in the Proceedings of Machine Learning Research.
You ca read the interview at this link:
I would also like to thank Taylor Tower for her assistance through the process.
On efficient algorithms for computing near-best polynomial approximations to high-dimensional, Hilbert-valued functions from limited samplesMarch 31, 2022
A new preprint in collaboration with Ben Adcock, Nick Dexter, and Sebastian Moraga is out!
Our paper addresses the following question: are there robust, efficient algorithms for computing approximations to finite- or infinite-dimensional, holomorphic and Hilbert-valued functions from limited samples that achieve best s-term rates?
We answer this in the affirmative by introducing algorithms and theoretical guarantees that assert exponential or algebraic rates of convergence, along with robustness to sampling, algorithmic, and physical discretization errors. Our work involves several significant developments of existing techniques, including a novel restarted primal-dual iteration for solving weighted l1-minimization problems in Hilbert spaces. Our theory is supplemented by numerical experiments demonstrating the practical efficacy of these algorithms.
Check out our preprint on the arXiv! https://arxiv.org/abs/2203.13908
Book officially publishedMarch 5, 2022
I am proud to announce that the book "Sparse Polynomial Approximation of High-dimensional Functions" by Ben Adcock, Clayton Webster, and me has now been officially published by SIAM. You can
Enjoy! And if you have any thoughts on the book you'd like to share with me, do not hesitate to write me an email.