Blog


The Square Root LASSO and the "tuning trade off"

April 4, 2023

Aaron Berk, Tim Hoheisel and I have recently submitted a new paper on the Square Root LASSO, available on the arXiv at 

https://arxiv.org/abs/2303.15588

Building on our recent work LASSO Reloaded, we propose a variational analysis of the Square Root LASSO, showing regularity results for the solution map and studying tuning parameter sensitivity. More specifically, we identify assumptions that guarantee well-posedness and Lipschitz stability, with explicit upper bounds. Our investigation reveals the presence of a "tuning trade off" and suggests that the robustness of the Square Root LASSO's optimal tuning to measurement noise comes at the price of increased sensitivity of the solution map to the tuning parameter itself.
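For readers who have not met the two programs before, here is a minimal NumPy sketch of the objectives being compared, in one standard convention (the data A, b and the parameter lam are placeholders; this is only meant to show where the two problems differ, not to reproduce the paper's setup).

    import numpy as np

    def lasso_objective(x, A, b, lam):
        # Unconstrained LASSO: squared data-fidelity term plus an l1 penalty.
        return 0.5 * np.linalg.norm(A @ x - b)**2 + lam * np.linalg.norm(x, 1)

    def sr_lasso_objective(x, A, b, lam):
        # Square Root LASSO: the residual enters through its (non-squared) norm.
        return np.linalg.norm(A @ x - b) + lam * np.linalg.norm(x, 1)

Not squaring the residual is what makes the optimal choice of the tuning parameter independent of the noise level; our analysis suggests that this robustness comes at the price of a solution map that is more sensitive to the parameter itself.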

The "tuning trade off": LASSO vs. Square Root LASSO. The figure illustrates the local Lipschitz behaviour of the (unconstrained) LASSO (in blue) and of the Square Root LASSO (in red). Dashed lines represent the solution map's variation in Euclidean norm as a function of the tuning parameter. Our Lipschitz bounds are plotted using solid lines. For more details, see Figure 2 in the paper. Numerics by Aaron Berk.

The greedy side of the LASSO

March 8, 2023

Sina M.-Taheri and I have just submitted a new paper, available on the arXiv at 

https://arxiv.org/abs/2303.00844 

In it, we propose a class of greedy algorithms for weighted sparse recovery by considering new loss function-based generalizations of Orthogonal Matching Pursuit (OMP). We show that greedy selection rules associated with popular loss functions such as those of the LASSO (Least Absolute Shrinkage and Selection Operator), Square Root LASSO (SR-LASSO) and Least Absolute Deviations LASSO (LAD-LASSO) admit explicitly computable and simple formulas. Moreover, we numerically demonstrate the effectiveness of the proposed algorithms and empirically show that they inherit desirable characteristics from the corresponding loss functions, such as SR-LASSO's noise-blind optimal parameter tuning and LAD-LASSO's fault tolerance. In doing so, our study sheds new light on the connection between greedy sparse recovery and convex relaxation.
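As background, here is a minimal NumPy sketch of classical Orthogonal Matching Pursuit, the algorithm that our generalizations start from. This is not the paper's method: the new greedy selection rules derived from the LASSO, SR-LASSO and LAD-LASSO losses replace the correlation-based selection step below.

    import numpy as np

    def omp(A, b, sparsity):
        # Classical OMP: greedily pick the column most correlated with the
        # current residual, then re-fit on the selected support by least squares.
        m, n = A.shape
        support, residual = [], b.copy()
        x = np.zeros(n)
        for _ in range(sparsity):
            j = int(np.argmax(np.abs(A.T @ residual)))   # greedy selection rule
            if j not in support:
                support.append(j)
            x_sub, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
            x = np.zeros(n)
            x[support] = x_sub
            residual = b - A @ x
        return x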

Robustness of parameter tuning to unknown noise for SR-LASSO-based OMP. The solution accuracy obtained via SR-LASSO-based OMP is plotted as a function of the tuning parameter λ for different levels of measurement noise. As in the SR-LASSO case, the optimal choice of λ is independent of the noise level. For more details, see Figure 1 of the paper. Numerics by Sina M.-Taheri.

Near-optimal learning of Banach-valued, high-dimensional functions via deep neural networks

November 30, 2022

A new paper in collaboration with B. Adcock, N. Dexter, and S. Moraga is on the arXiv!

https://arxiv.org/abs/2211.12633 

The past decade has seen increasing interest in applying Deep Learning (DL) to computational science and engineering. However, DL is not yet well-understood from the standpoint of numerical analysis. Little is known about the efficiency and reliability of DL from the perspectives of stability, robustness, accuracy, and sample complexity. 

Motivated by DL-based approaches for parametric PDEs, our paper provides arguments showing that Deep Neural Network (DNN) approximation of infinite-dimensional smooth functions taking values in an infinite-dimensional Banach space (such as typical PDE solution spaces) can overcome the curse of dimensionality. We establish practical existence theorems that describe classes of DNNs with dimension-independent architecture size and training procedures that achieve near-optimal algebraic rates of convergence. These results involve key extensions of compressed sensing for Banach-valued recovery and polynomial emulation with DNNs. When approximating solutions of parametric PDEs, our results account for all sources of error, i.e., sampling, optimization, approximation and physical discretization, and allow for training high-fidelity DNN approximations from coarse-grained sample data. 
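To make the learning task concrete, here is a small PyTorch sketch of its simplest form: training a fully connected network on pointwise samples of a smooth function of many variables. The architecture, the optimizer and the scalar-valued stand-in for the Banach-valued target are generic placeholders, not the specific classes of DNNs or training procedures covered by our theorems.

    import torch

    # Pointwise samples of a (placeholder) smooth, high-dimensional function;
    # in the paper the target is Banach-valued, e.g. a parametric PDE solution map.
    d, m = 10, 500
    torch.manual_seed(0)
    Y = 2 * torch.rand(m, d) - 1                                # sample points in [-1,1]^d
    f = lambda y: torch.exp(-(y**2).sum(dim=1, keepdim=True))   # scalar stand-in for f : U -> V
    values = f(Y)

    # A generic fully connected ReLU network (not the specific architecture of the theorems)
    model = torch.nn.Sequential(
        torch.nn.Linear(d, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, 1),
    )

    # Standard least-squares training on the samples
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(2000):
        opt.zero_grad()
        loss = torch.mean((model(Y) - values)**2)
        loss.backward()
        opt.step()
    print("final training loss:", loss.item())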

Error bound proved in our paper. The learning error associated with the approximation of a Banach-valued function f : U -> V from m pointwise samples is measured with respect to a suitable Lebesgue-Bochner norm. Our bound includes approximation, discretization, sampling, and optimization errors. The parameter θ is equal to 0 when V is a Hilbert space and 1/4 for general Banach spaces.

One world seminar on the mathematics of machine learning

November 18, 2022

On November 16, I had the pleasure of giving a talk at the One World Seminar Series on the Mathematics of Machine Learning, entitled The mathematical foundations of deep learning: from rating impossibility to practical existence theorems. Its recording has just been published on YouTube. Check it out!

Inspired by Smale's 18th problem, I presented my recent work on the mathematical foundations of deep learning by illustrating two case studies. 

First, motivated by applications in cognitive science, I presented rating impossibility theorems. These theorems identify frameworks where neural networks are provably unable to generalize outside the training set when performing the seemingly simple task of learning identity effects, i.e. classifying pairs of objects as identical or not. (See this paper).

Second, motivated by applications in scientific computing, I illustrated practical existence theorems. These theorems combine universal approximation results for deep neural networks with compressed sensing and high-dimensional polynomial approximation theory. As a result, they yield sufficient conditions on the network architecture, the training strategy, and the number of samples that guarantee accurate approximation of smooth functions of many variables. (See this paper and the sparse-hd-book).

I would like to thank Boumediene Hamzi for the kind invitation to speak.

The mathematics of machine learning at the CMS Winter Meeting 2022

November 10, 2022

I am proud to announce an exciting session on the mathematics of machine learning that I co-organized with Ben Adcock (SFU), Giang Tran (Waterloo) and Hamid Usefi (Memorial). This event is part of the 2022 Winter Meeting of the Canadian Mathematical Society (CMS), which will take place in Toronto on December 3-5, 2022.

Our session will mark the third in a series of sessions at CMS meetings on this theme. Its aim is to bring together a diverse group of leading experts in the mathematics of machine learning, and it will feature presentations from 18 researchers based in Canada and the US. It will be a forum for discussing and exploring emerging ideas in this fast-growing and exciting field. Check it out!

https://www2.cms.math.ca/Events/winter22/sessions_scientific#mom 

CMS Winter Meeting 2022, Toronto, Dec 3-5 2022. See http://winter22.cms.math.ca 

Is Monte Carlo a bad sampling strategy?

September 1, 2022

A new preprint in collaboration with Ben Adcock (Simon Fraser University) is out!

https://arxiv.org/abs/2208.09045 

Polynomial approximation of smooth functions of many variables from pointwise samples (the subject of our recently published book) is a widely-used tool in scientific computing, not least in parametric modelling and uncertainty quantification. In this context, Monte Carlo sampling is an established strategy able to alleviate the so-called curse of dimensionality. Yet, it is also known to be theoretically suboptimal. This gap has led in recent years to a focused research effort on the development of improved sampling strategies, some of which are provably near-optimal from the approximation-theoretic perspective. 

Many of these approaches offer significant improvements in low dimensions. However, in higher dimensions, it has been well-documented that the performance gap between theoretically-suboptimal Monte Carlo sampling and improved and/or near-optimal sampling strategies dramatically lessens. The purpose of our work is to resolve this seeming paradox. In this paper we demonstrate (both numerically and theoretically) that, in reality, Monte Carlo is an eminently suitable sampling strategy in sufficiently high dimensions. 
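To give a flavour of the kind of experiment behind the figure below, here is a small NumPy sketch of (non-adaptive) least-squares polynomial approximation from Monte Carlo samples: draw uniform random points, assemble a design matrix of tensorized Legendre polynomials over a total-degree index set, and solve a least-squares problem. The target function, dimension and degree are placeholders, and the paper's adaptive variants and near-optimal sampling strategies are not reproduced here.

    import numpy as np
    from itertools import product

    d, degree, m = 4, 3, 500                      # dimension, total degree, number of samples
    f = lambda y: np.exp(-np.sum(y**2, axis=-1))  # placeholder smooth target function

    # Monte Carlo samples, uniform on [-1,1]^d
    rng = np.random.default_rng(0)
    Y = rng.uniform(-1, 1, size=(m, d))

    # Multi-indices of total degree <= degree
    idx = [nu for nu in product(range(degree + 1), repeat=d) if sum(nu) <= degree]

    # Design matrix of tensorized Legendre polynomials evaluated at the samples
    leg = np.polynomial.legendre
    A = np.column_stack([
        np.prod([leg.legval(Y[:, k], np.eye(degree + 1)[nu_k]) for k, nu_k in enumerate(nu)], axis=0)
        for nu in idx
    ])
    coeffs, *_ = np.linalg.lstsq(A, f(Y), rcond=None)
    print("basis size:", len(idx), " least-squares residual:", np.linalg.norm(A @ coeffs - f(Y)))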

In low dimensions, Monte Carlo sampling is a poor sampling strategy. However, it becomes far better as the dimension increases. This figure plots the approximation error versus the number of samples for (adaptive) least-squares polynomial approximation using Monte Carlo samples (ALS-MC) and a near-optimal sampling strategy (ALS-Opt) in dimension 2 (left) and 4 (right). The function considered is a quantity of interest of a parametric differential equation with lognormal diffusion coefficient. For more details, see Figure 1 of our paper (with experiments in dimension up to 32).

New preprint: a coherence parameter characterizing generative compressed sensing with Fourier measurements

July 28, 2022

The framework of generative compressed sensing proposed by Bora et al. in 2017 has had a significant impact and has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. In our new preprint

https://arxiv.org/abs/2207.09340 

Aaron Berk (McGill), Babhru Joshi, Yaniv Plan, Matthew Scott, Özgür Yilmaz (all based at UBC), and I move beyond the subgaussian assumption to measurement matrices obtained by sampling rows of a unitary matrix uniformly at random (e.g., subsampled Fourier matrices). The "keystone" of our theoretical analysis is a new parameter, the coherence, which measures the interplay between the range of the generative network and the measurement matrix. In addition, we propose a regularization strategy for training generative neural networks to have favourable coherence with the measurement operator and provide compelling numerical simulations that support this regularized training strategy.
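To fix ideas, here is a minimal NumPy sketch of the measurement model: a randomly subsampled unitary Fourier matrix applied to signals in the range of a generative network. The untrained one-hidden-layer "generator" is a placeholder for a trained network, the recovery step of the Bora et al. framework is only indicated in a comment, and neither the coherence parameter nor the regularized training strategy is computed here.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, m = 256, 20, 64          # ambient dimension, latent dimension, number of measurements

    # Untrained one-hidden-layer ReLU "generator" -- a stand-in for a trained network
    W1, W2 = rng.normal(size=(128, k)), rng.normal(size=(n, 128))
    G = lambda z: W2 @ np.maximum(W1 @ z, 0.0) / np.sqrt(n)

    # Subsampled Fourier measurements: m rows of the unitary DFT chosen uniformly at random
    F = np.fft.fft(np.eye(n), norm="ortho")
    rows = rng.choice(n, size=m, replace=False)
    A = np.sqrt(n / m) * F[rows, :]

    # Measurements of a signal lying in the range of the generator
    z0 = rng.normal(size=k)
    y = A @ G(z0)

    # Recovery (not shown): minimize ||A @ G(z) - y||_2 over the latent variable z,
    # e.g. with a gradient-based method, as in the Bora et al. framework.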

The coherence effectively characterizes generative compressed sensing recovery. Recovery comparison of MNIST images for various measurement sizes (denoted by column heading) for a low-coherence network (α = 0.82), a high-coherence network (α = 0.96) and a network with final sigmoid activation (Sig). The leftmost column, signal, corresponds to the target image. For further details, see Figure 2 of our preprint. Numerics by Babhru Joshi and Matthew Scott.

Compressive Fourier collocation for high-dimensional PDEs

June 7, 2022

Another exciting preprint prepared in collaboration with Weiqi Wang (Concordia) is on the arXiv!

https://arxiv.org/abs/2206.01255 

We propose a new method for the numerical solution of high-dimensional PDEs with periodic boundary conditions called compressive Fourier collocation. Combining techniques from sparse high-dimensional approximation and Monte Carlo sampling, our method requires a number of collocation points that scales mildly (i.e., logarithmically) with respect to the dimension of the PDE domain. Focusing on the case of high-dimensional diffusion equations, we provide recovery guarantees based on the recently proposed framework of sparse recovery in subsampled bounded Riesz systems and carry out numerical experiments up to dimension 20.
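Here is a toy NumPy sketch, in dimension two, of how the collocation system can be assembled for a diffusion equation with periodic boundary conditions: draw Monte Carlo collocation points, fix a truncated Fourier dictionary, and apply the diffusion operator analytically to each Fourier mode. The diffusion coefficient is a placeholder, the sparse recovery step is only indicated in a comment, and the weights and precise formulation used in the paper are not reproduced. Note that the system is deliberately underdetermined (more Fourier modes than collocation points), which is where sparse recovery enters.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    d, K, m = 2, 6, 100        # dimension, max frequency per coordinate, number of collocation points

    # Placeholder periodic diffusion coefficient a(x) = 1 + 0.5*sin(2*pi*x_1) on [0,1]^d, with gradient
    a     = lambda X: 1.0 + 0.5 * np.sin(2 * np.pi * X[:, 0])
    grada = lambda X: np.column_stack([np.pi * np.cos(2 * np.pi * X[:, 0]), np.zeros(len(X))])

    # Truncated Fourier dictionary: multi-indices k with |k_i| <= K, i.e. (2K+1)^d columns
    freqs = np.array(list(product(range(-K, K + 1), repeat=d)))

    # Monte Carlo collocation points, drawn uniformly at random from [0,1]^d
    X = rng.uniform(0.0, 1.0, size=(m, d))

    # Collocation matrix: the diffusion operator -div(a grad .) applied to each Fourier mode
    # exp(2*pi*i*k.x) and evaluated at the collocation points,
    #   L e_k(x) = (4*pi^2*|k|^2 * a(x) - 2*pi*i * grad a(x).k) * e_k(x).
    phases = np.exp(2j * np.pi * (X @ freqs.T))
    B = (4 * np.pi**2 * np.sum(freqs**2, axis=1) * a(X)[:, None]
         - 2j * np.pi * (grada(X) @ freqs.T)) * phases / np.sqrt(m)

    # Given samples f(x_j) of the right-hand side, the Fourier coefficients of the solution are
    # then recovered by a sparse regularization step, e.g. a (weighted) l1-minimization or
    # LASSO-type solve involving B, which is the part covered by the paper's theory.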

Sparse and compressible PDE solutions. We use these two synthetic exact solutions (respectively sparse (left) and compressible (right) with respect to the Fourier basis) to validate the compressive Fourier collocation method. Numerics by Weiqi Wang.

LASSO reloaded

May 17, 2022

I am quite excited to announce a new arXiv preprint written in collaboration with Aaron Berk (McGill) and Tim Hoheisel (McGill):

https://arxiv.org/abs/2205.06872 

We provide a variational analysis of the unconstrained formulation of the LASSO problem, ubiquitous in statistical learning, signal processing, and inverse problems. In particular, we establish smoothness results for the optimal value as well as Lipschitz properties of the optimal solution as functions of the measurement vector and the tuning parameter. Moreover, we show how to apply the proposed variational analysis in the context of compressed sensing with subgaussian measurements. Our theoretical findings are validated by numerical experiments.
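A quick way to get a feel for the quantity we bound is to track how the LASSO solution moves when the tuning parameter is perturbed. Below is a small sketch using scikit-learn's Lasso on synthetic Gaussian data; it illustrates the Lipschitz behaviour under study and is not the paper's code (note that scikit-learn rescales the data-fidelity term by the number of samples).

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    m, n, s = 50, 200, 5
    A = rng.normal(size=(m, n)) / np.sqrt(m)          # (sub)gaussian measurement matrix
    x0 = np.zeros(n); x0[:s] = rng.normal(size=s)     # s-sparse ground truth
    b = A @ x0 + 0.01 * rng.normal(size=m)            # noisy measurements

    def lasso_solution(lam):
        # scikit-learn minimizes (1/(2m))*||b - A x||_2^2 + lam*||x||_1
        return Lasso(alpha=lam, fit_intercept=False, max_iter=50_000).fit(A, b).coef_

    # Distance between solutions as the tuning parameter is perturbed
    for lam in np.logspace(-3, -1, 10):
        dist = np.linalg.norm(lasso_solution(1.1 * lam) - lasso_solution(lam))
        print(f"lambda = {lam:.4f}   ||x(1.1*lambda) - x(lambda)||_2 = {dist:.4f}")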

Lipschitzness of the LASSO solution map with respect to the tuning parameter. The plot shows the distance between solutions of the LASSO problem as a function of the tuning parameter (red curve), the upper bound given by our theory (blue curve), and the ratio between the two curves (purple curve and left y-axis). For more details, see Figure 1 of our paper. Numerics by Aaron Berk.

Interview for Concordia News: How to make AI ‘more intelligent’

April 5, 2022

I had the pleasure of being interviewed by Elizabeth Faure for Concordia News. In the interview, I talk about the importance of developing new theoretical guarantees for deep learning in order to deploy safer and more reliable AI technologies. In particular, I present my recent work in collaboration with Ben Adcock, Nick Dexter, and Sebastian Moraga on practical existence theorems for deep neural networks, to appear in the Proceedings of Machine Learning Research.

You can read the interview at this link:

https://www.concordia.ca/news/stories/2022/04/01/concordia-researcher-explores-how-to-make-ai-more-intelligent.html 

I would also like to thank Taylor Tower for her assistance through the process.

My interview on Concordia's home page (concordia.ca)

On efficient algorithms for computing near-best polynomial approximations to high-dimensional, Hilbert-valued functions from limited samples

March 31, 2022

A new preprint in collaboration with Ben Adcock, Nick Dexter, and Sebastian Moraga is out! 

Our paper addresses the following question: are there robust, efficient algorithms for computing approximations to finite- or infinite-dimensional, holomorphic and Hilbert-valued functions from limited samples that achieve best s-term rates?

We answer this in the affirmative by introducing algorithms and theoretical guarantees that assert exponential or algebraic rates of convergence, along with robustness to sampling, algorithmic, and physical discretization errors. Our work involves several significant developments of existing techniques, including a novel restarted primal-dual iteration for solving weighted l1-minimization problems in Hilbert spaces. Our theory is supplemented by numerical experiments demonstrating the practical efficacy of these algorithms. 
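For context, the primal-dual iteration referred to here is, in its most basic (unrestarted, unweighted, real-valued) form, the Chambolle-Pock iteration applied to an l1-minimization problem; a minimal NumPy sketch is below. The restart scheme, the weighted l1 formulation and the Hilbert-valued setting, which are the actual contributions of the paper, are not reproduced.

    import numpy as np

    def primal_dual_l1(A, b, iters=2000):
        # Chambolle-Pock primal-dual iteration for:  min ||x||_1  subject to  A x = b.
        # (The paper's restarted variant wraps an iteration of this type inside a restart loop.)
        m, n = A.shape
        L = np.linalg.norm(A, 2)                   # operator norm of A
        tau = sigma = 0.9 / L                      # step sizes with tau*sigma*L^2 < 1
        x = x_bar = np.zeros(n)
        y = np.zeros(m)
        for _ in range(iters):
            y = y + sigma * (A @ x_bar - b)        # dual step
            x_new = x - tau * (A.T @ y)            # primal gradient step ...
            x_new = np.sign(x_new) * np.maximum(np.abs(x_new) - tau, 0.0)   # ... plus soft-thresholding
            x_bar = 2 * x_new - x                  # extrapolation
            x = x_new
        return x

    # Tiny compressed-sensing example: recover a sparse vector from Gaussian measurements
    rng = np.random.default_rng(0)
    m, n, s = 40, 100, 5
    A = rng.normal(size=(m, n)) / np.sqrt(m)
    x0 = np.zeros(n); x0[rng.choice(n, s, replace=False)] = rng.normal(size=s)
    print("relative error:", np.linalg.norm(primal_dual_l1(A, A @ x0) - x0) / np.linalg.norm(x0))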

Check out our preprint on the arXiv! https://arxiv.org/abs/2203.13908  

The benefits of the restarted primal-dual iteration in high-dimensional function approximation. We compare the convergence of the primal-dual iteration ("PD", dashed line) and various versions of the restarted primal-dual iteration ("PDR", solid colored lines). We also show the error curve predicted by the theory for the restarted primal-dual iteration (dotted line). For further details, see Figure 1 of our paper. Numerics by Nick Dexter and Sebastian Moraga.

Book officially published

March 5, 2022

I am proud to announce that the book "Sparse Polynomial Approximation of High-dimensional Functions" by Ben Adcock, Clayton Webster, and me has now been officially published by SIAM.

Enjoy! And if you have any thoughts on the book you'd like to share with me, do not hesitate to write me an email.

The book on SIAM News (top right corner).
