Research Interests

  • numerical analysis

  • data science

  • machine learning

  • uncertainty quantification

  • compressed sensing

  • high-dimensional approximation theory

  • optimization

  • inverse problems

  • applied & computational harmonic analysis

  • computational epidemiology and genomics

Group testing for COVID-19

keywords: group testing problems, machine learning, optimization, compressed sensing

In recent work with Hooman Zabeti (Simon Fraser University), Ivan Lau (Simon Fraser University), Leonhardt Unruh (Imperial College London), Ben Adcock (Simon Fraser University), and Leonid Chindelevitch (Imperial College London), I have been developing a Python software package for testing sparse-recovery strategies for SARS-CoV-2 detection in large populations via group testing. The group testing problem seeks to identify the infected individuals in a large population by pooling samples and testing the resulting groups. Using sparse-recovery techniques from compressed sensing and integer linear programming, we can identify the infected individuals with high accuracy at moderate prevalence and noise levels.
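As a minimal sketch of the underlying idea (not the package's actual LP/ILP-based decoders), the simple COMP rule declares healthy anyone who appears in a negative pool; in the noiseless model everyone left over is a superset of the infected set. The pool sizes and assignments below are illustrative:

```python
import random

def comp_decode(pools, results):
    """COMP rule: anyone appearing in a negative pool is declared healthy;
    everyone left over is a presumed positive (a superset of the infected
    set in the noiseless model)."""
    n = len(pools[0])
    healthy = set()
    for pool, positive in zip(pools, results):
        if not positive:
            healthy.update(i for i, in_pool in enumerate(pool) if in_pool)
    return [i for i in range(n) if i not in healthy]

# Simulate 48 individuals pooled into 8 groups, each individual placed in
# 2 pools chosen at random (an illustrative design, not the paper's).
random.seed(0)
n, m, r = 48, 8, 2
pools = [[0] * n for _ in range(m)]
for i in range(n):
    for t in random.sample(range(m), r):
        pools[t][i] = 1

infected = {3, 17}
results = [any(pools[t][i] for i in infected) for t in range(m)]
recovered = comp_decode(pools, results)
print(sorted(recovered))
```

COMP only narrows the candidate set; sparse-recovery formulations such as those in the package then resolve the remaining ambiguity.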

Moreover, in a recent collaboration with M. Storch and M. Crone (Department of Infectious Disease, Imperial College London), we are investigating the development of robotic testing schemes that can further improve group testing efficiency. Applying these techniques in practical testing scenarios can dramatically reduce the total number of tests to be processed and speed up the identification of infections.

The paper is available here!

Figure: Pooling graphs specifying testing for 48 individuals in 8 groups with increasing group sizes (left to right) and correspondingly increasing average number of tests per individual.

Figure: Average balanced accuracy over 10 trials at recovering the infected status of individuals in a population of size 1000 at (left) 5% and (right) 10% prevalence.

Figure: Approximations of the solution of a parametric elliptic PDE (left) early in the training process (after 2 epochs of Adam), (middle) at the end of training (2045 epochs), and (right) the reference solution.

Figure: Approximation of the solution of a parametric diffusion PDE with mixed boundary conditions by a DNN with hyperbolic tangent activation function. The DNNs are trained using samples of the solution of the mixed formulation obtained with a combination of standard Lagrange P1 and Raviart-Thomas elements.

Figure: Comparison of (left) global approximation error and (right) average time to obtain an approximation between a best-in-class scheme based on joint-sparse recovery, polynomial approximation, and finite elements and DNN architectures with a variety of activation functions, depths, and widths.

Learning high-dimensional Hilbert-valued functions from limited data

keywords: machine learning, optimization, compressed sensing, approximation theory, numerical analysis, high dimensional parameterized PDEs and UQ, applied functional analysis

Accurate approximation of scalar-valued functions from sample points is a key task in computational science. Recently, machine learning with Deep Neural Networks (DNNs) has emerged as a promising tool for scientific computing, with impressive results achieved on problems where the dimension of the data or problem domain is large. In joint work with Ben Adcock (Simon Fraser University), Simone Brugiapaglia (Concordia University), and Sebastián Moraga (Simon Fraser University), we seek to broaden this perspective, focusing on approximating functions that are Hilbert-valued, i.e., taking values in a separable, but typically infinite-dimensional, Hilbert space. This setting arises in science and engineering problems, in particular those involving the solution of parametric Partial Differential Equations (PDEs).

Our contributions are twofold:

  1. First, we present a novel result on DNN training for holomorphic functions with so-called hidden anisotropy. This result introduces a DNN training procedure and a full theoretical analysis with explicit guarantees on error and sample complexity. The error bound is explicit in the three key errors occurring in the approximation procedure: the best approximation, measurement, and physical discretization errors. Our result shows that there exists a procedure (albeit non-standard) for learning Hilbert-valued functions via DNNs that performs as well as, but no better than, current best-in-class schemes. It gives a benchmark lower bound for how well DNNs can perform on such problems.

  2. Second, we examine whether better performance can be achieved in practice through different types of architectures and training. We provide preliminary numerical results illustrating the practical performance of DNNs on parametric PDEs, considering several parametric problems and modifying the DNN architecture to achieve results competitive with current best-in-class schemes.
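As a toy illustration of the Hilbert-valued setting (using a hypothetical parametric profile, not one of the paper's PDE examples), all components of a vector-valued, i.e. discretized Hilbert-valued, function can be fit at once with a single least-squares solve against a polynomial basis in the parameter:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy Hilbert-valued target: for each parameter y in [-1, 1] the "solution"
# is a vector of K nodal values (a hypothetical parametric profile, standing
# in for a finite element discretization of a PDE solution).
K = 50
x = np.linspace(0.0, 1.0, K)
def u(y):
    return np.exp(-y * x) * np.sin(np.pi * x)

# Sample the parameter and build a Legendre design matrix of degree < N.
m, N = 40, 10
ys = rng.uniform(-1, 1, m)
A = np.polynomial.legendre.legvander(ys, N - 1)   # shape (m, N)
F = np.array([u(y) for y in ys])                  # shape (m, K): vector-valued data

# A single least-squares solve fits all K coefficient curves simultaneously.
C, *_ = np.linalg.lstsq(A, F, rcond=None)         # shape (N, K)

# Evaluate the surrogate at a new parameter value and measure relative error.
y_test = 0.3
approx = np.polynomial.legendre.legvander(np.array([y_test]), N - 1) @ C
err = np.linalg.norm(approx.ravel() - u(y_test)) / np.linalg.norm(u(y_test))
print(err)
```

In the work above, the role of the least-squares fit is played by trained DNNs, and the vector of nodal values comes from a finite element discretization.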

The paper is available here!

TV minimization in compressive imaging: recovery guarantees and sampling strategies

keywords: inverse problems, compressed sensing, machine learning, optimization, numerical analysis, applied & computational harmonic analysis

In recent work with Ben Adcock (Simon Fraser University) and Qinghong (Jackie) Xu (Simon Fraser University), we derive uniform recovery guarantees asserting stable and robust recovery for arbitrary random sampling strategies in Fourier and binary imaging modalities, thereby establishing the first results of this kind in the binary case. These results improve on the current state-of-the-art by extending the analysis to d-dimensional images for arbitrary d ≥ 1, and by improving the measurement conditions for Fourier imaging in the case d = 2. Furthermore, using these estimates we derive a class of theoretically optimal sampling strategies and establish the near-optimality of a sampling pattern inspired by the hyperbolic cross. We also conduct detailed numerical experiments on a series of 2-dimensional test images, as well as on the task of reconstructing 3D MRI data from samples over a range of sub-sampling percentages.
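The flavor of variable-density Fourier sampling can be sketched as follows, using an illustrative inverse-square density in the sup-norm of the frequency (a stand-in for, not the precise form of, the near-optimal patterns derived in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# An n x n Fourier-space sampling mask whose per-frequency probability decays
# like 1 / max(|k1|, |k2|)^2 away from the origin.  This inverse-square decay
# is an illustrative choice, not the exact density from the paper.
n, target_fraction = 64, 0.15
k = np.arange(-n // 2, n // 2)
K1, K2 = np.meshgrid(k, k, indexing="ij")
density = 1.0 / np.maximum(np.maximum(np.abs(K1), np.abs(K2)), 1) ** 2

# Scale toward the target budget, capping probabilities at 1 (so the lowest
# frequencies are sampled with certainty).
density = np.minimum(density * target_fraction * n * n / density.sum(), 1.0)
mask = rng.random((n, n)) < density
print(mask.mean())
```

The resulting mask samples the low frequencies fully and the tails sparsely, which is the qualitative behavior shared by the theoretically optimal strategies.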

The preprint of the paper is available here!


Figure: Sample efficiency illustration of TV minimization and a multilevel sampling pattern at recovering a 3D MRI scan of a human knee over a range of sub-sampling percentages.


Figure: The results of 20 trials training ReLU DNNs with 5 hidden layers and 30 nodes per hidden layer on uniform random sample data from the indicator of a square subdomain centered at the origin of [-1,1]^2. (Left) the point-wise absolute difference and (right) the point-wise outputs of the trained DNNs.

Figure: Example of training a ReLU DNN with 20 hidden layers and 200 nodes per hidden layer on function data from f(x) = log(sin(100x)+2) + sin(10x). (Top-left) the true function plotted with the DNN approximation, (top-right) the DNN approximation, (bottom-left) the pointwise absolute error, and (bottom-right) the training loss (MSE) vs. epochs of Adam.

The gap between theory and practice in function approximation with deep neural networks

keywords: machine learning, optimization, compressed sensing, approximation theory, numerical analysis, applied & computational harmonic analysis

Deep learning (DL) is transforming whole industries as complicated decision-making processes are automated by neural networks trained on real-world data. Yet as these tools are increasingly applied to critical problems in medicine, science, and engineering, from the standpoint of numerical analysis many important questions about their stability, reliability, and approximation capabilities remain. Such questions include:

  1. How many data points are sufficient to train a neural network on simple approximation tasks?

  2. How robust are these trained architectures to noise in the measurements?

Recently published theoretical results on approximation with DNNs show that these architectures allow for the same convergence rates as best-in-class schemes, including h- and p-adaptive finite element approximations. In recent work with Ben Adcock (Simon Fraser University), we seek to quantify the approximation capabilities of DNNs both theoretically and practically. Our main contributions are:

  • We develop a computational framework for examining the practical capabilities of DNNs in scientific computing, based on the rigorous testing principles of numerical analysis. We provide clear practical guidance on training DNNs for function approximation problems.

  • We conduct perhaps the first comprehensive empirical study of the performance of training fully-connected feedforward ReLU DNNs on standard function classes considered in numerical analysis, namely, (piecewise) smooth functions on bounded domains. We compare performance over a range of dimensions, examining the capability of DL for mitigating the curse of dimensionality. We examine the effect of network architecture (depth and width) on both ease of training and approximation performance of the trained DNNs. We also make a clear empirical comparison between DL and current best-in-class approximation schemes for smooth function approximation. The latter is based on polynomial approximation via compressed sensing (CS), which (as we also show) achieves exponential rates of convergence for analytic functions in arbitrarily-many dimensions.

  • We present theoretical analysis that compares the performance of DL to that of CS for smooth function approximation. In particular, we establish a novel practical existence theorem, which asserts the existence of a DNN architecture and training procedure (based on minimizing a certain cost function) that attains the same exponential rate of convergence as CS for analytic function approximation with the same sample complexity.
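A stripped-down version of the kind of experiment in the empirical study trains a one-hidden-layer ReLU network on samples of a smooth 1D function by full-batch gradient descent (a toy stand-in for the deeper, Adam-trained networks used in the paper; the target function and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Samples of a smooth 1D target function on [-1, 1].
m, width, lr, epochs = 200, 50, 1e-2, 2000
x = rng.uniform(-1, 1, (m, 1))
y = np.exp(-x ** 2) * np.cos(np.pi * x)

# One-hidden-layer ReLU network, trained by full-batch gradient descent on MSE.
W1 = rng.standard_normal((1, width)); b1 = np.zeros(width)
W2 = rng.standard_normal((width, 1)) / np.sqrt(width); b2 = np.zeros(1)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)      # ReLU hidden layer
    return h, h @ W2 + b2

_, pred0 = forward(x)
loss0 = float(np.mean((pred0 - y) ** 2))

for _ in range(epochs):
    h, pred = forward(x)
    err = 2.0 * (pred - y) / m            # d(MSE)/d(pred)
    gW2 = h.T @ err; gb2 = err.sum(axis=0)
    dh = (err @ W2.T) * (h > 0)           # backpropagate through the ReLU
    gW1 = x.T @ dh; gb1 = dh.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(x)
loss = float(np.mean((pred - y) ** 2))
print(loss0, loss)
```

Measuring the test error of such trained networks across dimensions, architectures, and sample sizes, and comparing against the compressed sensing benchmark, is the essence of the framework.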

The paper was recently accepted to the SIAM Journal on Mathematics of Data Science (SIMODS) and an arXiv preprint is available here!

Code for the paper is available here!

Simultaneous compressed sensing for the solution of high-dimensional parametric partial differential equations (PDEs)

keywords: compressed sensing and joint-sparse approximations, optimization, approximation theory, numerical analysis, applied & computational harmonic analysis

Standard CS theory was developed to recover real or complex sparse vectors, and is therefore only capable of reconstructing functionals of solutions to parameterized PDEs. One can use the point-wise evaluation functional to reconstruct fully-discrete approximations to such solutions by sampling the physical domain, although this approach poses its own challenges: what error estimates can be shown for it, and how can one obtain accurate point-wise tail bounds to parameterize the solvers? A related problem, called joint-sparse (JS) recovery, considers the simultaneous reconstruction of a set of sparse vectors sharing a common support, and has been applied to problems in digital communication, compressive imaging with TV minimization, hyperspectral imaging, and sensor networks.

Inspired by the JS recovery problem and by finite element and spectral discretizations of parameterized PDEs, in collaboration with Hoang Tran (Computer Science and Mathematics Division, Oak Ridge National Laboratory) and Clayton Webster (University of Texas at Austin) we proposed the first comprehensive implementation and analysis of a framework for CS-based fully-discrete approximation of solutions to parameterized PDEs, named simultaneous compressed sensing (SCS). The SCS method combines advanced techniques from CS and approximation theory, such as random sampling from bounded orthonormal systems, with tools from functional analysis and optimization to prove convergence results for the algorithms in the infinite-dimensional setting. The key to this framework lies in a version of the JS recovery problem in which the set of vectors to recover is infinite. In further collaboration with Hoang Tran and Clayton Webster, we showed the strong convergence of a forward-backward splitting method derived for this setting and established the equivalence between solving the SCS problem and obtaining sparse polynomial approximations to parameterized PDEs, from which exponential rates of convergence follow. We also conducted numerical experiments approximating the solution of a parameterized elliptic PDE, achieving error comparable to that of stochastic collocation with dramatically fewer sample points.
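The finite-dimensional JS recovery problem can be sketched with iterative hard thresholding using a row-wise threshold, a simplified stand-in for the forward-backward splitting scheme analyzed in our work (all problem sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Multiple-measurement-vector setup: recover a row-sparse N x K matrix X
# (N basis functions, K spatial degrees of freedom) from Y = A X.
m, N, K, s = 60, 120, 30, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)   # Gaussian stand-in for a
                                               # bounded orthonormal system
X_true = np.zeros((N, K))
support = rng.choice(N, s, replace=False)
X_true[support] = rng.standard_normal((s, K))
Y = A @ X_true

def joint_sparse_iht(A, Y, s, iters=1000, step=1.0):
    """Iterative hard thresholding with a joint-sparse (row-wise) threshold:
    after each gradient step, keep only the s rows of largest l2 norm."""
    X = np.zeros((A.shape[1], Y.shape[1]))
    for _ in range(iters):
        X = X + step * A.T @ (Y - A @ X)           # gradient step on the data fit
        row_norms = np.linalg.norm(X, axis=1)
        keep = np.argsort(row_norms)[-s:]          # indices of the s largest rows
        mask = np.zeros(A.shape[1], dtype=bool)
        mask[keep] = True
        X[~mask] = 0.0                             # zero all other rows
    return X

X_hat = joint_sparse_iht(A, Y, s)
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
print(rel_err)
```

The row-wise threshold is what couples the K recovery problems: the shared support is estimated from the aggregate row energies rather than channel by channel.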

More details can be found in my PhD thesis. The journal paper on the SCS method was published in ESAIM: Mathematical Modelling and Numerical Analysis (ESAIM: M2AN) and is available here!

Figure: (Left) comparison of relative error statistics for the SCS method (SCS-TD) and compressed sensing point-wise functional recovery (PCS-TD) methods, both computed with a total degree basis of order p = 2 in d = 100 dimensions having cardinality N = 5151, (center) magnitudes of the coefficients in (top) energy norm and (bottom) pointwise at three physical locations xi sorted lexicographically, (right) same as center after sorting by largest in magnitude. Here c^{h,⋆} is obtained with the stochastic Galerkin method in the same total degree polynomial subspace.

Figure: Comparison of stochastic collocation with Clenshaw-Curtis points (SC-CC), Monte Carlo (MC), and simultaneous compressed sensing recovery in the total degree subspace (SCS-TD) in relative error in the Hilbert space norm of the (left) expectation and (right) standard deviation, w.r.t. stochastic degrees of freedom (# of samples) in d = 17 dimensions.

Figure: Errors obtained with compressed sensing-based polynomial approximations solving weighted and unweighted ℓ1 minimization problems given data from (left) a smooth composition of exponential and cosine functions in d=16 dimensions, (middle) a composition of cosine and rational functions in d=16 dimensions, and (right) a high-degree rational function in d=8 dimensions.

Figure: Illustration of the Legendre expansion coefficients of a smooth composition of exponential and cosine functions in d=8 dimensions (left) sorted lexicographically, (middle) sorted by decreasing magnitude, and (right) the approximation errors for the CS-based polynomial approximation approach in d=8 dimensions, and with a variety of choices of weights for the weighted ℓ1 minimization problem.

Polynomial approximation via compressed sensing of high-dimensional functions on lower sets

keywords: compressed sensing, approximation theory, numerical analysis, applied & computational harmonic analysis, optimization

Motivated by the fact that solutions to many problems in uncertainty quantification (UQ) are often smooth, and hence much work is wasted in computing negligible coefficients of their expansions, in work with Abdellah Chkifa (Department of Mathematics, Mohamed VI Polytechnic University), Hoang Tran (Computer Science and Mathematics Division, Oak Ridge National Laboratory), and Clayton Webster (University of Texas at Austin) we proposed the solution of a weighted ℓ1 minimization problem, with a specific choice of weights designed to enhance smooth function recovery. Furthermore, through a combination of:

  1. extension of chaining arguments for unitary matrices to general bounded orthonormal systems,

  2. introduction of the lower restricted isometry property (RIP) (a specific case of weighted RIP motivated by lower sets and smooth function approximation),

  3. approximation on the hyperbolic cross multi-index set,

we improved the measurement condition guaranteeing stable and robust recovery of sparse Chebyshev and Legendre approximations to such functions with high probability. These results represent the current state-of-the-art in approximating high-dimensional smooth functions.
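The weighted ℓ1 approach can be sketched with a proximal-gradient (ISTA-style) solver in which the soft threshold is scaled per coordinate by the weight; the target function, weights, and regularization parameter below are illustrative choices, not those analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sparse, low-degree Chebyshev coefficients to recover from m < N samples.
N, m = 60, 30
deg = np.arange(N)
w = np.sqrt(deg + 1.0)                          # degree-increasing weights (illustrative)
c_true = np.zeros(N)
c_true[[0, 2, 5, 9]] = [1.0, -0.5, 0.25, 0.1]

ys = rng.uniform(-1, 1, m)
A = np.cos(np.outer(np.arccos(ys), deg))        # A[i, k] = T_k(y_i)
b = A @ c_true

# Proximal gradient descent on 0.5*||A c - b||^2 + lam*||w . c||_1.
lam = 0.05
step = 1.0 / np.linalg.norm(A, 2) ** 2
c = np.zeros(N)
for _ in range(5000):
    g = c - step * A.T @ (A @ c - b)            # gradient step on the data-fit term
    c = np.sign(g) * np.maximum(np.abs(g) - step * lam * w, 0.0)  # weighted soft threshold

rel_err = np.linalg.norm(c - c_true) / np.linalg.norm(c_true)
print(rel_err)
```

The growing weights penalize high-degree coefficients more heavily, biasing the recovered support toward the lower sets on which smooth functions concentrate.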

The paper was published in AMS: Mathematics of Computation and is available here!

Explicit cost bounds of stochastic Galerkin approximations for parameterized PDEs with random coefficients

keywords: numerical analysis, approximation theory, high-dimensional parameterized PDE problems & uncertainty quantification, applied functional analysis

Modern approaches to UQ problems include a wide variety of methods to approximate solutions to high-dimensional parameterized PDEs. Of these, methods based on polynomials are the most widely used, due to their ability to exploit the solution’s regularity with respect to the parameterization to obtain fast convergence rates. However, a key challenge for these methods arises from the exponential relationship between the computational effort required to find solutions and the dimension of the problem, i.e., the so-called curse of dimensionality. My first work in this area, with Clayton Webster (University of Texas at Austin) and Guannan Zhang (Computer Science and Mathematics Division, Oak Ridge National Laboratory), studied the relationship between accuracy, dimension, and computational complexity for stochastic Galerkin discretizations of these problems based on Legendre polynomials, establishing explicit cost bounds for such approximations under arbitrary parameterizations of elliptic PDEs.
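The source of the cost growth can be made concrete: the number of coupled finite element blocks in a stochastic Galerkin system equals the cardinality of the polynomial index set, which for the total degree space grows combinatorially in the parameter dimension d:

```python
from math import comb

# Cardinality of the total degree polynomial subspace of order p in d
# parameter dimensions: |Lambda| = C(d + p, p).  The stochastic Galerkin
# system couples this many finite element blocks.
def total_degree_cardinality(d, p):
    return comb(d + p, p)

for d in (2, 10, 100):
    print(d, total_degree_cardinality(d, 3))
```

Even at the modest fixed order p = 3, the block count grows from tens to hundreds of thousands as d increases, which is exactly the accuracy-versus-cost trade-off the explicit bounds quantify.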

The paper was published in Computers & Mathematics with Applications and is available here!

Figure: Visualization of (left) a triangulation of the physical domain D with circular and square subdomain inclusions (red nodes highlight the boundary of an inclusion or of the domain, blue nodes highlight nodes on the interior of an inclusion) and (right) the expected value of the solution of an isotropic thermal diffusion problem, a common example in engineering and the physical sciences, with a stochastic conductivity coefficient that varies on the circular subdomains.

Figure: Visualization of the sparsity of the Kronecker product stochastic Galerkin system for a discretized parametric elliptic PDE. Each pixel represents a block finite element system when using a total degree projection of the solution of fixed degree p = 3, and increasing the overall order of the projection of the data (left to right) in modeling a nonlinear coefficient of the physical and parametric variables.

The cover photo was taken at Jetée d'Osches in Lausanne, Switzerland, the sculpture is Ouverture au monde.