Here is a brief summary of the topics I have been working on.
Low-rank approximation of matrices
While the truncation of the Singular Value Decomposition (SVD) gives the best possible rank-k approximation in the spectral norm and in the Frobenius norm, in some applications it is useful to find a low-rank approximation of a matrix A using some of its rows and columns. For example, cross approximation uses k rows and k columns (chosen carefully) to form a low-rank approximation of A (as in the picture).
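For concreteness, here is a minimal numpy sketch of the cross approximation formula A ≈ A[:, J] * pinv(A[I, J]) * A[I, :], where I and J are the chosen row and column indices. The random test matrix and the random index choice below are placeholders; choosing good rows and columns is exactly what the algorithms discussed next are about.

```python
import numpy as np

def cross_approximation(A, row_idx, col_idx):
    """Cross approximation built from the selected columns C = A[:, col_idx],
    rows R = A[row_idx, :], and their intersection U = A[row_idx, col_idx]:
    A is approximated by C @ pinv(U) @ R."""
    C = A[:, col_idx]
    R = A[row_idx, :]
    U = A[np.ix_(row_idx, col_idx)]
    return C @ np.linalg.pinv(U) @ R

# Toy test: an exactly rank-k matrix with k rows/columns chosen at random
# (for such a matrix, a generic choice already reproduces A exactly).
rng = np.random.default_rng(0)
n, k = 200, 5
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
rows = rng.choice(n, size=k, replace=False)
cols = rng.choice(n, size=k, replace=False)
print(np.linalg.norm(A - cross_approximation(A, rows, cols)))
```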
Here we provide error bounds for the adaptive cross approximation algorithm, a greedy method that aims at choosing "nice" rows and columns for such a cross approximation.
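To convey the greedy idea, here is a minimal sketch of adaptive cross approximation with full pivoting on the explicit residual; the practical variant uses partial pivoting and never forms the full residual, so this is only an illustration.

```python
import numpy as np

def aca_full_pivoting(A, k):
    """Greedy cross approximation: repeatedly pick the entry of the residual
    with largest absolute value and subtract the corresponding rank-one cross.
    Returns the selected row and column indices."""
    E = np.array(A, dtype=float)          # residual matrix
    rows, cols = [], []
    for _ in range(k):
        i, j = np.unravel_index(np.argmax(np.abs(E)), E.shape)
        if E[i, j] == 0:
            break                          # exact rank reached
        E -= np.outer(E[:, j], E[i, :]) / E[i, j]   # rank-one update
        rows.append(i)
        cols.append(j)
    return rows, cols
```

The selected indices can then be plugged into the cross approximation formula sketched above.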
A related problem is column subset selection, which aims at selecting k columns of a matrix A that approximate the range of A well. In this paper, we improve a (deterministic) algorithm that achieves a quasi-optimal selection of columns, and we extend this technique to cross approximation.
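As a rough stand-in (not the quasi-optimal algorithm from the paper), column subset selection can be illustrated with QR with column pivoting, which SciPy provides; strong rank-revealing QR refines this pivoting strategy.

```python
import numpy as np
from scipy.linalg import qr

def select_columns(A, k):
    """Select k column indices of A via QR with column pivoting."""
    _, _, piv = qr(A, mode='economic', pivoting=True)
    return piv[:k]

def approximate_from_columns(A, cols):
    """Orthogonal projection of A onto the span of the selected columns."""
    Q, _ = np.linalg.qr(A[:, cols])
    return Q @ (Q.T @ A)
```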
In this paper, we analyze an algorithm that selects columns and rows using random sampling and the strong rank-revealing QR factorization, and we show that with suitable sparsity and/or incoherence assumptions on the factors of a low-rank decomposition of the matrix A, it is sufficient to sample a sub-linear number of columns and rows.
In this paper, we propose and analyze another randomized algorithm for column subset selection, which applies an adaptive leverage score sampling strategy. In expectation, the low-rank approximation error in the Frobenius norm matches the best possible existence result!
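A minimal sketch of the basic (non-adaptive) leverage score sampling step is shown below; the rank-k leverage score of column j is the squared norm of the j-th row of the matrix of top-k right singular vectors, and columns are sampled with probabilities proportional to these scores. The adaptive strategy analyzed in the paper is more refined, so this is only meant to illustrate the sampling step.

```python
import numpy as np

def leverage_score_sampling(A, k, c, rng=None):
    """Sample c column indices of A with probabilities proportional to their
    rank-k leverage scores (squared row norms of the top-k right singular
    vectors)."""
    if rng is None:
        rng = np.random.default_rng()
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0)        # rank-k leverage scores
    return rng.choice(A.shape[1], size=c, replace=True, p=scores / scores.sum())
```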
Functions of matrices
When a matrix A has some low-rank structure, for example when it is banded or, more generally, when its off-diagonal blocks have low rank, the matrix function f(A) often approximately preserves this structure. We propose divide-and-conquer methods to compute matrix functions of such matrices here. These algorithms are based on the fact that if a matrix A is changed into A + R for a low-rank matrix R, then the "update" f(A+R) - f(A) is often numerically low-rank and can be approximated efficiently with a Krylov subspace projection method (see here and here).
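The low-rank update phenomenon is easy to observe numerically; the sketch below uses the matrix exponential, a tridiagonal A, and a random rank-one R (all arbitrary choices for illustration) and prints the rapidly decaying singular values of the update.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 200
# A banded (tridiagonal) symmetric matrix and a random rank-one perturbation.
A = np.diag(rng.standard_normal(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
u = rng.standard_normal((n, 1))
R = u @ u.T / n

update = expm(A + R) - expm(A)                  # the "update" f(A+R) - f(A)
sv = np.linalg.svd(update, compute_uv=False)
print(sv[:10] / sv[0])                          # rapid decay: numerically low rank
```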
The computation of f(A)b for a large matrix A of size n x n and a vector b of length n can be addressed by Krylov subspace methods (without computing the large matrix f(A)!). One forms a basis V of a Krylov subspace of dimension m and then evaluates the function of the m x m matrix corresponding to the "compression" of A onto the Krylov subspace. Constructing an orthonormal basis of the Krylov subspace with the Arnoldi method, however, becomes expensive when m is relatively large. In a recent work with Daniel Kressner and Yuji Nakatsukasa (which I started during my visit to Yuji Nakatsukasa at the University of Oxford), we explore how other (faster) algorithms for building bases of the same Krylov subspace can be used while still obtaining an accurate approximation of f(A)b. We leverage randomized algorithms in two ways: for computing the compression of A onto the Krylov subspace (this requires the solution of an overdetermined least-squares problem) and for computing a basis of the Krylov subspace. Speeding up the standard Arnoldi method is possible!
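For reference, here is a minimal sketch of the standard Arnoldi approximation f(A)b ≈ ||b|| V_m f(H_m) e_1 (with f = exp for concreteness); the work described above replaces the expensive orthogonalization step with faster bases and randomized techniques, which are not shown here.

```python
import numpy as np
from scipy.linalg import expm

def arnoldi_fAb(A, b, m, f=expm):
    """Approximate f(A) @ b from an m-dimensional Krylov subspace:
    f(A) b ~= ||b|| * V_m @ f(H_m) @ e_1, where V_m is an orthonormal Arnoldi
    basis and H_m is the compression of A onto the Krylov subspace."""
    n = b.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:             # lucky breakdown: invariant subspace
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    return beta * V[:, :m] @ f(H[:m, :m])[:, 0]
```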
Free convolution: What is the eigenvalue distribution of the sum of random matrices?
Free probability theory has a deep connection with random matrix theory, as it explains the behavior of the spectra of sums and products of random matrices when the matrix dimensions grow to infinity. In a recent work with Lexing Ying, we develop a numerical method for approximately computing free convolutions of measures whose result has a density with "square-root behavior" at the boundary of its support. Our algorithm is based on using the Cauchy integral formula to express the Cauchy, R, S, and T transforms, and on discretizing it with the trapezoidal quadrature rule. Analyticity of these transforms implies that the quadrature rule converges exponentially fast!
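The quadrature fact can be checked in a few lines: on a circular contour the trapezoidal rule for the Cauchy integral formula reduces to averaging the function over equispaced nodes, and the error decays exponentially in the number of nodes. The function and contour below are arbitrary choices for illustration, not the setting of the paper.

```python
import numpy as np

# Recover f(z0) = (1/(2*pi*i)) * \oint f(z)/(z - z0) dz on the circle |z - z0| = r;
# with the parametrization z = z0 + r*exp(i*theta), the trapezoidal rule becomes
# a plain average of f over equispaced points on the circle.
f = np.exp                      # analytic in a neighborhood of the disk
z0, r = 0.3, 1.0
for N in [4, 8, 16, 32]:
    nodes = z0 + r * np.exp(2j * np.pi * np.arange(N) / N)
    print(N, abs(np.mean(f(nodes)) - f(z0)))   # error decays exponentially in N
```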
Example: free additive convolution of the semicircle distribution (mu_1) and the Marchenko-Pastur distribution with parameter 0.5 (mu_2). In matrix terms: look at histograms of the eigenvalues of A + B, where A is a symmetric random matrix with N(0,1) entries and B = CC^T, where C is a matrix of i.i.d. N(0,1) entries of size 2n x n.
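A Monte Carlo sketch of this experiment is below. The scalings are my own choice (I divide the symmetric matrix by sqrt(2n) and form B as C^T C / (2n), with C of size 2n x n as above), picked so that the empirical spectra approximate the standard semicircle and Marchenko-Pastur limits; they are not specified in the description above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, trials = 400, 20
eigs = []
for _ in range(trials):
    G = rng.standard_normal((n, n))
    A = (G + G.T) / np.sqrt(2 * n)       # spectrum ~ semicircle law on [-2, 2]
    C = rng.standard_normal((2 * n, n))
    B = C.T @ C / (2 * n)                # spectrum ~ Marchenko-Pastur, ratio 0.5
    eigs.append(np.linalg.eigvalsh(A + B))
plt.hist(np.concatenate(eigs), bins=80, density=True)
plt.title("Empirical spectrum of A + B")
plt.show()
```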
Randomized trace estimation
Hutchinson's trace estimator is a randomized method that estimates the trace of a matrix A by averaging quadratic forms x^T A x over samples of a random vector x. Some (interesting) quantities can be expressed as the trace of a matrix function; for example, the determinant of a symmetric positive definite matrix can be expressed as det(A) = exp(trace(log(A))). In this paper we prove tail bounds for this estimator when either Gaussian or Rademacher random vectors are used, with a particular focus on the computation of the determinant. The Hutch++ algorithm combines Hutchinson's trace estimator with a randomized low-rank approximation of A in order to speed up randomized trace estimation. In this paper we devise an improved version of Hutch++ that attains an accuracy specified by the user while trying to minimize the computational cost.
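As a minimal illustration (plain Hutchinson with Rademacher vectors, not the estimators analyzed in the papers), the sketch below estimates trace(log(A)) = log(det(A)) for a symmetric positive definite A; in a realistic setting log(A) would not be formed explicitly, and the quadratic forms would be approximated, for example with a Krylov method.

```python
import numpy as np
from scipy.linalg import logm

def hutchinson_trace(B, num_samples, rng=None):
    """Estimate trace(B) by averaging the quadratic forms x^T B x over
    independent Rademacher (+-1) vectors x."""
    if rng is None:
        rng = np.random.default_rng()
    X = rng.choice([-1.0, 1.0], size=(B.shape[0], num_samples))
    return np.mean(np.sum(X * (B @ X), axis=0))

# Toy check of log(det(A)) = trace(log(A)) for an SPD matrix A.
rng = np.random.default_rng(0)
n = 300
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)               # symmetric positive definite
logA = np.real(logm(A))                   # formed explicitly only for this demo
print(hutchinson_trace(logA, 200, rng), np.linalg.slogdet(A)[1])
```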