# e-NLA

Online seminar series on Numerical Linear Algebra

E-NLA is an online seminar series dedicated to topics in Numerical Linear Algebra. Talks take place on Wednesdays at 4pm CEST (Central European Summer Time) and are scheduled on a fortnightly basis.

To join the seminar, please complete the sign up form at the bottom of this page. Information about how to connect to the Zoom conference call will be circulated via email to all registered attendees prior to each seminar.

Scientific committee:

Melina Freitag (University of Potsdam), Stefan Güttel (University of Manchester), Daniel Kressner (EPF Lausanne), Jörg Liesen (TU Berlin), Valeria Simoncini (University of Bologna), Alex Townsend (Cornell University), and Bart Vandereycken (University of Geneva)

This seminar series is endorsed by GAMM, SIAM, and ILAS.

# Spring 2021 talks

### February 24 - Erin Carson - What do we know about block Gram-Schmidt?

Speaker affiliation: Charles University

Abstract: Block Gram-Schmidt algorithms are used in a variety of important applications in scientific computing, including block and communication-avoiding variants of Krylov subspace methods. Block variants can have a large advantage from a performance standpoint, but this advantage does come at a potential cost: in some cases, the finite precision behavior of block variants is surprisingly worse from that of their single-vector counterparts. For many commonly used block variants, a rigorous treatment of their stability properties in finite precision remains open.

We present a comprehensive categorization of block Gram-Schmidt algorithms via a "skeleton and muscle" analogy, with skeleton and muscle referring to the inter- and intrablock orthogonalization routines, respectively. We assemble known stability results from the literature and point out the major remaining gaps in our knowledge. We present recent results on the stability of block variants of classical Gram-Schmidt and discuss the bigger question of the connection between loss of orthogonality and backward stability in GMRES.

### March 10 - Gunnar Martinsson - Randomized algorithms for pivoting and for computing interpolatory and CUR decompositions

Speaker affiliation: UT Austin

Abstract: The talk will survey a set of randomized algorithms for selecting a subset of the columns of a matrix whose spanning volume is close to maximal. This task is the core challenge when designing pivoting strategies for efficiently computing rank revealing QR and LU decompositions. It also arises in the context of computing low rank approximations to matrices (in particular when a CUR or interpolatory decomposition is desired), and in selecting "matrix skeletons" in fast algorithms for handling large dense matrices that arise in numerical methods for elliptic PDEs.

The main takeaway of the talk is that the column selection problem becomes computationally tractable if we first extract a small random "sketch" of the row space of the matrix. To be precise, the sketch we need is a collection of random linear combinations of the rows of the original matrix. We will describe how to efficiently extract such a sketch, and then how to best use it to find a set of good spanning columns.

The talk is intended as an introduction to randomized column selection; the focus is on introducing practical and simple algorithms, rather than theoretical analysis.

### March 24 - Cameron Musco - Hutch++: Optimal Stochastic Trace Estimation

Speaker affiliation: University of Massachusetts Amherst

Abstract: We study the problem of estimating the trace of a matrix that can only be accessed through matrix-vector multiplication. We introduce a new randomized algorithm, Hutch++, which computes a (1+epsilon) approximation to the trace of any positive semidefinite (PSD) matrix using just O(1/epsilon) matrix-vector products. This improves quadratically on the ubiquitous Hutchinson's estimator, which requires O(1/epsilon^2) matrix-vector products.

Our approach is based on a simple technique for reducing the variance of Hutchinson's estimator using low-rank approximation, and is easy to implement and analyze. Moreover, we prove that up to a logarithmic factor, the complexity of Hutch++ is optimal amongst all matrix-vector query algorithms. We show that it significantly outperforms Hutchinson's method in experiments, on both PSD and non-PSD matrices.

In the talk, I will also discuss a broader research program of finding asymptotically optimal algorithms for basic linear algebra problems in the matrix-vector query model of computation. Beyond our work on trace estimation, there is exciting recent progress on linear systems and eigenvalue approximation. I will briefly discuss remaining open questions and interesting connections to work in theoretical computer science, including on communication complexity and fine-grained complexity theory.

This work is joint with Raphael A Meyer, Christopher Musco, and David P Woodruff. The associated paper appeared in the SIAM Symposium on Simplicity in Algorithms (SOSA 2021) and can be found here: https://arxiv.org/abs/2010.09649.

### April 7 - Nikhil Srivastava - Pseudospectral Shattering, the Sign Function, and Diagonalization in Nearly Matrix Multiplication Time

Speaker affiliation: University of California, Berkeley

Abstract: We give a randomized algorithm which provably diagonalizes an n x n complex matrix up to backward error delta using O(log^2(n/\delta)) matrix multiplications in finite arithmetic with O(polylog(n/delta)) bits of precision, substantially improving the previously known provable complexity bound for this problem. The algorithm is a variant of the spectral bisection algorithm of Beavers and Denman from the 70's. The two key ingredients are (1) A random matrix theory result showing that every matrix is close to one with well-conditioned eigenvalues and eigenvectors, resolving a conjecture of E.B. Davies. (2) A rigorous analysis of Roberts' Newton iteration method for computing the matrix sign function in finite arithmetic, itself an open problem in numerical analysis since the 80's. Joint work with J. Banks, J. Garza-Vargas, A. Kulkarni, and S. Mukherjee.

### April 21 - Serkan Gugercin - Rational approximations and data-driven modeling for dynamical systems

Speaker affiliation: Virginia Tech

Abstract: Rational approximants are fundamental tools for model reduction and data-driven modeling of dynamical systems. In the model reduction setting, one assumes access to underlying full model dynamics (in the form of a state-space realization) and constructs a reduced model (effectively, a low-order rational approximant) by projecting the dynamics onto a reduced-dimension subspace. In the context of data-driven modeling on the other hand, one only has access to input/output response data and must learn a reduced model directly from this data. In this talk, we will first review how this enterprise leads to the Cauchy and Loewner matrices, Sylvester and Lyapunov equations, and barycentric forms that are ubiquitous in systems theoretic approaches to model reduction, and then we'll discuss some recent developments that tie them all together.

### May 5 - Madeleine Udell - Scalable Semidefinite Programming

Speaker affiliation: Cornell University

Abstract: Semidefinite programming (SDP) is a powerful framework from convex optimization that has striking potential for data science applications. This talk develops new provably correct algorithms for solving large SDP problems by economizing on both the storage and the arithmetic costs. We present two methods: one based on sketching, and the other on complementarity. Numerical evidence shows that these methods are effective for a range of applications, including relaxations of MaxCut, abstract phase retrieval, and quadratic assignment, and can handle SDP instances where the matrix variable has over 10^13 entries.

The first method modifies a standard primal-dual convex method - the conditional-gradient augmented Lagrangian method - to store (and work with) a sketched version of the primal. A low rank approximation to the primal solution can be recovered from this sketch.

The second method begins with a new approximate complementarity principle: Given an approximate solution to the dual SDP, the primal SDP has an approximate solution whose range is contained in the null space of the dual slack matrix. For weakly constrained SDPs, this null space has very low dimension, so this observation significantly reduces the search space for the primal solution.

This result suggests an algorithmic strategy that can be implemented with minimal storage: (1) Solve the dual SDP approximately (with any first-order solver); (2) compress the primal SDP to the null space of the dual slack matrix; (3) solve the compressed primal SDP.

# Past talks

### April 29, 2020 - Nicholas J. Higham - Are Numerical Linear Algebra Algorithms Accurate at Extreme Scale and at Low Precisions?

Speaker affiliation: University of Manchester

Abstract: The advent of exascale computing will bring the capability to solve dense linear systems of order $10^8$. At the same time, computer hardware is increasingly supporting low precision floating-point arithmetics, such as the IEEE half precision and bfloat16 arithmetics. The standard rounding error bound for the inner product of two $n$-vectors $x$ and $y$ is $|fl(x^Ty) - x^Ty| \le n u |x|^T|y|$, where $u$ is the unit roundoff, and the bound is approximately attainable. This bound provides useful information only if $nu < 1$. Yet $nu > 1$ for exascale-size problems solved in single precision and also for problems of order $n > 2048$ solved in half precision. Standard error bounds for matrix multiplication, LU factorization, and so on, are equally uninformative in these situations. Yet the supercomputers in the TOP500 are there by virtue of having successfully solved linear systems of orders up to $10^7$, and deep learning implementations routinely use half precision with apparent success.

Have we reached the point where our techniques for analyzing rounding errors, honed over 70 years of digital computation, are unable to predict the accuracy of numerical linear algebra computations that are now routine? I will show that the answer is "no": we can understand the behaviour of extreme-scale and low accuracy computations. The explanation lies in algorithmic design techniques (both new and old) that help to reduce error growth along with a new probabilistic approach to rounding error analysis.

Recorded talk: https://youtu.be/B1eFGn5nN84

### May 6, 2020 - Michele Benzi - Nonlocal dynamics on networks via fractional graph Laplacians: theory and numerical methods

Speaker affiliation: Scuola Normale Superiore Pisa

Abstract: Nonlocal diffusive dynamics on large, sparse networks can be modeled by means of systems of differential equations involving fractional graph Laplacians. The solution of such systems leads to non-analytic matrix functions, due to the singularity of the graph Laplacian. Off-diagonal decay estimates for these and related matrix functions will be presented, together with explicit (closed form) expressions for some simple but important examples. The case of directed networks (leading to nonsymmetric Laplacians) will also be discussed.

The numerical approximation of the dynamics can be implemented by means of Krylov subspace methods. The lack of smoothness of the underlying function suggests the use of rational approximation techniques. Some results using a shift-and-invert approach will be presented.

Applications include the efficient exploration of large spatial networks and consensus dynamics in multi-agent systems.

This is joint work with Daniele Bertaccini (U. of Rome `Tor Vergata’), Fabio Durastante (IAC-CNR), and Igor Simunec (Scuola Normale Superiore).

### May 13 - Volker Mehrmann - Robustness of Linear Algebra Properties for Port-Hamiltonian Systems

Speaker affiliation: Technische Universität Berlin

Abstract: Port-Hamiltonian systems are an important class of control systems that arise in all areas of science and engineering. When the system is linearized around a stationary solution one gets a linear port-Hamiltonian system. Despite the fact that the system looks unstructured at first sight, it has remarkable properties. Stability and passivity are automatic, spectral structures for purely imaginary eigenvalues, eigenvalues at infinity, and even singular blocks in the Kronecker canonical form are very restricted and furthermore the structure leads to fast and efficient iterative solution methods for associated linear systems. When port-Hamiltonian systems are subject to (structured) perturbations, then it is important to determine the minimal allowed perturbations so that these properties are not preserved. The computation of these structured distances to instability, non-passivity, or non-regularity, is typically a very hard optimization problem. However, in the context of port-Hamiltonian systems, the computation becomes much easier and can even be implemented efficiently for large scale problems in combination with model reduction techniques. We will discuss these distances and the computational methods and illustrate the results via an industrial problem in the context of noise reduction for disk brakes.

### May 20 - Ilse Ipsen - Probabilistic Numerical Linear Solvers

Speaker affiliation: North Carolina State University

Abstract: We formulate iterative methods for the solution of nonsingular linear systems as statistical inference processes by modeling the epistemic uncertainty in the iterates due to a limited computational budget. The goal is to obtain well-calibrated uncertainty that is more insightful than traditional worst-case bounds, and to produce a probabilistic description of the error that can be propagated coherently through a computational pipeline.

Our Bayesian Conjugate Gradient Method (BayesCG) for real symmetric positive-definite linear systems posits a prior distribution for the solution, and conditions on the finite amount of information obtained during the iterations to produce a posterior distribution that reflects the reduced uncertainty. The following topics will be addressed: (i) choice of prior for fast convergence and well-calibrated uncertainty; (ii) error estimation through test statistics that mitigate the effect of BayesCG's nonlinear dependence on the solution; and (iii) numerical stability to maintain positive semi-definiteness of the posteriors, and prevent convergence slow down from loss of orthogonality in residuals and search directions.

This is joint work with Jon Cockayne (http://www.joncockayne.com/), Chris J. Oates (http://oates.work/), and Timothy W. Reid (https://math.sciences.ncsu.edu/people/twreid/).

### May 27 - Cleve Moler - The Evolution of "The Evolution of MATLAB"

Speaker affiliation: MathWorks, Inc.

Abstract: For years I have been giving a talk about how MATLAB has evolved from a simple matrix calculator to a powerful technical computing environment. The talk itself has evolved. In today's talk I plan to spend less time on history and more on contemporary activities. I will describe several examples of interesting scientific applications. I hope to conclude with demonstrations of my current adventures in machine learning and self-synchronizing oscillators.

### October 14 - Sherry Li - Autotuning exascale applications with Gaussian process regression

Speaker affiliation: Lawrence Berkeley National Laboratory

Abstract: Significant effort has been invested to develop highly scalable numerical libraries and high-fidelity modeling and simulation for the upcoming exascale computers. These codes typically involve many parameters which need to be selected properly to optimize performance on the underlying parallel machine. They are also expensive to run and thus have limited "function evaluation" values, which post significant challenges to efficient performance tuning on diverse architectures.

Bayesian optimization with Gaussian process regression is an attractive machine learning framework to build surrogate models with limited function evaluation points. In order to fully utilize all the available data, we leverage multitask learning and multi-armed bandit strategies to build a more advanced Bayesian optimization framework.

We have developed an open-source software tool, called GPTune, for optimizing expensive large-scale HPC codes. We will show several features of GPTune, e.g., incorporation of coarse performance models to improve the Bayesian model, multi-objective tuning such as tuning a hybrid of time, memory and accuracy, and reuse of historical data base for model portability.

We will demonstrate the efficiency and effectiveness of GPTune when it is applied to numerical linear algebra libraries, such as ScaLAPACK, SuperLU and Hypre, as well as fusion simulation codes M3D-C1 and NIMROD.

This talk describes joint work with James Demmel, Yang Liu, Osni Marques, Wissam Sid-Lakhdar and Xianran Zhu

### June 3 - Nick Trefethen - Vandermonde with Arnoldi

Speaker affiliation: University of Oxford

Abstract: Vandermonde matrices are exponentially ill-conditioned, rendering the familiar “polyval(polyfit)” algorithm for polynomial interpolation and least-squares fitting ineffective at higher degrees. We show that Arnoldi orthogonalization fixes the problem.

It's remarkable how widely this trick is applicable. Half a dozen examples will be presented.

This is joint work with Pablo Brubeck and Yuji Nakatsukasa.

### June 10 - Martin Gander - A Linear Algebra Approach to Time Parallelization: Parareal, ParaExp, ParaDiag, ParaOpt and ParaStieltjes

Speaker affiliation: University of Geneva

Abstract: Time parallelization has been a very active research area over the past decade. This is due to so massively parallel computer architectures that parallelization in the spatial direction rarely suffices to take full advantage of such systems when solving evolution problems. Time parallelization is however quite different from spatial parallelization, since information only propagates forward in time, never backward. Time parallelization algorithms are often derived at the PDE level, but whenever they are used, they take the form of solvers for linear algebra problems. I will give in my presentation an introduction to such algorithms at the linear algebra level, starting with two simple but typical model problems, namely a heat equation and a transport equation. At the linear algebra level, these two problems look deceivingly similar, but time parallel algorithms need different features when solving one or the other in parallel. I will explain the reason for this at the linear algebra level, and then show how Parareal, ParaExp and ParaDiag address them. If time permits, I will also briefly explain the newer classes of ParaOpt and ParaStieltjes algorithms.

### June 17 - Mark Embree - Contour Integral Methods for Nonlinear Eigenvalue Problems: A Systems Theory Perspective

Speaker affiliation: Virginia Tech

Abstract: Contour integral methods for nonlinear eigenvalue problems seek to compute a subset of the spectrum in a bounded region of the complex plane. We briefly survey this class of algorithms, establishing a relationship to system realization techniques in control theory. This connection motivates new contour integral methods that build on recent developments in rational interpolation of dynamical systems. The resulting techniques, which replace the usual Hankel matrices with Loewner matrix pencils, incorporate general interpolation schemes and permit ready recovery of eigenvectors. Numerical examples illustrate the potential of this approach.

This talk describes joint work with Michael Brennan (MIT) and Serkan Gugercin (Virginia Tech).

### June 24 - James Demmel - Communication-Avoiding Algorithms for Linear Algebra, Machine Learning, and Beyond

Speaker affiliation: University of California at Berkeley

Abstract: Algorithms have two costs: arithmetic and communication, i.e. moving data between levels of a memory hierarchy or processors over a network. Communication costs (measured in time or energy per operation) already greatly exceed arithmetic costs, and the gap is growing over time following technological trends. Thus our goal is to design algorithms that minimize communication. We present new algorithms that communicate asymptotically less than their classical counterparts, for a variety of linear algebra and machine learning problems, demonstrating large speedups on a variety of architectures. Some of these algorithms attain provable lower bounds on communication. We describe generalizations of these bounds, and optimal algorithms, to arbitrary code that can be expressed as nested loops accessing arrays, such as convolutional neural nets, and to account for arrays having different precisions.

### July 1 - Tamara G. Kolda - Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition

Speaker affiliation: Sandia National Laboratories

Abstract: Conventional algorithms for finding low-rank canonical polyadic (CP) tensor decompositions are unwieldy for large sparse tensors. The CP decomposition can be computed by solving a sequence of overdetermined least problems with special Khatri-Rao structure. In this work, we present an application of randomized algorithms to fitting the CP decomposition of sparse tensors, solving a significantly smaller sampled least squares problem at each iteration with probabilistic guarantees on the approximation errors. Prior work has shown that sketching is effective in the dense case, but the prior approach cannot be applied to the sparse case because a fast Johnson-Lindenstrauss transform (e.g., using a fast Fourier transform) must be applied in each mode, causing the sparse tensor to become dense. Instead, we perform sketching through leverage score sampling, crucially relying on the fact that the structure of the Khatri-Rao product allows sampling from overestimates of the leverage scores without forming the full product or the corresponding probabilities. Naïve application of leverage score sampling is ineffective because we often have cases where a few scores are quite large, so we propose a novel hybrid of deterministic and random leverage-score sampling which consistently yields improved fits. Numerical results on real-world large-scale tensors show the method is significantly faster than competing methods without sacrificing accuracy. This is joint work with Brett Larsen, Stanford University.

### July 8 - Laura Grigori - Communication avoiding low rank matrix approximation, a unified perspective on deterministic and randomized approaches

Speaker affiliation: INRIA Paris

Abstract: In this talk we present an unified perspective on deterministic and randomized approaches for computing the low rank approximation of a matrix. We survey recent approaches that allow to minimize communication and discuss a generalized LU factorization that allows to unify several existing algorithms. For this factorization we present an improved analysis which combines deterministic guarantees with sketching ensembles satisfying Johnson-Lindenstrauss properties. We then extend some of the algorithms to computing the low rank approximation of a tensor by using HOSVD while also avoiding communication.

### July 15 - Christian Lubich - Dynamical low-rank approximation

Speaker affiliation: Universität Tübingen

Abstract: This talk reviews differential equations and their numerical solution on manifolds of low-rank matrices or of tensors with a rank structure such as tensor trains or general tree tensor networks. These low-rank differential equations serve to approximate, in a data-compressed format, large time-dependent matrices and tensors or multivariate functions that are either given explicitly via their increments or are unknown solutions to high-dimensional evolutionary differential equations, with multi-particle time-dependent Schrödinger equations and kinetic equations such as Vlasov equations as noteworthy examples of applications.

Recently developed numerical time integrators are based on splitting the projection onto the tangent space of the low-rank manifold at the current approximation. In contrast to all standard integrators, these projector-splitting methods are robust to the unavoidable presence of small singular values in the low-rank approximation. This robustness relies on exploiting geometric properties of the manifold of low-rank matrices or tensors: in each substep of the projector-splitting algorithm, the approximation moves along a flat subspace of the low-rank manifold. In this way, high curvature due to small singular values does no harm.

This talk is based on work done intermittently over the last decade with Othmar Koch, Bart Vandereycken, Ivan Oseledets, Emil Kieri, Hanna Walach and Gianluca Ceruti.

### July 22 - Michael Ng - Nonnegative Low Rank Matrix Approximation and its Applications

Speaker affiliation: University of Hong Kong

Abstract: In this talk, we study low rank matrix approximation (NLRM) for nonnegative matrices arising from many data mining and pattern recognition applications. Our approach is different from classical nonnegative matrix factorization (NMF) which has been studied for some time. For a given nonnegative matrix, the usual NMF approach is to determine two nonnegative low rank matrices such that the distance between their product and the given nonnegative matrix is as small as possible. However, the proposed NLRM approach is to determine a nonnegative low rank matrix such that the distance between such matrix and the given nonnegative matrix is as small as possible. There are two advantages. (i) The minimized distance can be smaller. (ii) The proposed method can identify important singular basis vectors, while this information may not be obtained in the classical NMF. Numerical results are reported to demonstrate the performance of the proposed method. Several extensions and research works are also presented.

This talk describes joint work with Tai-Xiang Jiang (Southwestern University of Finance and Economics), JunJun Pan (Universite de Mons), Guang-Jing Song (Weifang University) and Hong Zhu (Jiangsu University).

### September 9 - David Keyes - Data-sparse Linear Algebra Algorithms for Large-scale Applications on Emerging Architectures

Speaker affiliation: KAUST

Abstract: A traditional goal of algorithmic optimality, squeezing out operations, has been superseded because of evolution in architecture. Algorithms must now squeeze memory, data transfers, and synchronizations, while extra operations on locally cached data cost relatively little time or energy. Hierarchically low-rank matrices realize a rarely achieved combination of optimal storage complexity and high-computational intensity in approximating a wide class of formally dense operators that arise in exascale applications. They may be regarded as algebraic generalizations of the fast multipole method. Methods based on hierarchical tree-based data structures and their simpler cousins, tile low-rank matrices, are well suited for early exascale architectures, which are provisioned for high processing power relative to memory capacity and memory bandwidth. These data-sparse algorithms are ushering in a renaissance of numerical linear algebra. We describe modules of a software toolkit, Hierarchical Computations on Manycore Architectures (HiCMA), that illustrate these features on several applications. Early modules of this open-source project are distributed in software libraries of major vendors. A recent addition, H2Opus, extends H2 hierarchical matrix operations to distributed memory and GPUs.

### September 16 - Elisabeth Ullmann - Approximation of parametric covariance matrices

Speaker affiliation: TU Munich

Abstract: Covariance operators model the spatial, temporal or other correlation between collections of random variables. In modern applications these random variables are often associated with an infinite-dimensional or high-dimensional function space. Examples are the solution of a partial differential equation with random coefficients in uncertainty quantification (UQ), or Gaussian process regression in machine learning. When a suitable discretization of the function space has been applied, the discretized covariance operator becomes a very large matrix - the covariance matrix - with a size that is of the order of the dimension of the discrete space squared.

Covariance matrices are naturally symmetric and positive semi-definite, but in the applications we are interested in, they are typically dense. To avoid the enormous cost of creating and handling these dense matrices, efficient low-rank approximations such as the pivoted Cholesky decomposition, or the adaptive cross approximation (ACA) have been developed during the last decade.

But the story does not end here since recently, the attention has shifted to parameterized covariance operators. This is due to their increased modeling capacity, e.g., in Bayesian inverse problems or Gaussian process regression with hyperparameters in machine learning. Now we are faced with the task to approximate a parametric covariance matrix where the parameter itself is a random process. Simply repeating the ACA or pivoted Cholesky decomposition for different parameter values is inefficient and most certainly too expensive in practise.

We introduce and study two algorithms for the approximation of parametric families of covariance matrices. The first approach is a (non-certified) approximation, and employs a reduced basis associated with a collection of eigenvectors for specific parameter values. The second approach is a certified extension of the ACA where the approximation error is controlled in the Wasserstein-2 distance of two Gaussian measures. Both approaches rely on an affine linear expansion of the covariance operator with respect to the parameter. This keeps the computational cost under control. Notably, both algorithms do not require regular meshes in the covariance operator discretization and can be used on irregular domains.

This talk describes joint work with Daniel Kressner (EPFL), Jonas Latz (University of Cambridge), Stefano Massei (TU/e) and Marvin Eisenberger (TUM).

### September 30 - Howard Elman - Multigrid Methods for Computing Low-Rank Solutions to Parameter-Dependent Partial Differential Equations

Speaker affiliation: University of Maryland

Abstract: The collection of solutions of discrete parameter-dependent partial differential equations often takes the form of a low-rank matrix. We show that in this scenario, iterative algorithms for computing these solutions can take advantage of low-rank structure to reduce both computational effort and memory requirements. Implementation of such solvers requires that explicit rank-compression computations be done to truncate the ranks of intermediate quantities that must be computed. We prove that when truncation strategies are used as part of a multigrid solver, the resulting algorithms retain "textbook" (grid-independent) convergence rates, and we demonstrate how the truncation criteria affect convergence behavior. In addition, we show that these techniques can be used to construct efficient solution algorithms for computing the eigenvalues of parameter-dependent operators. In this setting, parameterized eigenvectors can be grouped into matrices of low-rank structure, and we introduce a variant of inverse subspace iteration for computing them. We demonstrate the utility of this approach on two benchmark problems, a stochastic diffusion problem with some poorly separated eigenvalues, and an operator derived from a discrete Stokes problem whose minimal eigenvalue is related to the inf-sup stability constant.

This is joint work with Tengfei Su.

### October 14 - Sherry Li - Autotuning exascale applications with Gaussian process regression

Speaker affiliation: Lawrence Berkeley National Laboratory

Abstract: Significant effort has been invested to develop highly scalable numerical libraries and high-fidelity modeling and simulation for the upcoming exascale computers. These codes typically involve many parameters which need to be selected properly to optimize performance on the underlying parallel machine. They are also expensive to run and thus have limited "function evaluation" values, which post significant challenges to efficient performance tuning on diverse architectures.

Bayesian optimization with Gaussian process regression is an attractive machine learning framework to build surrogate models with limited function evaluation points. In order to fully utilize all the available data, we leverage multitask learning and multi-armed bandit strategies to build a more advanced Bayesian optimization framework.

We have developed an open-source software tool, called GPTune, for optimizing expensive large-scale HPC codes. We will show several features of GPTune, e.g., incorporation of coarse performance models to improve the Bayesian model, multi-objective tuning such as tuning a hybrid of time, memory and accuracy, and reuse of historical data base for model portability.

We will demonstrate the efficiency and effectiveness of GPTune when it is applied to numerical linear algebra libraries, such as ScaLAPACK, SuperLU and Hypre, as well as fusion simulation codes M3D-C1 and NIMROD.

This talk describes joint work with James Demmel, Yang Liu, Osni Marques, Wissam Sid-Lakhdar and Xianran Zhu.

### October 28 - Yuji Nakatsukasa - Fast and stable randomized low-rank matrix approximation

Speaker affiliation: University of Oxford

Abstract: Randomized SVD has become an extremely successful approach for efficiently computing a low-rank approximation of matrices. In particular the paper by Halko, Martinsson, and Tropp (SIREV 2011) contains extensive analysis, and has made it a very popular method. The typical complexity for a rank-r approximation of mxn matrices is O(mnlog n+(m+n)r^2) for dense matrices. The classical Nystrom method is much faster, but only applicable to positive semidefinite matrices. This work studies a generalization of Nystrom's method applicable to general matrices, and shows that (i) it has near-optimal approximation quality comparable to competing methods, (ii) the computational cost is the near-optimal O(mnlog n+r^3) for dense matrices, with small hidden constants, and (iii) crucially, it can be implemented in a numerically stable fashion despite the presence of an ill-conditioned pseudoinverse. Numerical experiments illustrate that generalized Nystrom can significantly outperform state-of-the-art methods, especially when r>>1, achieving up to a 10-fold speedup. The method is also well suited to updating and downdating the matrix.

### November 11 - Des Higham - Concepts and Algorithms for Higher Order Networks: Beyond Pairwise Interactions

Speaker affiliation: University of Edinburgh

Abstract: Network scientists have shown that there is great value in studying pairwise interactions between components in a system. From a linear algebra point of view, this involves defining and evaluating functions of the associated adjacency matrix. Recently, there has been increased interest in the idea of accounting directly for higher order features. Such features may be built from the adjacency matrix---for example, a triangle involving nodes i, j and k arises when the three edges, i<->j, j<->k and k<->i are present. In other contexts, higher order information appears explicitly---for example, in a coauthorship network, a document involving three authors forms a natural triangle. I will discuss the use of tensor-based definitions and algorithms to exploit such higher order information. The algorithms also incorporate nonlinearities that increase flexibility. I will focus on spectral methods that extend classical concepts of node centrality and clustering coefficients. The underlying object of study will be a constrained nonlinear eigenvalue problem associated with a tensor. Using recent results from nonlinear Perron--Frobenius theory, we can establish existence and uniqueness under mild conditions, and show that such spectral measures can be computed efficiently and robustly with a nonlinear power method.

The talk is based on joint work with Francesca Arrigo (University of Strathclyde) and Francesco Tudisco (Gran Sasso Science Institute).

### November 25 - Françoise Tisseur - Towards reliable eigensolvers for nonlinear eigenvalue problems

Speaker affiliation: The University of Manchester

Abstract: Given a matrix-valued function F that depends nonlinearly on a single parameter z, the basic nonlinear eigenvalue problem consists of finding complex scalars z for which F(z) is singular. Such problems underpin many areas of computational science and engineering and are often difficult to solve due to their nonlinear nature, the large problem size and poor conditioning.

One approach for solving nonlinear eigenvalue problems (NEPs) consists of approximating F by a rational matrix-valued function and then solve a rational eigenproblem, still nonlinear in the parameter but simpler to solve numerically. An error analysis for the eigenpairs of F computed from a rational approximation of F gives insights into the sampling accuracy required to solve NEPs in this way with a guaranteed backward error, an essential requirement for a robust eigensolver. This error analysis is used to construct rational approximations of F that are reliable for a given tolerance and scaling-independent on a given subset of the complex plane. A comprehensive comparison of these rational approximations on a large range of test problems from the NLEVP collection is presented.

This is joint work with Stefan Güttel and Gian Maria Negri Porzio.

### December 9 - Chen Greif - Block-Structured Indefinite Linear Systems with Rank-Deficient Blocks

Speaker affiliation: The University of British Columbia

Abstract: Large, sparse, symmetric linear systems with a saddle-point structure come in several flavors. In this talk I will provide an overview of algebraic properties of matrices associated with such systems, which have a special rank structure. We will focus on situations where the leading block is singular with a high nullity, and describe mathematical and algebraic properties that allow us to design preconditioned iterative solvers with indefinite preconditioners that rely on null spaces of the operators involved. We will illustrate some of the observations on a 4-by-4 block linear system arising from an incompressible magnetohydrodynamics model problem.

### January 13 - Anne Greenbaum - Spectral Sets: Numerical Range and Beyond

Speaker affiliation: University of Washington

Let $A$ be an $n$ by $n$ matrix. We are interested in regions $\Omega$ in the complex plane for which there is a moderate size constant $K$ such that

$$\|f(A)\| ≤ K \sup_{z\in\Omega} |f(z)|,$$

for all functions $f$ analytic in $\Omega$. Here $\|\cdot \|$ denotes the spectral norm, or, the largest singular value. Such a region $\Omega$ is called a $K$-spectral set for $A$. For a normal matrix $A$, one can simply take $\Omega$ to be the set of eigenvalues and $K = 1$, but for nonnormal matrices things are not so straightforward. Von Neumann’s inequality states that if $A$is a contraction, i.e., $\|A\| \leq 1$, then the unit disk is a $K$-spectral set with $K = 1$. Crouzeix and Palencia showed that the numerical range, $W(A) := \{\langle Aq, q\rangle : \|q\| = 1\}$, is a $(1+\sqrt{2})$-spectral set, and Crouzeix’s conjecture is that it is actually a $2$-spectral set. Okubo and Ando showed that any disk containing $W(A)$ is a 2-spectral set. Using techniques derived from those in the Crouzeix-Palencia proof, we derive other $K$-spectral sets. We show that an annulus with outer radius $R := \max\{\|A\|, \|A−1\|\}$ and inner radius $1/R$ is a $(1 + \sqrt{2})$-spectral set for $A$. We show further that if $\alpha \in W(A)$ and $\Omega = W(A)\setminus D(\alpha, w((A − \alpha I)−1))$, where $D(c, r)$ denotes the disk about $c$ of radius $r$ and $w(·)$ denotes the numerical radius, then $\Omega$ is a $(3 + 2\sqrt{3})$-spectral set for $A$. We discuss implications for describing the convergence of the GMRES algorithm for solving linear systems and the rational Arnoldi algorithm for computing the product of a function of a matrix with a given vector.

This is joint work with Michel Crouzeix.

### January 27 - Nicolas Gillis - Identifiability and Computation of Nonnegative Matrix Factorizations

Speaker affiliation: Université de Mons

Abstract: Given a nonnegative matrix X and a factorization rank r, nonnegative matrix factorization (NMF) approximates the matrix X as the product of a nonnegative matrix W with r columns and a nonnegative matrix H with r rows. NMF has become a standard linear dimensionality reduction technique in data mining and machine learning. In this talk, we address the issue of non-uniqueness of NMF decompositions, also known as the identifiability issue, which is crucial in many applications. We discuss three key NMF models that allow us to obtain unique NMFs, namely, separable NMF, minimum-volume NMF, and sparse NMF. We also discuss how the factors (W,H) in such models can be computed. We illustrate these results on facial feature extraction, blind hyperspectral unmixing, and topic modeling.