
Ariful Azad, Lawrence Berkeley National Laboratory
Title: Parallel sparse matrix-matrix multiplication and its use in triangle counting and enumeration
Abstract: We present experimental results for optimized implementations of various (1D, 2D, 3D) parallel SpGEMM algorithms. We quantify the effects of in-node multithreading, communication-avoiding 3D algorithms, and the implementation trade-offs involved. We demonstrate SpGEMM's applications to graph problems such as triangle counting, graph contraction, and Markov clustering. We will also present a new primitive, masked SpGEMM, which avoids communication when the output structure (or an upper bound on it) is known beforehand. Masked SpGEMM has applications to matrix-based triangle counting and enumeration.
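The masked-product idea can be illustrated with a minimal Python/SciPy sketch (an illustration only, not the talk's distributed implementation): for a symmetric 0/1 adjacency matrix A with empty diagonal, the triangle count is the sum of (A*A) restricted to the nonzero pattern of A, divided by 6. Here the full product is formed and then masked; the point of a true masked SpGEMM is to avoid computing the unmasked entries at all.

```python
import numpy as np
import scipy.sparse as sp

def count_triangles_masked(A):
    """Count triangles via a masked sparse product: keep entries of A@A
    only where A itself is nonzero (the mask), then sum and divide by 6."""
    A = sp.csr_matrix(A, dtype=np.int64)
    B = (A @ A).multiply(A)   # mask the product with A's sparsity pattern
    return int(B.sum() // 6)  # each triangle is counted 6 times

# 4-cycle with one chord -> exactly 2 triangles: (0,1,2) and (0,2,3)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
A = sp.lil_matrix((4, 4), dtype=np.int64)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1
print(count_triangles_masked(A))  # -> 2
```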
Jakub Cerveny, Lawrence Berkeley National Laboratory
Title: Nonconforming Mesh Refinement for High-Order Finite Elements
Abstract: We propose a general algorithm for nonconforming adaptive mesh refinement (AMR) in high-order nodal finite element codes. The algorithm handles triangular, quadrilateral and hexahedral meshes of arbitrarily high polynomial order, for any finite element space. We design a data structure for meshes with hanging nodes and develop a procedure to construct the AMR system through variational restriction. Our approach supports complex anisotropic refinements in both 2D and 3D, unlimited refinement levels of adjacent elements, and MPI parallelism with dynamic load balancing. We demonstrate the proposed method on several shock hydrodynamic and electromagnetic problems.
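The variational-restriction construction can be sketched in a few lines of numpy (a hypothetical 5-DOF toy setup, not the actual data structure or assembly): a hanging node is constrained to interpolate its coarse edge, the constraint is encoded in a prolongation matrix P from conforming ("true") DOFs to all DOFs, and the conforming system is P^T A P.

```python
import numpy as np

# DOFs 0..3 are conforming; DOF 4 hangs on the edge (0,1) and is
# constrained to the midpoint value x4 = 0.5*(x0 + x1).
n_all, n_true = 5, 4
P = np.zeros((n_all, n_true))        # prolongation: true DOFs -> all DOFs
P[:n_true, :n_true] = np.eye(n_true)
P[4, 0] = P[4, 1] = 0.5              # linear interpolation of the hanging node

rng = np.random.default_rng(0)
M = rng.standard_normal((n_all, n_all))
A = M @ M.T + n_all * np.eye(n_all)  # stand-in SPD matrix (not a real stiffness matrix)
b = rng.standard_normal(n_all)

Ac = P.T @ A @ P                     # variational restriction to the conforming space
x = np.linalg.solve(Ac, P.T @ b)     # solve only for the true DOFs
x_all = P @ x                        # prolong back; the constraint holds by construction
```

By construction, the expanded solution satisfies x_all[4] = 0.5*(x_all[0] + x_all[1]), so the hanging node never appears as an unknown.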
Andrea Dotti, SLAC National Accelerator Laboratory
Title: Modernizing large-scale software for parallel hardware: the experience with Geant4
Abstract: The simulation of the interaction of subatomic particles with matter is a computational challenge common to many science and technology domains. Such studies give insight into diverse fields, unlocking secrets on the behavior and origins of matter (High Energy Physics and Cosmology), improving human health (advancing the design of medical imaging equipment and radiation therapy cancer treatment), and predicting damage to critical electronic systems (how spacecraft electronics or even automotive braking systems may be affected by cosmic rays). To reach the precision needed by today's “needle in a haystack” discoveries in these fields, particle interactions must be simulated with the step-by-step method known as Monte Carlo particle transport. Because MC processes are stochastic, a meaningful result is built up only by simulating large numbers of particle interactions; achieving precision requires millions or even billions of simulated interactions. Modern hardware architectures (many-core processors with wide vector registers) and energy-efficient coprocessor systems could supply the required computational power. The Geant4 toolkit is an open-source C++ product developed by a collaboration of physicists and computer scientists from 20 countries in North America, Europe and Asia. Developed over a 20-year period, the current Geant4 contains 2 million lines of code. The latest Geant4 major release (version 10.0) introduced event-level parallelism via multithreading, obtaining very good results in terms of scalability and performance. We will report on the design and techniques used in Geant4 to introduce parallel algorithms. We will describe its successes, but also the challenges encountered in adapting a large codebase to modern parallel systems. Performance examples, including on Xeon Phi coprocessors, GPUs and supercomputers, will be presented.
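The essence of event-level parallelism is that Monte Carlo events are independent, so they can be farmed out to workers with no synchronization, and per-event RNG seeding keeps results reproducible regardless of worker count. A toy Python sketch (illustration only; Geant4 itself is C++ and uses threads over events):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def simulate_event(event_id, n_steps=1000):
    """One independent Monte Carlo 'event': a toy random walk standing in
    for stepping a particle through matter. Each event owns an RNG
    stream keyed by its event number."""
    rng = np.random.default_rng(event_id)
    return float(rng.standard_normal(n_steps).sum())

def run_events(n_events, workers=4):
    # events are independent, so the pool needs no locks between them
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(simulate_event, range(n_events)))

serial = [simulate_event(e) for e in range(8)]
parallel = run_events(8)
```

Because each event draws from its own seeded stream, the serial and parallel runs produce bitwise-identical per-event results.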
Moe Khalil, Sandia National Laboratory, Livermore
Title: Probabilistic Inference of Model Parameters and Missing High-Dimensional Data Based on Summary Statistics
Abstract: We present the results of applying Maximum Entropy and Approximate Bayesian Computation methods to the probabilistic inference of model parameters in the absence of experimental data. The available published data are in the form of indirect summary statistics, namely nominal values and error bars, of model parameters. This approach relies on generating data that are consistent with the given summary statistics. The methodology permits the forward propagation of parametric uncertainty through the computational model in a manner consistent with the published statistics. For the specific application to chemical modeling of hydrogen-air ignition, the previous algorithm, a nested Markov Chain Monte Carlo procedure with the outer chain exploring the missing-data space and the inner chain the unknown parameters, is computationally taxing given the high dimensionality of the missing data (on the order of 100s to 1000s). We propose an alternative approach consisting of importance sampling and Gauss–Hermite quadrature for parameter inference, resulting in orders-of-magnitude speedup in data-likelihood evaluation. Despite the strong nonlinearity of the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. Thus, the proposed technique based on importance sampling with a Gaussian proposal is adequate for parameter estimation in this investigation. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set.
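Self-normalized importance sampling with a Gaussian proposal, the core estimator above, can be sketched in a few lines (a toy one-parameter stand-in for the chemical model's likelihood, symmetric about 2 so the posterior mean is known):

```python
import numpy as np

def log_post(theta):
    # toy unnormalized log-posterior: mildly non-Gaussian but symmetric
    # about 2 (a hypothetical stand-in for the data likelihood)
    return -0.5 * (theta - 2.0) ** 2 - 0.05 * (theta - 2.0) ** 4

rng = np.random.default_rng(1)
mu, sigma = 2.0, 1.2                        # Gaussian proposal near the mode
theta = rng.normal(mu, sigma, size=200_000)
log_q = -0.5 * ((theta - mu) / sigma) ** 2  # proposal log-density up to a constant
log_w = log_post(theta) - log_q             # log importance weights
w = np.exp(log_w - log_w.max())             # stabilize before normalizing
w /= w.sum()                                # self-normalization cancels all constants
post_mean = np.sum(w * theta)               # IS estimate of E[theta | data]
```

Normalizing constants of both target and proposal cancel in the self-normalized estimator, which is why only unnormalized log-densities are needed.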
Victor Minden, Stanford
Title: Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations
Abstract: Maximum likelihood estimation for parameter fitting given observations from a kernelized Gaussian process in space is a computationally demanding task that restricts the use of such methods to moderately sized datasets. We present a procedure for unstructured 2D observations that allows for evaluation of the log-likelihood and its gradient (i.e., the score equations) in $\tilde{\mathcal{O}}(n^{3/2})$ time and $\mathcal{O}(n \log n)$ storage, where $n$ is the number of observations. Our method relies on the skeletonization procedure described by Martinsson \& Rokhlin (2005) in the form of the \emph{recursive skeletonization factorization} of Ho \& Ying (2015). Combining this with an adaptation of the matrix peeling algorithm of Lin et al. (2011), our method can be used in the context of any first-order optimization routine to quickly and accurately compute maximum-likelihood estimates.
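For reference, the two quantities the factorizations accelerate, the Gaussian log-likelihood and its score, have a simple dense O(n^3) form (a naive numpy baseline with an assumed Gaussian kernel, not the fast algorithm itself):

```python
import numpy as np

def gp_loglik_and_score(y, X, ell, noise=1e-2):
    """Dense reference for the GP log-likelihood and its derivative
    (score) with respect to the kernel length scale ell."""
    n = len(y)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    Kexp = np.exp(-0.5 * D2 / ell**2)                    # Gaussian kernel
    K = Kexp + noise * np.eye(n)                         # add a nugget
    dK = Kexp * (D2 / ell**3)                            # dK / d(ell)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(K, y)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    ll = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)
    # score: 0.5 * y' K^-1 dK K^-1 y - 0.5 * tr(K^-1 dK)
    score = 0.5 * alpha @ dK @ alpha - 0.5 * np.trace(np.linalg.solve(K, dK))
    return ll, score

rng = np.random.default_rng(0)
X = rng.uniform(size=(80, 2))     # unstructured 2D observation sites
y = rng.standard_normal(80)
ll, score = gp_loglik_and_score(y, X, ell=0.3)
```

The expensive pieces are exactly the solve with K, the log-determinant, and the trace term; these are what the recursive skeletonization factorization and the peeling algorithm replace with fast structured operations.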
Juliane Mueller, Lawrence Berkeley National Laboratory
Title: Surrogate Model Algorithm for Computationally Expensive Multiobjective Optimization Problems
Abstract: Optimization problems arise in many application areas such as environmental engineering, transportation planning, and climate modeling. Often, several conflicting objectives must be minimized simultaneously, and no single solution minimizes all objective functions at once. Rather, a set of trade-off (Pareto-optimal) solutions must be identified. We consider optimization problems whose objective function evaluations require computationally expensive black-box simulations and for which derivative information is not available. Our goal is to find (near) Pareto-optimal solutions using only very few expensive objective function evaluations in order to keep the total optimization time acceptable. To this end, we employ computationally cheap surrogate models throughout the optimization process; they approximate the expensive objective functions and guide the search for improved solutions.
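The notion of Pareto optimality used above has a compact computational form. A minimal numpy sketch (illustration only, not the talk's surrogate algorithm) that filters a set of candidate objective vectors down to the nondominated ones:

```python
import numpy as np

def pareto_front(F):
    """Boolean mask of nondominated rows of F (objectives to minimize):
    row i survives if no other row is <= in every objective and
    strictly < in at least one."""
    n = F.shape[0]
    keep = np.empty(n, dtype=bool)
    for i in range(n):
        dominated = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        keep[i] = not dominated.any()
    return keep

# classic bi-objective toy problem: f1 = x^2 increases while
# f2 = (x-1)^2 decreases on [0,1], so every point is a trade-off
x = np.linspace(0.0, 1.0, 101)
F = np.column_stack([x**2, (x - 1.0) ** 2])
front = F[pareto_front(F)]
```

In a surrogate-based method this filter would be applied to cheap surrogate predictions to pick promising candidates before spending any expensive simulations.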
Diep Nguyen, University of California, Berkeley
Title: Reproducibility at Extreme Scale
Abstract: Challenges to exascale computing include scaling to millions of processors, reducing energy and communication, and ensuring correctness in a dynamic and heterogeneous computing environment. If multiple runs of the same program produce different results because of floating-point errors, it will be difficult to ensure correctness or to debug code. Bitwise reproducibility is the ability to obtain identical results from run to run of the same program or function, possibly with different computing resources. There have been numerous recent efforts to attain this goal; however, none has solved the problem completely, and all require trade-offs among accuracy, performance, and reproducibility. For example, converting to a fixed-point or integer format might produce reproducible results at almost full speed, but with lower accuracy and a limited range of input values to avoid overflow. Another approach is “exact” or correctly rounded arithmetic, which guarantees reproducibility but at high cost. In this talk I will present our proposed technique to obtain rigorous reproducibility while offering trade-offs between performance and accuracy. Our technique is suitable for massively parallel environments, requiring minimal communication. I will discuss our ongoing work to improve performance by relaxing the requirement to accommodate completely random input orders, i.e., fixing the order of input data at some level in order to use local performance-optimized operations while not impacting the overall parallelism. This would make reproducible floating-point computation a practical goal for the extreme scale.
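Why reordering changes floating-point sums, and how pre-rounding restores order independence, can be shown in a few lines. This is only an illustrative single-bin sketch in that spirit, not the authors' algorithm: inputs are rounded to a common power-of-two granularity tied to the largest magnitude, so the rounded values add exactly and any summation order gives the same bits.

```python
import math
import random

def prerounded_sum(xs, bits=30):
    """Order-independent sum sketch: round every input to a granularity
    of 2^(e - bits), where 2^e bounds the largest |x|. The rounded
    values are integer multiples of that granularity, so (for modest n)
    their partial sums are exact and reordering cannot change the result."""
    m = max(abs(x) for x in xs)
    if m == 0.0:
        return 0.0
    ulp = 2.0 ** (math.frexp(m)[1] - bits)   # keep ~`bits` bits of each input
    return sum(round(x / ulp) * ulp for x in xs)

rnd = random.Random(0)
xs = [rnd.uniform(-1.0, 1.0) for _ in range(10_000)]
ys = list(xs)
random.Random(1).shuffle(ys)                 # same values, different order
```

The cost is a bounded accuracy loss (at most half a granule per input), which is exactly the performance/accuracy trade-off discussed above.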
Noemi Petra, University of California, Merced
Title: Large-Scale Bayesian Inverse Problems Governed by Differential and Algebraic Equations
Abstract: We consider the estimation of the uncertainty in the solution of (large-scale) inverse problems within the framework of Bayesian inference. Here we approximate the posterior probability density with a Gaussian. Computing the mean of this Gaussian, i.e., the maximum a posteriori (MAP) estimate of the posterior distribution, requires the solution of a regularized least-squares optimization problem governed by systems of differential equations. The optimization problem is solved with an efficient adjoint-based Newton-conjugate-gradient method, which uses first- and second-derivative information of the negative log posterior. The posterior covariance matrix is given (in the linear-Gaussian case) by the inverse of the Hessian of the negative log posterior. To alleviate the computational cost, we exploit the compact nature of the Hessian of the data-misfit term and construct a low-rank approximation at a cost (in number of PDE solves) that does not depend on the problem size, thus providing scalability to problem sizes of practical interest. We apply this method to quantify the uncertainties in the inference of the basal sliding coefficient in an ice sheet inverse problem, and a similar technique to the inversion for the inertia parameters of generators in a power systems problem.
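The low-rank posterior covariance construction can be sketched with dense linear algebra (a stand-in Hessian with decaying spectrum and an identity prior covariance, not an actual PDE-based Hessian; in practice the dominant eigenpairs come from a matrix-free Lanczos or randomized method at O(r) PDE solves, not `eigh`):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 200, 5
# stand-in data-misfit Hessian whose spectrum decays rapidly,
# mimicking the compactness of the data-misfit operator
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
spectrum = np.concatenate([[50.0, 20.0, 8.0, 3.0, 1.5], np.full(n - r, 1e-8)])
H = (Q * spectrum) @ Q.T

# with identity prior covariance, posterior covariance = (H + I)^{-1}
w, V = np.linalg.eigh(H)
order = np.argsort(w)[::-1][:r]      # keep the r dominant eigenpairs of H
lam, V = w[order], V[:, order]

# Sherman-Morrison-Woodbury with H ~= V diag(lam) V^T:
# (I + V diag(lam) V^T)^{-1} = I - V diag(lam / (1 + lam)) V^T
post_lowrank = np.eye(n) - V @ np.diag(lam / (1.0 + lam)) @ V.T
post_exact = np.linalg.inv(H + np.eye(n))
```

Because the discarded eigenvalues are tiny, the rank-r update reproduces the exact posterior covariance to high accuracy while only ever touching r directions.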
Daniele Schiavazzi, Stanford
Title: Assimilation and Propagation of Clinical Data Uncertainty in Cardiovascular Modeling
Abstract: The increasing adoption of computational tools to complement clinical data collection and inform treatment decisions demands a thorough characterization of the confidence in the predicted quantities. This confidence is affected by modeling assumptions as well as by variability in clinical data, anatomy, vessel wall material, and physiologic response to surgery. A wide spectrum of pathologies can be investigated using modern computational tools that are able to simulate complex fluid-structure interaction phenomena (e.g., valve dynamics), multiscale coupling between 3D Navier-Stokes solvers and 0D boundary circulation models, and growth and remodeling of vascular tissue. However, the challenge of "learning" model parameters from uncertain patient-specific data remains. This is key to our ability to simulate large patient cohorts and to overcome the limitations of the operator-dependent and time-consuming "manual" tuning often performed for these models. The presentation focuses on the effects of clinical data uncertainty, showing examples of quantification of confidence in simulated Stage II single-ventricle surgical outcomes, and using adaptive MCMC to learn the parameters of 0D Norwood circulation models from uncertain blood pressures and flows.
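Learning a lumped 0D model parameter from noisy clinical measurements via MCMC can be sketched as follows. This toy uses plain random-walk Metropolis rather than the adaptive variant, and a hypothetical one-parameter "model" (mean pressure = resistance * flow) standing in for a Norwood circulation model:

```python
import numpy as np

rng = np.random.default_rng(3)
# toy 0D model: P = R * Q, with lumped resistance R to be inferred
R_true, sigma = 1.3, 0.1
Q = np.linspace(1.0, 3.0, 20)                        # measured flows
P_obs = R_true * Q + rng.normal(0.0, sigma, Q.size)  # noisy pressures

def log_post(R):
    # Gaussian measurement likelihood with a flat prior on R
    return -0.5 * np.sum((P_obs - R * Q) ** 2) / sigma**2

R, lp, chain = 1.0, log_post(1.0), []
for _ in range(20_000):
    Rp = R + rng.normal(0.0, 0.05)       # random-walk proposal
    lpp = log_post(Rp)
    if np.log(rng.uniform()) < lpp - lp:  # Metropolis accept/reject
        R, lp = Rp, lpp
    chain.append(R)
R_est = float(np.mean(chain[5_000:]))     # posterior mean after burn-in
```

An adaptive scheme would additionally tune the proposal scale from the chain history; the accept/reject core is the same.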
Ke Wei, University of California, Davis
Title: Guarantees of Riemannian Optimization for Low Rank Matrix Recovery
Abstract: We establish theoretical recovery guarantees for a family of Riemannian optimization algorithms for low-rank matrix recovery, the problem of recovering an $m\times n$ rank-$r$ matrix from $p < mn$ linear measurements. The algorithms are first interpreted as iterative hard thresholding algorithms with subspace projections. Based on this connection, we then prove that if the restricted isometry constant $R_{3r}$ of the sensing operator is less than $C_\kappa/\sqrt{r}$, where $C_\kappa$ depends on the condition number of the matrix, the Riemannian gradient descent method and a restarted variant of the Riemannian conjugate gradient method are guaranteed to converge to the measured rank-$r$ matrix, provided they are initialized by one step of hard thresholding. Empirical evaluation shows that the algorithms are able to recover a low-rank matrix from nearly the minimum number of measurements necessary.
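The iterative-hard-thresholding viewpoint can be sketched numerically for the special case where the linear measurements are entry observations (matrix completion); this is an illustration of the IHT template, not the paper's Riemannian methods: take a gradient step on the residual, then project back to the rank-r manifold by truncated SVD, starting from one step of hard thresholding.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 30, 30, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r truth
mask = rng.uniform(size=(m, n)) < 0.5   # entrywise sampling: p ~ mn/2 < mn
p = mask.mean()
y = mask * M                            # observed entries

def hard_threshold(X, r):
    """Projection onto rank-r matrices via a truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

X = hard_threshold(y / p, r)            # init: one step of hard thresholding
for _ in range(1000):
    # gradient step on the observed residual, then retract to rank r
    X = hard_threshold(X + (y - mask * X) / p, r)

rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

With roughly half the entries observed and rank 2, the iteration drives the relative error far below 1%, consistent with recovery from far fewer than mn measurements.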
