### Program

 9:00 am Katherine Yelick (Associate Laboratory Director for Computing Sciences, LBNL) Welcome / Opening remarks 9:10 am Juliane Mueller (LBNL) - Surrogate Model Algorithm for Computationally Expensive Multi-objective Optimization Problems 9:40 am Diep Nguyen (UC-Berkeley) - Reproducibility at Extreme-Scale 10:10 am Daniele Schiavazzi (Stanford) - Assimilation and Propagation of Clinical Data Uncertainty in Cardiovascular Modeling 10:40 am BREAK 10:55 am Jakub Cerveny (LLNL) - Nonconforming Mesh Refinement for High-Order Finite Elements 11:25 am Noemi Petra (UC-Merced) - Large-Scale Bayesian Inverse Problems Governed by Differential and Algebraic Equations 11:55 am LUNCH 1:00 pm Ariful Azad (LBNL) - Parallel sparse matrix-matrix multiplication and its use in triangle counting and enumeration 1:30 pm Ke Wei (UC-Davis) - Guarantees of Riemannian Optimization for Low Rank Matrix Recovery 2:00 pm Moe Kahlil (SNL) - Probabilistic Inference of Model Parameters and Missing High-Dimensional Data Based on Summary Statistics 2:30 pm BREAK 2:45 pm Victor Minden (Stanford) - Fast spatial Gaussian Process Maximum Likelihood High-Dimensional Data Based on Summary Statistics 3:15 pm Andrea Dotti (SLAC) - Modernizing large scale software for parallel hardware: the experience with GEANT4 3:45 pm Closing Remarks 4:00 pm Poster Session (Building 59, Rm 3101) 5:00 pm ADJOURN

## Speakers / Abstracts

 Ariful Azad, Lawrence Berkeley National Laboratory Title: Parallel sparse matrix-matrix multiplication and its use in triangle counting and enumeration Abstract: We present experimental results for optimized implementations of various (1D, 2D, 3D) parallel SpGEMM algorithms. We quantify the effects of in-node multithreading, communication-avoiding 3D algorithms, and the implementation trade-offs involved. We demonstrate SpGEMM's applications to graph problems such as triangle counting, graph contracting, and Markov clustering.  We will also present a new primitive, masked SpGEMM, that avoids communication when the output structure (or an upper bound on it) is known beforehand. Masked SpGEMM has applications to matrix-based triangle counting and enumeration. Jakub Cerveny, Lawrence Berkeley National Laboratory Title: Nonconforming Mesh Refinement for High-Order Finite Elements Abstract: We propose a general algorithm for nonconforming adaptive mesh refinement (AMR) in high-order nodal finite element codes. The algorithm handles triangular, quadrilateral and hexahedral meshes of arbitrarily high polynomial order, for any finite element space. We design a data structure for meshes with hanging nodes and develop a procedure to construct the AMR system through variational restriction. Our approach supports complex anisotropic refinements in both 2D and 3D, unlimited refinement levels of adjacent elements, and MPI parallelism with dynamic load balancing. We demonstrate the proposed method on several shock hydrodynamic and electromagnetic problems.   Andrea Dotti, SLAC National Accelerator Laboratory Title: Modernizing large scale software for parallel hardware: the experience with GEANT4 Abstract: The simulation of the interaction of subatomic particles with matter is a computational challenge common to many science and technology domains. Such studies give insight into diverse fields, unlocking secrets on the behavior and origins of matter (High Energy Physics and Cosmology), improving human health (advancing the design of medical imaging equipment and radiation therapy cancer treatment), and predicting damage to critical electronic systems (how spacecraft electronics or even automotive braking systems may be affected by cosmic rays). To reach the precision needed by today's “needle in a haystack” discoveries in these fields, particle interactions must be simulated in the step-by-step method known as Monte Carlo particle transport. Because MC processes are stochastic, we build up a meaningful result only by simulating large numbers of particle interactions. Achieving precision requires simulation of millions or even billions of interactions. Modern hardware architectures (those with many-core architectures and wide vector-registers) and energy-efficient co-processor systems could supply the required computation power.  The Geant4 toolkit is an open-source C++ product developed by a collaboration of physicists and computer scientists from 20 countries in North America, Europe and Asia. Developed over a 20 year period, the current Geant4 contains 2 millions lines of code.  With the latest Geant4 major release (version 10.0) event-level parallelism via multi-threading has been introduced obtaining very good results in terms of scalability and performances.  We will report on the design and techniques used in Geant4 to introduce parallel algorithms. We will describe its successes but we will also describe the challenging aspects encountered to adapt a large code-base to modern parallel systems.  Examples of performances including using Xeon Phi co-processors, GPUs and supercomputers will be presented. Moe Khalil, Sandia National Laboratory, Livermore Title: Probabilistic Inference of Model Parameters and Missing High-Dimensional Data Based on Summary Statistics Abstract: We present the results of an application of Maximum Entropy and Approximate Bayesian Computation methods to the probabilistic inference of model parameters in the absence of experimental data. The available published data are in the form of indirect summary statistics, being nominal values and error bars, of model parameters. This approach relies on generating data that is consistent with the given summary statistics. The methodology permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. For the specific application to chemical modeling of hydrogen-air ignition, the previous algorithm consisting of a nested Markov Chain Monte Carlo procedure, with the outer chain exploring the missing data space and inner chain the unknown parameters, is computationally taxing given the high dimensionality of the missing data (in the order of 100s to 1000s). We propose an alternative approach consisting of importance sampling and Gauss–Hermite quadrature for parameter inference resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong nonlinearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. Thus, the proposed technique based on importance sampling with a Gaussian proposal is adequate for parameter estimation in this investigation. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. Victor Minden, Stanford Title: Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizationsAbstract: Maximum likelihood estimation for parameter-fitting given observations from a kernelized Gaussian process in space is a computationally-demanding task that restricts the use of such methods to moderately-sized datasets.  We present a procedure for unstructured 2D observations that allows for evaluation of the log-likelihood and its gradient (i.e., the score equations) in \tilde\mathcal{O}(n^{3/2)) time and \mathcal{O}(n\log n) storage, where $n$ is the number of observations.  Our method relies on the skeletonization procedure described by Martinsson \& Rokhlin (2005) in the form of the \emph{recursive skeletonization factorization} of Ho \& Ying (2015).  Combining this with an adaptation of the matrix peeling algorithm of Lin et al. (2011), our method can be used in the context of any first-order optimization routine to quickly and accurately compute maximum-likelihood estimates. Juliane Mueller, Lawrence Berkeley National LaboratoryTitle: Surrogate Model Algorithm for Computationally Expensive Multi-objective Optimization ProblemsAbstract: Optimization problems arise in many  application areas such as environmental  engineering, transportation planning, and climate modeling. Often, several conflicting objectives must be minimized simultaneously. There does not exist a single solution that minimizes all objective functions. Rather, a set of trade-off (Pareto-optimal) solutions must be identified. We consider optimization problems whose objective function evaluations require computationally expensive black-box simulations and for which derivative information is not available. Our goal is to find (near-) Pareto-optimal solutions by doing only very few expensive objective function evaluations in order to keep the total optimization time acceptable. To this end, we employ computationally cheap surrogate models throughout the optimization process that approximate the expensive objective functions and guide the search for improved solutions. Diep Nguyen, University of California, BerkeleyTitle: Reproducibility at Extreme-Scale Abstract: Challenges to exascale computing include scaling to millions of processors, reducing energy and communication, and ensuring correctness in a dynamic and heterogeneous computing environment. If multiple runs of the same program produce different results, because of floating point errors, it will be difficult to ensure correctness or debug code. Bit-wise reproducibility is the ability to obtain identical results from run to run of the same program or function, with possibly different computing resources. There have been numerous recent efforts to attain this goal, however none has solved the problem completely, requiring tradeoffs among accuracy, performance and reproducibility. For example, converting to a fixed-point or integer format might produce reproducible results at almost full speed but with lower accuracy and a limited range of input values to avoid overflow. Another approach is “exact” or correctly rounded arithmetic, which guarantees reproducibility but at high cost. In this talk I will present our proposed technique to obtain rigorous reproducibility, while offering trade-offs between performance and accuracy. Our technique is suitable for massively parallel environments, requiring minimal communication. I will discuss our on-going work to improve the performance by relaxing the requirement to accommodate completely random input orders, i.e. fixing the order of input data at some level to make use of local performance-optimized operations, while not impacting the overall parallelism. This would make reproducible floating-point computation a practical goal for the extreme scale. Noemi Petra, University of California, Merced Title: Large-Scale Bayesian Inverse Problems Governed by Differential and Algebraic Equations Abstract: We consider the estimation of the uncertainty in the solution of (large-scale) inverse problems within the framework of Bayesian inference. Here we approximate the posterior probability density with a Gaussian. Computing the mean of this Gaussian, i.e., the maximum a posteriori (MAP) estimate of the posterior distribution, requires the solution of a regularized least-squares optimization problem governed by systems of differential equations. The optimization problem is solved with an efficient adjoint-based Newton-conjugate gradient method, which uses first and second derivative information of the negative log posterior. The posterior covariance matrix is given (in the linear-Gaussian case) by the inverse of the Hessian of the negative log posterior. To alleviate the computational cost, we exploit the compact nature of the Hessian of the data misfit term and construct a low rank approximation at a cost (in number of PDE solves) that does not depend on the problem size, thus providing scalability to problem sizes of practical interest. We apply this method to quantify the uncertainties in the inference of the basal sliding coefficient in an ice sheet inverse problem and a similar technique for the inversion of the inertia parameter of generators in power system problem. Daniele Schiavazzi, Stanford Title: Assimilation and Propagation of Clinical Data Uncertainty in Cardiovascular Modeling Abstract: The increasing adoption of computational tools to complement clinical data collection and inform treatment decisions demands a thorough characterization of the confidence in the predicted quantities. This confidence is affected by modeling assumptions as well as variability in clinical data, anatomy, vessel wall material and physiologic response to surgery. A wide spectrum of pathologies can be investigated using modern computational tools that are able to simulate complex fluid-structure interaction phenomena (e.g., valve dynamics), multi-scale coupling between 3D Navier-Stokes solvers and 0D boundary circulation models, and growth and remodeling of vascular tissue. However, the challenge of "learning" model parameters from uncertain patient-specific data remains. This is key to our ability to simulate large patient cohorts and to overcome the limitations of operator-dependent and time consuming "manual" tuning often performed for these models.  The presentation focuses on the effects of clinical data uncertainty, showing examples of quantification of confidence in simulated Stage II single ventricle surgical outcomes, and using adaptive MCMC to learn the parameters of 0D Norwood circulation models from uncertain blood pressures and flows. Ke Wei, University of California, Davis Title: Guarantees of Riemannian Optimization for Low Rank Matrix RecoveryAbstract: We establish theoretical recovery guarantees of a family of Riemannian optimization algorithms for  low rank matrix recovery, which  is about recovering an $m\times n$ rank $r$ matrix from $p < mn$ number of linear measurements. The algorithms are first interpreted as the iterative hard thresholding algorithms with subspace projections. Then based on this connection, we prove that if the restricted isometry constant  $R_{3r}$  of the sensing operator is less than $C_\kappa /\sqrt{r}$ where $C_\kappa$ depends on the condition number of the matrix, the Riemannian gradient descent method and a restarted variant of the Riemannian conjugate gradient method are guaranteed to converge to the measured rank $r$ matrix provided they are initialized by one step hard thresholding. Empirical evaluation shows that the algorithms are able to recover a low rank matrix from nearly the minimum number of measurements necessary.