Titles and Abstracts

2023 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing

Plenary Speaker

Invited Speakers

Student Speakers

Plenary Speaker

Chau-Lyan Chang 張朝亮 (National Center for High-Performance Computing)

High-Fidelity Numerical Computations of Unsteady Physical Conservation Laws Using the Space-Time Integral Formulations

Abstract:

Due to the wide availability of commercial and open-source software, numerical computations of physical conservation laws are now routinely carried out for a broad range of scientific and engineering applications. To attract wider usage, these codes often put robustness first. While users can easily and quickly obtain results, the solution accuracy for complicated configurations, especially in the presence of unsteady and discontinuous phenomena, is often questionable at best. For this very reason, many research institutions around the world are still pursuing alternative numerical methods to provide high-fidelity unsteady solutions for critical scientific and engineering problems. This talk focuses on recent efforts within NCHC to construct a numerical and software framework, based on the space-time integral approach, that encompasses a variety of problems formulated as physical conservation laws. More specifically, the space-time conservation element, solution element (CESE) method, first introduced in the 1990s to solve fluid dynamics and acoustics problems and later applied to many other disciplines such as fluid-structure interaction, electromagnetics, detonation, and solid stress waves, was adopted as the core solver for this numerical framework. The underlying principles of this space-time integral method and its space-time reversal properties will be discussed in detail, along with examples of its applications in capturing strong discontinuities and unsteady waves simultaneously. Moreover, our recent development in implementing the CESE method for mixed-element meshes and immersed-boundary applications, using a Cartesian baseline mesh that is free of grid generation, will be presented.

Invited Speakers 

Session 1

Kengo Nakajima (The University of Tokyo)

h3-Open-BDEC: Innovative Software Infrastructure for Scientific Computing in the Exascale Era by Integrations of (Simulation + Data + Learning)

Abstract:

We propose an innovative method of computational science for sustainable promotion of scientific discovery by supercomputers in the Exascale Era, combining Simulation, Data, and Learning (S+D+L). In May 2021, we started operation of the Wisteria/BDEC-01 system, with a peak performance of 33+ PF, at the University of Tokyo. It is a Hierarchical, Hybrid, Heterogeneous (h3) system, consisting of computing nodes for CSE with A64FX processors and nodes for data analytics/AI with NVIDIA A100 GPUs. We are developing a software platform, h3-Open-BDEC, for integration of (S+D+L), and we evaluate the effects of this integration on the Wisteria system. h3-Open-BDEC is designed to extract the maximum performance of supercomputers with minimum energy consumption, focusing on (1) innovative methods for numerical analysis with high performance, high reliability, and power saving, based on a new principle of computing by adaptive precision, accuracy verification, and automatic tuning; (2) a Hierarchical Data-Driven Approach (hDDA) based on machine learning; and (3) software for heterogeneous systems such as Wisteria/BDEC-01. Integration of (S+D+L) by h3-Open-BDEC enables a significant reduction in computation and power consumption compared with conventional simulations.


Yu-heng Tseng 曾于恒 (National Taiwan University)

Accelerated Barotropic Solver for the High-resolution Ocean Model Component in the Community Earth System Model Version 2

Abstract:

The high-resolution ocean model is the most computationally expensive component of the Community Earth System Model (CESM). The major bottleneck is that the barotropic solver scales poorly at high core counts. We design a new barotropic solver to accelerate high-resolution ocean simulation. The novel solver adopts a Chebyshev-type iterative method to reduce the global communication cost, in conjunction with an effective block preconditioner to further reduce the number of iterations. The algorithm and its computational complexity are theoretically analyzed and compared with other existing methods. A series of idealized tests confirms a significant reduction in global communication time with a competitive convergence rate. Numerical experiments using the 0.1° CESM global ocean model show that the proposed approach yields a factor of 1.7 speed-up over the original method with no loss of accuracy, achieving 10.5 simulated years per wall-clock day on 16,875 cores.
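The communication advantage of a Chebyshev-type method can be seen in a small serial sketch: unlike conjugate gradients, whose inner products force a global reduction every iteration, the Chebyshev recurrence needs only estimates of the extreme eigenvalues up front. The following is a textbook, unpreconditioned Chebyshev iteration for an SPD system (a minimal stand-in, not the preconditioned solver of the talk; the eigenvalue bounds are assumed known):

```python
import numpy as np

def chebyshev_solve(A, b, lam_min, lam_max, iters=200):
    """Chebyshev iteration for SPD A with spectrum in [lam_min, lam_max].
    The loop body contains no inner products, so a distributed version
    needs no global reduction per iteration (only occasional residual checks)."""
    d = (lam_max + lam_min) / 2.0
    c = (lam_max - lam_min) / 2.0
    x = np.zeros_like(b)
    r = b.copy()
    p = np.zeros_like(b)
    alpha = 0.0
    for i in range(iters):
        if i == 0:
            p = r.copy()
            alpha = 1.0 / d
        else:
            # three-term Chebyshev recurrence: scalars only, no dot products
            beta = (c * alpha) ** 2 / 2.0 if i == 1 else (c * alpha / 2.0) ** 2
            alpha = 1.0 / (d - beta / alpha)
            p = r + beta * p
        x = x + alpha * p
        r = r - alpha * (A @ p)
    return x

rng = np.random.default_rng(0)
B = rng.standard_normal((30, 30))
A = B @ B.T + np.eye(30)              # SPD test matrix
b = rng.standard_normal(30)
lams = np.linalg.eigvalsh(A)          # exact bounds, for the sketch only
x = chebyshev_solve(A, b, lams[0], lams[-1])
```

In practice the spectral bounds come from cheap estimates rather than a full eigendecomposition, and the convergence rate then depends on how tight those estimates are.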

Eh Tan 譚諤 (Academia Sinica)

GPU Acceleration on Geodynamic Simulation via OpenACC

Abstract:

Numerical simulation of geodynamic processes is computationally expensive. The pursuit of higher resolution and more accurate physics requires ever more computing power. The speed improvement of CPUs has stalled in recent years. Moreover, the speed of memory access has improved only slowly over the decades, which further reduces simulation performance. The advance of GPGPU (general-purpose computing on GPUs) can help solve this performance problem. GPUs provide quick memory access and fast context switches to hide memory access latency while keeping the computation units busy. GPUs and CPUs have separate memory spaces, so programmers traditionally had to manage data transfers between them manually before and after the computation. Newer generations of NVIDIA GPUs provide a unified memory space that avoids manual data transfers. Additionally, we can port CPU code to GPUs using a few lines of OpenACC directives. In the end, we completely ported our simulation code to the GPU and achieved a 40x speed-up compared with single-CPU performance. We will detail the porting strategy and compare OpenACC with OpenMP.

Session 2

Satoshi Ohshima (Kyushu University)

QR Factorization of Block Low-rank Matrices on Multiple-/Multi-Instance GPUs

Abstract:

QR factorization is an important computation used in various numerical simulations. Many large-scale simulations involving huge matrices require both a large amount of memory and long computation times. Low-rank approximation methods are expected to reduce both. The block low-rank (BLR) matrix is one such low-rank format, and QR factorization of BLR matrices (BLR-QR) has already been implemented. To reduce its execution time, we are trying to utilize GPUs. However, the calculations in BLR-QR are not well suited to GPUs. In this talk, we introduce our fast implementation techniques on multiple-/multi-instance GPUs.


An-Cheng Yang 楊安正 (National Center for High-Performance Computing)

Towards Exascale Computing - the Status of Scientific Computing in Taiwan

Abstract:

NCHC is the biggest computing resource provider in Taiwan. We support hundreds of scientific and engineering projects funded by the Taiwan government each year. Meeting the computational requirements of various domains is a great challenge. I will share the status of scientific computing in Taiwan and try to offer insights into the general trends of high-performance computing in Taiwan.

Meng-Huo Chen 陳孟豁 (National Chung Cheng University)

Parallel Simulation of Cerebral Fluid Transport in the Brain

Abstract:

In this research we intend to develop methods for parallelizing simulations of a cerebral poromechanics model on general non-regular domains. The dynamics of cerebral fluid transport in the brain is treated as a fluid flow problem in porous media. Several networks affect cerebral fluid transport: a high-pressure arterial network, a lower-pressure arteriole/capillary network, an extracellular/CSF network, and a venous network. The mathematical model of transport under the influence of these networks gives a system of partial differential equations. Numerical simulation of this system on a triangular discretized grid of a real brain takes a large amount of time. Using a mesh-splitting package and parallel processing, we hope to speed up the cerebral fluid transport simulation and facilitate the study of hydrocephalus and other related cerebral diseases.

Session 3

Tsung-Hui (Alex) Huang 黃琮暉 (National Tsing Hua University) (speaker)

Tsung-Yeh Hsieh, and Cheng-Chun Yang

Stabilized Meshfree and Physics Informed Neural Networks Formulations for Advection Dominated Flow Problems

Abstract:

Eulerian-described partial differential equations, such as the advection-diffusion equation or the Navier-Stokes equations, exhibit numerical instability under strong advection when the conventional Bubnov-Galerkin method is employed. Such instability can be circumvented with Petrov-Galerkin formulations [1], and meshfree formulations like the reproducing kernel particle method (RKPM) [2] can effectively enhance local smoothness and approximation accuracy. However, immature domain integration can still lead to suboptimal convergence and hourglass-like instability. Such inaccuracy and instability are amplified under strong advection and are commonly seen in meshfree formulations [2]. This study proposes a variationally consistent integration method and a gradient-type stabilization based on the advection-diffusion equation to enhance the accuracy and coercivity of the system [3]. In addition, a machine-learning physics-informed neural network (PINN) method [4] is introduced. By revisiting the stabilized Petrov-Galerkin formulation, one can arrive at a collocation format for PINNs with better robustness in modeling flow problems with strong advection and biased data. Various advection-dominated fluid flow problems are investigated to verify the effectiveness of the proposed methods.

References:

[1] Thomas JR Hughes. "Multiscale phenomena: Green's functions, the Dirichlet-to-Neumann formulation, subgrid scale models, bubbles and the origins of stabilized methods." Computer Methods in Applied Mechanics and Engineering 127.1-4 (1995): 387-401.
[2] Jiun-Shyan Chen, Michael Hillman, and Sheng-Wei Chi. "Meshfree methods: progress made after 20 years." Journal of Engineering Mechanics 143.4 (2017): 04017001.
[3] Tsung-Hui Huang. "Stabilized and variationally consistent integrated meshfree formulation for advection-dominated problems." Computer Methods in Applied Mechanics and Engineering 403 (2023): 115698.
[4] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations." Journal of Computational Physics 378 (2019): 686-707.
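The instability that stabilization cures is easiest to see on the 1D steady advection-diffusion model problem: standard Galerkin linear elements oscillate once the element Peclet number exceeds 1, while SUPG-type stabilization, which in 1D amounts to adding tau*a^2 of artificial diffusion with the classical tau, recovers nodally exact values. The sketch below is standard textbook material, not the meshfree or PINN formulation of the talk:

```python
import numpy as np

def solve_adv_diff(a, kappa, n_el, tau=0.0):
    """Linear FE for a*u' - kappa*u'' = 0 on (0,1), u(0)=0, u(1)=1.
    tau > 0 adds SUPG stabilization (extra diffusion tau*a^2 in 1D)."""
    h = 1.0 / n_el
    k_eff = kappa + tau * a * a           # SUPG acts as added diffusion here
    n = n_el - 1                          # interior nodes
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(n):
        # diffusion stencil k_eff/h*(-1, 2, -1), advection a/2*(-1, 0, 1)
        A[i, i] = 2 * k_eff / h
        if i > 0:
            A[i, i - 1] = -k_eff / h - a / 2
        if i < n - 1:
            A[i, i + 1] = -k_eff / h + a / 2
    b[-1] = k_eff / h - a / 2             # contribution of u(1) = 1
    u = np.linalg.solve(A, b)
    return np.concatenate(([0.0], u, [1.0]))

a, kappa, n_el = 1.0, 0.02, 10
h = 1.0 / n_el
alpha = a * h / (2 * kappa)                            # element Peclet = 2.5
tau = h / (2 * a) * (1 / np.tanh(alpha) - 1 / alpha)   # classical optimal tau

u_gal = solve_adv_diff(a, kappa, n_el)                 # oscillatory
u_supg = solve_adv_diff(a, kappa, n_el, tau)           # nodally exact
x = np.linspace(0, 1, n_el + 1)
u_exact = np.expm1(a * x / kappa) / np.expm1(a / kappa)
```

With the classical tau, the SUPG solution reproduces the exact boundary-layer solution at every node, while the Galerkin solution swings negative; the same mechanism motivates the gradient-type stabilization of the paper in higher dimensions.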

Jen-Hao Chen 陳人豪 (National Tsing Hua University)

Neural-network-based Method for Computing Multiple Excited States of the Static Schrodinger Equation

Abstract:

A neural-network-based method is proposed to compute the multiple excited-state energies and corresponding wave functions of the static Schrodinger equation. The neural network models are trained by minimizing a specific loss function whose main components are an energy term and a deflation term. Minimizing the energy term yields the desired energy level, while the deflation term shifts the energies of all previously computed states by an appropriate amount, enabling us to compute the next consecutive state. The results show that the proposed method outperforms other neural-network-based solvers in accuracy and efficiency.
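The deflation idea can be illustrated on a finite-dimensional analogue: minimize the Rayleigh quotient by gradient descent to find the ground state, then shift the found state's energy upward with a rank-one term so that the next minimization lands on the first excited state. This toy numpy sketch uses a matrix eigenproblem as a stand-in for the paper's neural-network model; the shift value is an assumption and must exceed the spectral spread:

```python
import numpy as np

def lowest_state(H, iters=5000, lr=0.05, seed=0):
    """Gradient descent on the Rayleigh quotient v^T H v / v^T v."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(H.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        e = v @ H @ v                    # current energy (Rayleigh quotient)
        v = v - lr * (H @ v - e * v)     # gradient step on the unit sphere
        v /= np.linalg.norm(v)
    return v @ H @ v, v

# symmetric "Hamiltonian" with known levels -2, -1, 0.5, 2
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
H = Q @ np.diag([-2.0, -1.0, 0.5, 2.0]) @ Q.T

shift = 10.0            # assumed larger than the spectral range
energies = []
Hd = H.copy()
for k in range(3):
    e, v = lowest_state(Hd)
    energies.append(e)
    Hd = Hd + shift * np.outer(v, v)   # deflation: push found state upward
```

Each deflation moves the already-found state to e + shift, above the rest of the spectrum, so repeated ground-state minimization walks up the level ladder, which mirrors the role of the deflation term in the paper's loss function.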

Takahiro Katagiri (Nagoya University)

State-of-the-Art Explainable AI for Auto-tuning Function on Numerical Software

Abstract:

Recently, AI has been applied to auto-tuning (AT) for numerical software. By utilizing AI technology, it is easy to establish AT functions for performance tuning of numerical software, such as adjusting blocking factors, selecting the best implementation, and setting numerical parameters for iterative solvers. However, it is difficult to show the correctness of the obtained AI model. Adopting explainable AI (XAI) technology is one solution to this problem. In this presentation, several scenarios for applying XAI to AT functions are presented.


Session 4

Katsuhisa Ozaki (Shibaura Institute of Technology)

Recent Progress of BLAS-Based Accurate Numerical Algorithms for Matrix Multiplication

Abstract:

We have proposed accurate algorithms for matrix multiplication based on GEMM in BLAS. Recently, this approach has come to be called the Ozaki scheme and has been applied not only to high-precision computation but also to interval arithmetic and to iterative refinement methods for eigenvalues and singular values. In this presentation, we report on recent progress on the Ozaki scheme, including unbalanced matrix splitting, rounding error analysis, and numerical experiments. Finally, we introduce prospects for automatic tuning.
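The core of the scheme is an error-free splitting: each matrix is cut into slices with few enough significand bits that every pairwise slice product, computed by an ordinary GEMM in double precision, is exact; the exact partial products are then summed accurately. The following is a rough, unblocked sketch of that idea (a simplified illustration, not the tuned published implementation):

```python
import math
import numpy as np

def split(A, t):
    """Error-free row-wise splitting: the returned slices sum exactly to A,
    and each slice holds at most ~t significant bits per entry."""
    slices = []
    R = A.copy()
    while np.any(R != 0.0):
        mu = np.max(np.abs(R), axis=1, keepdims=True)
        _, e = np.frexp(mu)                    # |R| < 2**e row-wise
        sigma = np.where(mu > 0, np.ldexp(1.0, e + 53 - t), 0.0)
        hi = (R + sigma) - sigma               # rounds R to its top bits
        slices.append(hi)
        R = R - hi                             # remainder is exact
    return slices

def ozaki_matmul(A, B):
    n = A.shape[1]
    t = (53 - int(math.ceil(math.log2(n)))) // 2   # so slice GEMMs are exact
    As = split(A, t)
    Bs = split(B.T, t)                             # split B column-wise via B^T
    prods = [Ai @ Bj.T for Ai in As for Bj in Bs]  # every GEMM is error-free
    m, p = A.shape[0], B.shape[1]
    # accurate final summation of the exact partial products
    return np.array([[math.fsum(P[i, j] for P in prods)
                      for j in range(p)] for i in range(m)])

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
B = rng.standard_normal((8, 4))
C = ozaki_matmul(A, B)
```

Because each slice product is exact and `math.fsum` returns the correctly rounded sum of its inputs, the result here is the correctly rounded exact product; production variants trade some of that accuracy against the number of slices, which is where the unbalanced splitting and tuning mentioned above come in.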


Takeshi Terao (RIKEN R-CCS)

Iterative Refinement for a Subset of Eigenpairs of a Real Symmetric Matrix

Abstract:

Numerical computation is used in many scientific fields, and eigenvalue decomposition plays a significant role in it. In some areas, highly accurate eigenpairs are required.

We will discuss a new iterative refinement method for eigenpairs of a real symmetric matrix. The proposed method refines the accuracy of a subset of eigenpairs without requiring the full eigenvector matrix. Numerical examples illustrate good convergence of the residual, and the refinement can be performed without a computer with a large amount of memory.
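For a flavor of what refining a single eigenpair without the full eigenvector matrix looks like, the classical Rayleigh quotient iteration works from one approximate pair at a time (this is the textbook method, not the algorithm of the talk, whose details are not given here):

```python
import numpy as np

def refine_eigenpair(A, v, iters=3):
    """Rayleigh quotient iteration for symmetric A: refines one approximate
    eigenpair, cubically convergent near a simple eigenvalue."""
    v = v / np.linalg.norm(v)
    for _ in range(iters):
        theta = v @ A @ v                      # Rayleigh quotient
        r = A @ v - theta * v                  # eigenpair residual
        if np.linalg.norm(r) < 1e-14:
            break                              # already converged
        w = np.linalg.solve(A - theta * np.eye(len(v)), v)
        v = w / np.linalg.norm(w)
    return v @ A @ v, v

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2
evals, V = np.linalg.eigh(A)
v0 = V[:, 2] + 0.01 * rng.standard_normal(6)   # perturbed eigenvector
theta, v = refine_eigenpair(A, v0)
```

Only the vectors being refined are stored, which is the memory property the abstract emphasizes; the talk's method additionally targets high accuracy for a whole subset of pairs at once.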


How-Wei Chen 陳浩維 (National Central University)

High Accuracy FDM Viscoelastic Simulation – K-space Asymmetrical Factorization and Fractional Spatial Derivatives, Time-Space and Fourier Domain Computations

Abstract:

Designing accurate and efficient wave propagation engines is vital for seismic modelling and imaging. The pseudospectral method (e.g., Gazdag, 1981; Fornberg, 1987; Chen, 1996) provides spatially dispersion-free wavefields. However, small time steps must be adopted to ensure simulation stability and temporal accuracy. To deal with this issue, many enhanced spectral-like methods have been proposed for wave propagation. The k-space method (e.g., Tabei et al., 2002; Firouzi et al., 2012) is one of these enhanced strategies.

The k-space method stems from scattering research (Bojarski, 1982) and has since been introduced to solve many practical problems in ultrasonic (Tabei et al., 2002), biomedical (Cox et al., 2007), and geophysical (Song et al., 2012; Chen et al., 2016) applications. Compared with the second-order k-space formulation, the first-order formulation can readily be solved on staggered-grid configurations and possesses better numerical stability and fewer simulation artefacts. The existing first-order k-space method is highly efficient and accurate for wave propagation simulation in homogeneous media. However, for the highly heterogeneous media encountered in geophysical problems, each first-order derivative in the k-space formulation turns into a computationally intensive space-wavenumber mixed-domain operator. Removing the discretization errors in the temporal derivative can be achieved through eigenvalue decomposition and solution of the matrix differential equations. In this abstract, we analyze and summarize the computational bottleneck of the conventional first-order k-space method as a symmetrical factorization of the wavenumber-time (k-t) domain wave propagator. We then propose a new framework based on an asymmetrical factorization of the wave propagator to address the problem. The error compensation operators simultaneously correct the discretization errors of different types of temporal derivatives, and the approach adapts to the wave mode. The mixed-domain operators are effectively represented by low-rank, variable-operator-length computations. Theoretical and numerical analyses validate that, compared with the conventional k-space method, our new k-space method preserves the modelling accuracy while significantly boosting the simulation efficiency.

Session 6

Hiroyuki Takizawa (Tohoku University) (speaker)

Jin Yifan, Mulya Agung, Keichi Takahashi, and Yoichi Shimomura

A Task Mapping Method for Heterogeneous Multi-core NUMA Systems

Abstract:

Optimal mapping of tasks to processor cores is non-trivial. Modern processors may contain different kinds of cores to achieve high performance and energy efficiency. Moreover, in NUMA systems built from such processors, memory access performance can change depending on the locations of cores and memory devices. Existing task mapping methods are ineffective on such systems because they do not simultaneously consider the multiple performance factors caused by heterogeneity in memory access and core performance. In this talk, we therefore propose a new mapping method with two task mapping priority options: the memory-aware priority option (MPO) and the heterogeneity-aware priority option (HPO). A priority option switching mechanism (POSM) selects the appropriate priority option for a given combination of system and application by analyzing their characteristics. Compared with methods that do not switch mapping priorities, the proposed method achieves overall performance improvement when dealing with a set of applications with different characteristics.


Chin-Tien Wu 吳金典 (National Yang Ming Chiao Tung University)

UAV Detection and Tracking by Feature-Enhanced Yolo with Kalman Filtering

Abstract:

Unmanned aerial vehicle (UAV) defense has become a hot topic due to the ongoing Russia-Ukraine war and China's constant threats to Taiwan. Drone tracking and countermeasures will inevitably play a crucial role in future warfare. In this talk, we propose a feature-enhanced Yolo network to improve the accuracy of detecting small moving objects. To achieve real-time tracking, a Kalman filter (KF) is applied to reduce the search window. A structure-preserving algorithm is proposed to estimate the covariance in the KF. Furthermore, an FPGA is employed to accelerate the computation of the covariance estimation. Experimental results will be shown to demonstrate the effectiveness of our approach.
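The role of the Kalman filter here is standard track prediction: the predicted state centers the next frame's search window and the predicted covariance sizes it. A generic constant-velocity sketch (a textbook filter with assumed noise parameters, not the talk's structure-preserving covariance estimator):

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],        # constant-velocity motion model:
              [0, 1, 0, dt],        # state = (x, y, vx, vy)
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1., 0, 0, 0],        # the detector observes position only
              [0., 1, 0, 0]])
Q = 0.01 * np.eye(4)                # process noise (assumed tuning value)
R = 4.0 * np.eye(2)                 # measurement noise in pixels^2 (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R                        # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

def search_window(x, P, n_sigma=3.0):
    """Center and half-sizes of the detector's next search window."""
    x_pred, P_pred = predict(x, P)
    half = n_sigma * np.sqrt(np.diag(P_pred)[:2])
    return x_pred[:2], half

x0 = np.array([0.0, 0.0, 1.0, 0.0])
P0 = np.eye(4)
center, half = search_window(x0, P0)   # window centered at (1, 0)
```

Restricting detection to this window is what makes real-time tracking feasible; the covariance update is the part the talk accelerates on an FPGA.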

ChungGang Li 李崇綱 (National Cheng Kung University)

Numerical Estimation of Indoor Air Quality in Crowded Environments Based on a Compressible Flow Solver Using a Supercomputer

Abstract:

Estimating indoor air quality in crowded environments to prevent the spread of viruses and other infections has become an urgent need since the coronavirus pandemic began in March 2020. In this study, a numerical framework to estimate indoor air quality using computational fluid dynamics (CFD) was developed. The framework applies an immersed boundary method to appropriately assign flow and thermal conditions, such as those on air conditioners, ventilation systems, furniture, and the human body. Additionally, the compressibility of air due to natural convection is taken into consideration to accurately obtain the flow and thermal fields. Mesh generation is based on a hierarchical structure named CUBE, enabling calculations to be performed on a supercomputer to save computational time. Several cases will be presented in this talk. The results indicate that indoor air quality can be estimated both qualitatively and quantitatively using the current framework.

Student Speakers 

Session 5

Ji Qi (Kyushu University) (speaker)

Satoshi Ohshima (Kyushu University)

Kenji Ono (Kyushu University)

Performance Evaluation of AoS and SoA for Incompressible Fluid Simulation on GPUs

Abstract:

Computational fluid dynamics codes make use of large arrays to represent multi-dimensional vector fields. On GPUs, memory accesses can be optimized using the memory coalescing technique; however, due to the three-dimensional nature of most CFD problems, it is impossible to arrange all memory accesses into contiguous blocks.

In this study, we evaluate the performance of using AoS (array of structures) and SoA (structure of arrays) layouts to represent multi-dimensional vector fields in an in-house incompressible fluid simulation code on multiple GPUs, and we estimate the influence of these data structures on different operations within the simulation.
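The layout difference can be sketched with numpy: a structured array interleaves the velocity components per cell (AoS), while separate contiguous arrays (SoA) let a kernel that touches only one component stream through memory, which is also what enables coalesced access on a GPU. A CPU-side sketch only, illustrating the layouts rather than the talk's GPU code:

```python
import numpy as np

n = 1_000_000

# AoS: one record per cell; u, v, w are interleaved in memory
aos = np.zeros(n, dtype=[("u", "f8"), ("v", "f8"), ("w", "f8")])

# SoA: each velocity component is its own contiguous array
soa_u = np.zeros(n)
soa_v = np.zeros(n)
soa_w = np.zeros(n)

aos["u"] = 1.0
soa_u[:] = 1.0

# A kernel reading only u strides 24 bytes per element in AoS,
# but walks contiguous 8-byte elements in SoA.
stride_aos = aos["u"].strides[0]   # 24
stride_soa = soa_u.strides[0]      # 8

# Both layouts produce identical numerical results
sum_aos = aos["u"] + aos["v"] + aos["w"]
sum_soa = soa_u + soa_v + soa_w
```

Which layout wins depends on the operation: component-wise kernels favor SoA, while operations consuming all components of one cell can favor AoS, which is why the study measures them per operation.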


Hao-Lun Yeh 葉浩倫 (National Central University) (speaker)
Shu-Chih Yang (National Central University)
Koji Terasaki (RIKEN R-CCS)
Takemasa Miyoshi (RIKEN R-CCS)

Including Observation Error Correlation for Ensemble Radar Data Assimilation and its Impact on Heavy Rainfall Prediction

Abstract:

An assumption of uncorrelated observation errors is commonly adopted in conventional data assimilation. For this reason, high-resolution data are usually resampled with superobbing or data-thinning strategies. However, not only do these strategies diminish the advantages of the high temporal and spatial resolutions that can provide essential details of convection development, but assimilating high-resolution data, such as radar radial wind, without considering observation error correlations can also lead to overfitting and thus degrade the performance of data assimilation and forecasts. This study uses a radar ensemble data assimilation system that combines the Weather Research and Forecasting model and the Local Ensemble Transform Kalman Filter (WRF-LETKF) to assimilate radar radial wind and reflectivity data. We further include the error correlations of the Doppler radar radial wind and reflectivity in the WRF-LETKF radar assimilation system and examine their impact on the accuracy of convective-scale analyses and short-term precipitation prediction in Taiwan for heavy rainfall events with different characteristics.

The horizontal error correlation scale of radial wind and reflectivity ranges from 15 to 25 km, depending on the type of precipitation event. Introducing observation error correlations in radar data assimilation produces more small-scale features in the wind and hydrometeor analysis corrections compared with the experiment using the independent-observation assumption. Consequently, the modified wind corrections lead to stronger convergence accompanied by higher water vapor content, which enhances local convection. The additional small-scale hydrometeor corrections improve the location and intensity of the reflectivity. This results in more accurate short-term precipitation forecasts. For local convection, additionally including the reflectivity observation error correlations can better capture rapidly changing convection.


Duc Quoc Huynh 黃德國 (National Central University)

An Accelerated Quasi-Newton Method Based on Partial Modification with a Secant-like Diagonal Approximation for Nonlinear Least-squares Problems

Abstract:

Nonlinear least-squares (NLS) problems play an important role in many computational science and engineering applications, such as NLS image classification, parametric identification, neural network training, and travel-time inversion of seismic data. In this paper, we propose and study new variants of the quasi-Newton method that partially modify a secant-like diagonal approximation (QN-SLDA) for solving NLS problems. In addition, an accelerated version of QN-SLDA, referred to as AQN-SLDA, is also considered. In AQN-SLDA, we rescale the search direction after the first backtracking line search is applied, producing a more aggressive step that further reduces the objective function value.

The concept of the proposed methods is simple and easy to implement. We prove that the proposed methods are globally convergent under appropriate assumptions. In addition, because of the trade-off between the number of iterations to convergence and the per-iteration overhead of AQN-SLDA, the benefit of the acceleration step is most evident for the largest problems.


Session 7

Shiyao Xie (Kyushu University) (speaker)

Kenji Ono (Kyushu University)

Finding the Best Way to Split the Error Threshold in Parallel Tensor Train Decomposition

Abstract:

Tensor Train Decomposition (TTD) is a relatively new method for decomposing tensors, which are multidimensional arrays, into a compact format. Among the many studies in recent years that use modern distributed systems to speed up the computation of TTD, we have proposed an error-bounded algorithm, PTTD, which experimental results show to be the most scalable parallel algorithm for TTD. However, PTTD approximates the given tensor in multiple rounds. To ensure that the result meets the given error threshold, we need to split that threshold to determine the error threshold of each round. Since there are many ways to split it, in this study we conduct experiments to find the split that produces the smallest result.


Yuki Uchino (Shibaura Institute of Technology) (speaker)

Katsuhisa Ozaki (Shibaura Institute of Technology)

Performance Evaluation of Iterative Refinement for Singular Value Decomposition on a Supercomputer

Abstract:

This study considers refinement algorithms for the singular value decomposition of a real matrix. Ogita and Aishima proposed an iterative refinement algorithm for singular value decomposition that is constructed from highly accurate matrix multiplications carried out six times per iteration. Since the algorithm is based on Newton's method, it converges quadratically if the initial guess is moderately accurate. Recently, we showed that Ogita and Aishima's algorithm can be run with only four highly accurate matrix multiplications per iteration. We also proposed two algorithms that require only two highly accurate matrix multiplications per iteration. In the presentation, we will show a performance evaluation of these algorithms on the supercomputer Fugaku.


Ivan Luthfi I. 盧斯非 (National Central University)

A Multiscale Finite Element Method with Adaptive Bubble Function Enrichment for the Helmholtz Equation

Abstract:

The Helmholtz equation is a mathematical model used to explain various physical phenomena involving wave propagation and scattering. Solving this equation numerically presents challenges, including a lack of robustness, known as the pollution effect, and the difficulty of finding an efficient iterative solver for unbounded exterior domains as the wavenumber increases. This research proposes a new framework for solving such problems using a multiscale finite element method (MsFEM) with adaptive bubble function enrichment (MsFEM_bub). The MsFEM_bub method uses adaptive bubble functions to improve the approximation accuracy, and it yields an efficient and robust iterative solver. Numerical experiments for various wavenumbers indicate the robustness and efficiency of the method.