2026 Titles and Abstracts

2026 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing

Keynote Speaker

Hung-Chi Kuo 郭鴻基 (National Taiwan University, Taiwan)

Title: Data-driven AI Weather Prediction in Taiwan

Abstract:

Climate science seeks to understand the variability and dynamics of Earth’s atmosphere to anticipate both long-term climate change and short-term severe weather events. Central to this effort is the challenge of predicting time-dependent variability, often studied through the Liouville equation and related dynamical frameworks. Traditional numerical models, though scientifically rigorous, are constrained by high computational demands that limit their ability to fully sample complex processes. This restricts exploration of large-scale climate scenarios and reduces the accuracy of severe-weather predictions.

Artificial intelligence offers efficient computational tools that can advance the study of atmospheric dynamics, opening new pathways to breakthroughs in climate science and weather prediction. The key to transitioning from numerical models to AI-based approaches lies in AI’s ability to infer dynamical principles from data and generate physically consistent variability, rather than relying solely on statistical pattern matching. Validating this capacity requires rigorous experiments and attribution studies to ensure AI can capture fundamental atmospheric laws.

Our research examines frontogenesis in a zonally uniform winter atmosphere, dynamical responses to tropical diabatic heating, vorticity responses linked to vertical heating modes, and the breakdown of the Intertropical Convergence Zone (ITCZ). These studies show that AI models can move beyond statistical emulation to reproduce physically reasonable atmospheric dynamics. Finally, we highlight ongoing efforts to revolutionize Taiwan’s forecasting by developing a high-resolution regional AI-based system to predict typhoons and heavy rainfall, marking a turning point in the integration of AI and climate science.

Invited Speakers

Takeshi Fukaya (Hokkaido University, Japan)

Title: Tall-Skinny QR Factorization with Column Pivoting via a Cholesky QR algorithm

Abstract:

We consider computing the QR factorization with column pivoting for tall and skinny matrices, a problem that arises in various applications such as low-rank approximation. Recently, we have developed an algorithm that belongs to the class of Cholesky QR type algorithms, which are known to be well suited to modern computer architectures. In this talk, we outline the proposed algorithm and highlight the key ideas behind its design. We also present performance results obtained in both single-node and distributed-memory parallel environments. This is joint work with Yuji Nakatsukasa and Yusaku Yamamoto.

Toshiyuki Imamura (RIKEN, Japan)

Title: Impact of MxP on Numerical Linear Algebra

Abstract:

The MxP encompasses approaches that integrate various levels of computational precision as needed to simulate both low- and high-precision calculations. The primary objective is to manage the computational workload to attain the desired arithmetic precision by selecting appropriate computational accuracy and addressing complex optimization tasks, thereby lowering overall computational expenses. In the presentation, the Ozaki method was employed within a GPU-based dense-matrix numerical computation library, establishing a transparent cost model for balancing computational accuracy and time. Additionally, an engineering example of next-generation optimization of numerical libraries utilizing MxP is provided to demonstrate the efficacy of the method.

Takahiro Katagiri (Nagoya University, Japan)

Title: Automatic Generation of HPC Codes: Current Achievements and Quantum-Oriented Future Directions

Abstract:

Automatic generation of high-performance computing (HPC) codes has become an important approach for improving productivity and portability on increasingly complex computing architectures. This presentation introduces our recent achievements in automatic HPC code generation within the HPC-GENIE project, focusing on AI-driven methods that transform high-level problem descriptions into optimized parallel implementations. We present practical results obtained on modern HPC systems, including performance portability across heterogeneous CPU/GPU environments. Building on these achievements, we explore future research directions toward quantum-oriented computing. In particular, we outline how automatically generated HPC codes can be extended to interact with quantum technologies such as quantum annealing, enabling hybrid quantum–classical workflows for large-scale optimization and scientific applications. This forward-looking perspective highlights the potential of automatic code generation as a unifying framework bridging classical HPC and emerging quantum computing paradigms.

Koki Masui (The University of Osaka, Japan)

Title: Fast and robust preconditioning techniques for iterative solvers of complex-valued linear systems

Abstract:

Iterative methods are widely used to solve large-scale linear systems, and incomplete Cholesky (IC) preconditioning is often applied to improve convergence. IC preconditioning involves parameters such as an acceleration factor, whose value significantly affects convergence behavior. In previous studies, real-valued acceleration factors have conventionally been used even for complex-valued problems. In this study, we demonstrate that complex-valued acceleration factors can further enhance convergence. Numerical experiments are presented to demonstrate the effectiveness of the proposed method.

Kengo Nakajima (The University of Tokyo, Japan)

Title: Challenges towards FugakuNEXT: From the Perspective of Application Development and Programming

Authors: Kengo Nakajima, Takashi Shimokawabe, Yohei Miki, Kazuya Yamazaki (The University of Tokyo)

Abstract:

In August 2025, RIKEN announced that FugakuNEXT, scheduled for operation in 2029, will adopt Fujitsu’s Monaka-X CPU together with NVIDIA GPUs—a major shift from previous National Flagship Systems in Japan, such as the K computer and Fugaku, which lacked accelerators. This transition reflects a broader trend toward GPU-based architectures in Japan’s HPC ecosystem. For example, Miyabi, jointly deployed by the University of Tokyo and the University of Tsukuba in January 2025, comprises Miyabi-G with 1,120 NVIDIA GH200 units and Miyabi-C with Intel Xeon CPUs, with Miyabi-G delivering 98% of total performance. Global carbon-neutral initiatives and rising energy costs make GPUs—offering superior performance per watt—an inevitable choice. The key challenge lies in migrating applications to GPUs. Compared to a decade ago, GPU programming environments have advanced significantly. Codes parallelized for CPU’s using OpenMP and MPI can now be ported to NVIDIA GPUs via OpenACC or Standard Parallelism, while Unified Memory technology simplifies CPU-GPU data access. To accelerate this transition, the Advanced HPC-AI R&D Support Center (HAIRDESC) lead by RIST was launched in November 2025 under a MEXT initiative. Collaborating with supercomputing centers of nine national universities in Japan, RIKEN, NVIDIA, and AMD, HAIRDESC leads nationwide efforts to enable efficient GPU adoption. This talk will explore these challenges and strategies for the FugakuNEXT era.

Takeshi Nanri (Kyushu University, Japan)

Title: Study on the real effect of computation-communication overlapping with non-blocking collective communications

Abstract:

Collective communication, which appears frequently in various parallel applications, is one of the most significant causes of scalability degradation in parallel computing, as the time required for it increases with the size of the computers, in some cases even more than linearly.

NBCs (Non-Blocking Collective communications) are expected to be a means to overlap this collective communication with computation and hide the communication time. However, the use of NBCs is currently limited because programmers do not have sufficient information about the usage and effect to modify their algorithms to overlap communication and communication. This talk evaluates several NBC implementation techniques available on standard clusters. We go beyond the conventional communication overlap ratio, reporting results that also account for the impact of overlap on actual execution performance.

Satoshi Ohshima (Kyushu University, Japan)

Title: Optimization of GEMM using AMX, and can Code Generative AI generate its optimal code?

Authors: Satoshi Ohshima, Shunsuke Nakano, Yusuke Endo

Abstract:

GEMM is a crucial computation used in various calculations, and in recent years, implementations utilizing the Tensor Core on GPUs have garnered significant attention.

In this work, we focus on GEMM using AMX on Intel CPUs. We optimized register usage and achieved faster GEMM than existing OpenBLAS implementations. At the same time, we are also interested in implementing HPC programs using code generative AI. Therefore, we are investigating whether our GEMM implementation can be generated by code generative AI and, if so, what steps would be required. In this talk, we will report the latest findings from our investigation.

Kenji Ono (Kyushu University, Japan)

Title: Accelerating Symbolic Regression via LLM-Guided Genetic Programming

Abstract:

Heuristic model discovery has been effectively performed using evolutionary computation, particularly genetic programming (GP) combined with symbolic regression. In GP, guidelines based on physical constraints and domain knowledge can be effectively incorporated; however, the search space remains vast, and obtaining promising candidate solutions often requires substantial computational time. By leveraging large language models (LLMs) for model generation and selection in evolutionary computation, it becomes possible to utilize pre-trained knowledge as well as information retrieved through retrieval-augmented generation (RAG), potentially enabling more efficient convergence to high-quality models. In addition, delegating the control flow of evolutionary computation to LLMs offers the flexibility to advance the computation in an interactive manner. In this talk, we present a case study in which LLM-assisted GP is applied to the discovery of wake models for wind turbines.

Katsuhisa Ozaki (Shibaura Institute of Technology, Japan)

Title: Emulation Methods for Matrix Multiplication and Their Applications

Abstract:

We proposed the Ozaki-I Scheme and Ozaki-II Scheme as emulation methods for matrix multiplication that exploit low-precision arithmetic. These methods have attracted significant attention as alternatives to, or accelerators for, double-precision arithmetic on GPUs. In this presentation, we introduce these emulation schemes and report their performance when applied to numerical linear algebra problems, including LU decomposition, Cholesky decomposition, and QR decomposition. This study is joint work with Yuki Uchino and Toshiyuki Imamura.

Tomohiro Suzuki (Yamanashi University, Japan)

Title: Parallel preconditioning for ICCG based on QUBO formulation

Abstract:

This study proposes a QUBO-based framework for parallel preprocessing in the Incomplete Cholesky Conjugate Gradient (ICCG) method. The multi-coloring (MC) method is formulated as a graph coloring problem using quadratic unconstrained binary optimization (QUBO) and evaluated in terms of reduction in the number of colors and parallelism. In addition, the block construction process of the block multi-coloring (BMC) method is formulated as a QUBO problem, and the characteristics of the generated blocks are analyzed. Based on observations from QUBO-formulated MC and BMC, this study aims to explore how QUBO-based preprocessing may influence the trade-off between parallelism and convergence in ICCG.

Hiroyuki Takizawa (Tohoku University, Japan)

Title: Use of LLMs for semantic equivalance verification of HPC codes

Abstract:

Semantic equivalence verification of source code is essential for software performance auto-tuning, but remains underexplored for High Performance Computing (HPC) codes, especially Fortran. We thus investigate the practicality of using large language models (LLMs) for semantic equivalence verification of HPC codes. Our preliminary evaluation shows that the accuracy of equivalence verification improves if an LLM is used as just a feature extracter while supervised classifiers are used to check the equivalance. Leveraging LLM latent representations and simple ensembles significantly enhances semantic equivalence verification for HPC codes, providing practical guidance for refactoring support, regression testing, and bug detection in scientific software.

Yuki Uchino (RIKEN, Japan)

Title: A Library for Emulating Matrix Multiplication with INT8 Matrix Engines

Abstract:

Modern computing architectures feature low-precision matrix multiplication units that achieve substantially higher throughput than their high-precision counterparts. Motivated by this architectural trend, the emulation of high-precision matrix multiplication using low-precision hardware has attracted significant interest in the high-performance computing community.

This talk presents a library for emulating single- and double-precision matrix multiplication with INT8 matrix engines. The proposed emulation outperforms the standard GEMM routines in cuBLAS and hipBLAS on modern GPUs.

Hong-Bin Chen 陳宏斌 (National Cheng Kung University, Taiwan)

Title: Tackling the issues in quantum steering with quantum machine learning approaches

Author: Hong-Bin Chen
(Department of Engineering Science, National Cheng Kung University, Taiwan)
(Center for Quantum Frontiers of Research & Technology, NCKU, Taiwan)
(Physics Division, National Center for Theoretical Sciences, Taiwan)

Abstract:

Quantum steering has been proven to be a unique quantum correlation sandwiched between Bell nonlocality and quantum entanglement. Due to its fundamental importance, quantum steering has been studied extensively. To demonstrate steerability, one relies on a particular resource referred to as steerable assemble on one side of a two-party system. However, it is generically unclear how to reach a maximally steerable assemblage from a bipartite quantum state. For this purpose, one must optimize over all possible measurement settings, which constitute a hierarchical structure. On the other hand, in light of the rapid development of quantum computing technology, quantum machine learning (QML) has emerged as a field with a promising potential in demonstrating quantum advantage. Here we leverage the power of several different kinds of QML algorithm to solve the difficulties in the theory of quantum steering. We first answer the question of the minimal number of observables required for Alice to construct steerable assemblage on Bob’s side, namely the hierarchy of steering measurement setting. By further constructing a computational protocol, we can generate the necessary datasets. We then train the kernel-based QML models to infer the hierarchy. We find that the QML models demonstrate a potential to surpass the classical counterparts for solving such highly complicated problems, revealing a practical quantum advantage. To answer the most steerable assemblage achievable for a bipartite state, we adopt the hybrid quantum-classical neural networks (HQCNNs). We also benchmark their performance against traditional artifical neural networks (ANNs). We find that the HQCNNs can achieve a comparable performance against ANN with a substantially fewer number of trainable parameters, leading to computational-resource-efficient learning algorithms. Additionally, based on the physics of quantum steering, we encode the states to be recognized into five different types of features. This helps us identify the most compact characterization of Alice-to-Bob steerability, which is Alices regularly aligned steering ellipsoid. We then apply the well-trained models to predict the hierarchy for three specific families of states.

[1] H.-M. Wang, H.-Y. Ku, J.-Y. Lin, and H.-B. Chen, “Deep learning the hierarchy of steering measurement settings of qubit-pair states,” Commun. Phys. 7, 72 (2024).

[2] Z.-L. Tsai, H.-M. Wang, and H.-B. Chen, “Learning the hierarchy of steering measurement settings of qubit-pair states with kernel-based quantum models,” New J. Phys. 27, 094502 (2025).

Lu-hung Chen 陳律閎 (National Chung Hsing University, Taiwan)

Title: Streaming multivariate functional principal component analysis with application to AI-driven weather forecast

Abstract:

Artificial intelligence (AI) weather forecasting models are a major focus in meteorology, but their training pipelines are increasingly constrained by the storage and bandwidth required to handle massive datasets. To reduce the transmission of training data while preserving the dominant variability of global meteorological fields, we propose a new data-preprocessing method, Streaming Multivariate Spherical Functional Principal Component Analysis (Streaming Multivariate Spherical FPCA), for data compression. Our approach (1) expands spherical-domain functional data in spherical harmonics, thereby respecting Earth’s spherical geometry; (2) uses a streaming algorithm to update the decomposition online with improved memory efficiency, avoiding full-batch storage and repeated dataset passes; and (3) extends the multivariate FPCA framework of Happ and Kurz (2018) to the streaming setting so that multiple meteorological variables can be compressed jointly while retaining their cross-variable correlation structure.

Tsung-Hui (Alex) Huang 黃琮暉 (National Taiwan University, Taiwan)

Title: Physics-Guided Machine Learning for Modeling Multiscale Systems

Abstract:

Multiscale physical systems commonly exhibit complex, nonlinear material responses that are difficult to capture using conventional constitutive models while remaining computationally tractable across scales. This talk presents a physics-guided machine learning framework for constructing effective surrogate constitutive models in multiscale systems by combining ad hoc neural architectures with mechanistic constraints. Two representative applications are discussed: biomechanics and graphene mechanics.

In the first case, a physics-augmented neural network (PANN) is developed by coupling constitutive neural networks with progressive damage mechanics via Gumble-Softmax regularization for modeling softening effect in meniscal root. Macroscopic continuum degradation is computed via the accumulation from discrete micro-crack information, which can be learned via PANN. The learned surrogate model is embedded within a meshfree solver to simulate knee joint mechanics under large deformation, enabling robust prediction of damage evolution while preserving physical consistency.

In the second example, we focus on graphene deposition, an important material widely used in semiconductor processes, where molecular dynamics simulations are leveraged to inform continuum-scale modeling through Constitutive Artificial Neural Networks (CANNs). Generalized invariants are introduced to encode lattice symmetry, anisotropy, and thermodynamic constraints, enabling accurate learning of free-energy functions from atomistic data. The resulting surrogate constitutive laws are integrated into finite element shell formulations to predict wrinkle formation and interfacial defects at micron scales.

These examples demonstrate how physics-guided neural architecture provides an effective bridge between atomistic, mesoscale, and continuum descriptions. By embedding physical priors directly into learning architecture, the proposed approach achieves improved extrapolation, interpretability, and computational efficiency, offering a general paradigm for surrogate modeling in multiscale scientific computing.

Bing-Ze Lu 呂秉澤 (National Chung Cheng University, Taiwan)

Title: Spectral-Bias-Aided Multilevel Neural Network Methods for Solving Implicit Boundary Integral Equations

Abstract:

In this talk, I will introduce a novel yet conceptually simple framework for training neural networks to solve implicit boundary integral equations (IBIEs).

Boundary integral equations (BIEs) are widely used for solving elliptic partial differential equations, such as the Poisson and Helmholtz equations. Conventional BIE formulations require explicit parameterizations of the computational boundary, which can be cumbersome for complex geometries. To address this issue, Richard Tsai (2012) proposed the implicit boundary integral method (IBIM), which reformulates the boundary integral equation in a tubular neighborhood of the surface. While IBIM significantly simplifies the construction of boundary integral equations, solving the resulting discretized systems remains computationally expensive and often dominates the overall cost.

Motivated by the universal approximation capability of neural networks, we propose to represent the surface potential using a neural network and reformulate the IBIM as a least-squares optimization problem. In this framework, solving the boundary integral equation is transformed into the problem of optimizing the weights and biases of the neural network.

To further accelerate training, we introduce a multilevel training strategy that explicitly exploits the spectral bias of neural networks. Through extensive numerical experiments, we demonstrate that the proposed multilevel approach achieves a 4–5× speedup compared to single-level training. We also analyze the training dynamics through the Neural Tangent Kernel (NTK) perspective to elucidate how spectral bias and kernel structure contribute to the observed acceleration.

Finally, we present numerical experiments for solving exterior Helmholtz problems, where conventional solvers require more than 12,000 seconds, while our proposed neural network approach achieves comparable accuracy in approximately 200 seconds, demonstrating substantial computational gains.

Ka-Lok Ng 吳家樂 (Asia University, Taiwan)

Title: Exploring the Potential Advantages of Quantum Machine Learning in Biomedical Research

Authors: Aninda Astuti, Tai Yue Li, Venugopal Reddy, Mekala Ezra B. Wijaya, Ka-Lok Ng

Abstract:

Quantum Machine Learning (QML), which combines quantum computing with modern machine learning techniques, has emerged as a promising approach for addressing computationally intensive problems. In this presentation, I will review recent advances in quantum computing and present a biomedical study involving NGS data as well as imaging from a cancer cohort that demonstrates evidence of quantum advantage.

Using the Neural Quantum Embedding (NQE) framework, we show that Quantum Support Vector Classifiers (QSVC) and Quantum Neural Networks (QNN) achieve higher classification accuracy than their classical counterparts (SVC and NN) in binary prediction tasks. To further improve computational efficiency, Tensor Network (TN) methods were integrated into the QSVC architecture. The resulting NQE+TN+QSVC model yields a higher F1 score while reducing training time relative to the baseline QSVC implementation. Overall, our work presents a robust strategy that enhances both predictive performance and simulation efficiency in biomedical applications.

Yang-Yao Niu 牛仰堯 (Tamkang University, Taiwan)

Title: Towards High-Resolution Simulation of High-speed Compressible Multi-Component Flows

Author: Yang-Yao Niu (Department of Aerospace Engineering, Tamkang University, Taiwan)

Abstract:

Responding to the need of aerospace industry, the flows with compressibility over high-speed transportation vehicles or through related internal flows are highly required to understand. The concerned applications in our studies include injection, spray and detonation engines. Therefore, the simulation of compressible multi-phase, multi-component chemical reactive flows is the main concern. Numerical difficulties arise due to both the physical model and the numerical method chosen for their mathematical approximations. In order to capture accurately the shock and detonation waves, interfaces and shear layers, the time-dependent five-equation Euler equations and Navier-Stokes Equations including the effects of chemistry are the selected numerical equations. Meanwhile, the Strang slitting is chosen for the time revolution the discretized governing equations for reducing the stiffness caused by interactions of multi-phase flows and the chemical reactions. In addition, the ATM type AUSM type approximated Riemann solutions are used to evaluate numerical fluxes on the cell interfaces. In this study, we further propose a new methodology of multi-fluid model to include Lagrangian method to simulate the atomization of liquid droplets The modified breakup model considering both the interface instability and the sudden deformation of a liquid spray induced by the compressible flows. Shock waves, recirculation zones and breakup processes of droplet particles are captured. The predicted droplet distributions at various operating conditions are also discussed. Satisfied agreement between simulations and measurements is shown in the current work.

Chia-Ho Ou 歐家和 (National Pingtung University, Taiwan　/　Tohoku University, Japan)

Title: Quantum-Inspired Optimization in Practice: Modeling and Deployment Across Diverse Applications

Abstract:

Quantum-inspired optimization has become a practical approach for addressing large-scale combinatorial problems across a range of real-world domains. This invited talk presents an experience-driven overview of how such methods are modeled and deployed in practice, with an emphasis on implementation workflows rather than algorithmic novelty. We discuss how optimization problems arising in transportation and logistics, healthcare and life sciences, agriculture, and resource allocation have been modeled using QUBO-based formulations. Drawing on hands-on experience, the talk describes the deployment of these models across heterogeneous computing platforms. Using a representative application to ground the discussion, we highlight practical observations from real deployments and outline current limitations and open challenges from a practitioner’s perspective.

Wei-Hsiang Wang 王威翔 (National Chung Hsing University, Taiwan)

Title: A Parallel CFD Framework for Two-way Coupled 6 DOF Fluid-Structure Interaction on Hierarchical Cartesian Meshes

Author: Wei-Hsiang Wang (National Chung Hsing University, Taiwan)

Abstract:

We present a high-performance parallel CFD framework for two-way coupled six-degree-of-freedom fluid-structure interaction with large rigid-body motions. The method combines a Building-Cube Method hierarchical Cartesian mesh with a sharp-interface immersed boundary method, enabling complex moving geometries on a fixed background grid while maintaining accurate boundary enforcement and load evaluation. STL surfaces are processed automatically; interface reconstruction in cut cells supports wall-condition imposition, and pressure/viscous tractions are integrated to drive a strongly coupled 6-DOF solver in a single global inertial frame. Mass properties and the initial inertia tensor are computed directly from STL polyhedra via divergence-theorem-based surface integrals, streamlining multi-body pre-processing. For scalability, cube-based domain decomposition with Morton Z-order ordering improves data locality and reduces MPI communication. Validation on canonical 6-DOF problems, including flow-induced rotation and self-propelled motion, demonstrates accurate predictions of time-dependent loads, attitude evolution, and wake dynamics.

Chin-Tien Wu 吳金典 (National Yang Ming Chiao Tung University, Taiwan)

Title: The finite time SDRE control for image based visual servo of UAV

Abstract:

In this talk, we shall introduce the formulation of the finite time state dependent Riccati control and its application on the tracking of the image based visual servo using UAV platform.

Yu-Ting Wu 吳毓庭 (National Cheng Kung University, Taiwan)

Title: TBD

Abstract:

TBD

Yung-Yu Zhuang 莊永裕 (National Central University, Taiwan)

Title: Programming and Compiling with AI assistants for High-Performance Scientific Computing

Abstract:

With a lot of program code available for training, generative AI has become a popular programming tool for code generation. Anyone may give AI natural-language prompts for generating functions or even whole programs automatically. However, ensuring the quality of generated code is crucial since there might be errors or misunderstandings. Here, the program quality includes not only correctness but also the semantics and performance, especially in the case of scientific computing. On the other hand, AI can also help in compilation and decompilation. For example, transforming existing program code to a new hardware platform or covering lost information to improve code readability. To help effectively use AI in programming and compiling, we are developing methodologies and tools for AI-assisted software development.

Student Speakers

Hayato Gotou (B4, Kyushu University, Japan)

Title: Determining Hyperparameters for Time-series Anomaly Detection Using Large Language Models

Abstract:

Anomaly detection plays a crucial role in industrial plants by predicting machinery failures and identifying substandard products. However, optimizing hyperparameters remains a significant challenge, particularly for practitioners without deep learning expertise. To address this, this study proposes a novel method using Large Language Models (LLMs) to determine these parameters. Furthermore, this approach is expected to enhance detection performance.

Koki Isobe (B4, Nagoya University, Japan)

Title: GPU Acceleration of Medical Image Representation Learning Models with Distributed Data Parallel, I/O Optimization, and AI-Assisted Development

Authors: Koki Isobe, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri (Nagoya University)

Abstract:

Self-supervised learning models such as Masked Autoencoders (MAE) and Contrastive Masked Autoencoders (CMAE) are effective for medical image representation learning but require significant computational resources when trained on large-scale datasets. In this study, we implemented multi-node, multi-GPU parallelization for two self-supervised image representation learning models using PyTorch Distributed Data Parallel on a GPU cluster environment. The same parallelization strategy was applied to different model architectures, and their training performance and scalability were comparatively evaluated. In addition, the parallelization implementations were independently developed using multiple large language models (LLMs), and the resulting implementations were compared in terms of correctness and performance behavior. To investigate performance characteristics, profiling was conducted to measure computation, communication, and data loading behavior under different execution settings. The profiling results were analyzed to identify dominant bottlenecks, differences between model architectures, and the impact of storage configurations on training efficiency. These evaluations clarify how both model architecture and implementation choices influence parallel efficiency in large-scale medical image representation learning.

Takanori Kotama (B4, Nagoya University, Japan)

Title: HPC-AutoResearch: An HPC-Native Framework for Autonomous LLM-Driven Experimentation

Authors: Takanori Kotama, Shun-ichiro Hayashi, Daichi Mukunoki (Nagoya U.), Rio Yokota (RIKEN), Satoshi Ohshima (Kyusyu U.), Tetsuya Hoshino, Takahiro Katagiri (Nagoya U.)

Abstract:

Large-language-model (LLM) driven scientific discovery systems promise to automate experiment design and execution, but typical implementations assume local, interactive environments and struggle with HPC constraints such as containerized runtimes, restricted networks, and reproducibility requirements. We present an HPC-focused implementation of HPC-AutoResearch that integrates Singularity-based split-phase execution with best-first tree search (BFTS) for parallel experimentation. The workflow decomposes each experiment into explicit phases - planning, install, coding, compile, and run - producing structured artifacts and logs that make failures diagnosable and runs auditable. Each worker operates in an isolated container, enabling GPU-aware parallelism while preserving per-run workspaces, configurations, and artifacts. The system also supports resource staging (datasets, templates, and documentation) and optional long-horizon memory to maintain context across branches. Downstream automation aggregates plots and generates LaTeX writeups, with an optional review step. This implementation enables reproducible, container-native LLM experimentation on HPC systems and provides a practical path to scaling autonomous scientific workflows under real cluster constraints.

Ryo Mikasa (B4, Nagoya University, Japan)

Title: Performance-Aware GRPO Training for Large Language Models in High-Performance Computing Code Generation

Authors: Ryo Mikasa, Shun-Ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri (Nagoya U.)

Abstract:

Recent years have witnessed rapid advancements in code generation technology using large-scale language models (LLMs), with numerous training methodologies being proposed. Notably, recent developments include GRPO (Group Relative Policy Optimization) introduced by DeepSeek, which has significantly popularized reinforcement learning approaches for LLMs and has seen widespread application across various state-of-the-art LLM models. Conventional GRPO for code generation primarily focuses on evaluating code execution accuracy, thereby enhancing an LLM's ability to output correct code. However, in high-performance computing (HPC) code generation, code execution speed becomes equally critical in addition to code correctness. Therefore, we propose a specialized GRPO-based training method for LLMs tailored specifically for HPC code generation. Our approach involves executing the LLM's generated code on supercomputers, collecting benchmark results, and using these outcomes for evaluation and learning. This enables the LLM not only to produce correct results but also to empirically acquire knowledge about optimization techniques unique to the HPC domain. In this presentation, we demonstrate the extent to which LLMs trained using this methodology have improved their HPC code generation capabilities.

Hirotoshi Tamori (D2, Hokkaido University, Japan)

Title: Error Vector Sampling Based Subspace Correction Preconditioning for a Sequence of Nonsymmetric Linear Systems

Abstract:

This talk considers solving a sequence of linear systems with the same nonsymmetric coefficient matrix, a setting that can arise in time-dependent simulations and other applications. Such systems are commonly handled with Krylov iterative methods, but slow convergence often becomes a computational bottleneck. Although many preconditioners have been proposed, accelerating convergence is still challenging for nonsymmetric problems.

We present the error vector sampling based subspace correction (ES-SC) preconditioning method for nonsymmetric systems. ES-SC samples error vectors during the first solution process to identify components that hinder convergence, then algebraically constructs a subspace correction preconditioner that speeds up the remaining solution process in the sequence. Numerical experiments on a set of test matrices show that the ES-SC preconditioner reduces the number of iterations and total runtime in most cases.

Yi-Jhen Wu 吳怡臻 (Tamkang University, Taiwan)

Title: Robust Three Step Global Mechanism from Shock Tube Ignition Delays to ODW Initiation

Abstract:

Large-molecular hydrocarbon fuels exhibit complex chemical behavior, which poses challenges in modeling hypersonic propulsion systems. This study develops and validates a three-step global chemical kinetic mechanism specifically tailored for oblique detonation wave (ODW) simulations. The proposed mechanism is calibrated using shock-tube ignition delay data and validated against combustion temperatures and detonation velocities to ensure accuracy. Two-dimensional simulations under steady and unsteady hypersonic flow conditions are conducted to investigate the development of ODW initiation. The results indicate that the simplified mechanism reliably reproduces ignition delays, detonation onset, and ODW wave structures, while achieving an improvement in computational efficiency relative to skeletal mechanisms. These findings provide quantitative guidance for reliable oblique detonation initiation, establishing a practical framework for the design and optimization of hypersonic propulsion systems.

Poster Presentation

Jing-Liang Ku, Kuan-Hung Kao, Hsin-Ni Hung and Wei-Hsiang Chang (Tamkang University, Taiwan)

Title: Simulations of the Non-Equilibrium Chemical Reactive Flows of the Hypersonic Re-entry

Page updated

Google Sites

Report abuse