HPAI4S'25: HPC for AI Foundation Models & LLMs for Science
Milano, Italy, June 4, 2025
HPAI4S has been moved from room 3.1.7 to Aula De Donato (floor 0, the room beside the registration desk)
Large Language Models (LLMs) have revolutionized the field of Natural Language Processing (NLP) by introducing powerful AI systems capable of understanding and generating human-like text with remarkable fluency and coherence. These models, trained on vast amounts of data, can perform a wide range of tasks, such as language translation, text summarization, and knowledge distillation, enabling researchers to navigate complex scientific literature more efficiently.
LLMs are only a starting point in unlocking the generative abilities of broader transformer architectures to accelerate science: analyzing and plotting experimental data, formulating hypotheses, designing experiments, and even predicting promising research directions. To this end, modern transformers combine multi-modal data, leverage domain-specific representations, capture correlations through complex attention mechanisms (e.g., self-attention, cross-attention), and compose specialized architectures (e.g., mixture of experts). They form the core of foundation models (FMs), and the potential for architectural innovations has barely been tapped.
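As an illustration of the attention mechanisms mentioned above, the following minimal sketch implements scaled dot-product self-attention in NumPy. The function and weight names (self_attention, Wq, Wk, Wv) are illustrative choices rather than any particular framework's API, and production foundation models use multi-head, batched, and heavily optimized variants of this computation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise token correlations
    return softmax(scores) @ V               # attention-weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)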
In a quest for more emergent behavior (advanced capabilities that are not explicitly trained for but emerge spontaneously from the massive scale and exposure to vast amounts of data during training), the scale and complexity of FMs continue to increase, requiring larger training infrastructures and drastically escalating their energy footprint. In addition, the sharp rise in the popularity of these models and the complexity of their prompting generate extreme volumes of concurrent inference requests that need to be served at high throughput.
A consequence of this trend for science applications is that FM training and inference face two major challenges. The first is the democratization of FMs: the scale, cost, and time required to train FMs and run inference are prohibitive for small and medium institutions, in both academia and industry. The second is the unprecedented scale, duration, and throughput of the parallel executions required for training and inference. This new context raises many open research problems and innovation opportunities for parallel computing.
This workshop will provide the scientific community with a dedicated forum for discussing new research, development, and deployment of FMs at scale. Specifically, it aims to address the high performance, scalability, and energy efficiency of FMs through a combination of system-level and algorithmic aspects, such as: processing and curating the training data; efficient parallelization techniques (data, tensor, and pipeline parallelism, multi-level memory management, redundancy elimination, etc.); effective data reduction approaches (for parameters, activations, optimizer state, and gradients); low-overhead checkpointing and strategies to survive loss spikes and other anomalies; fine-tuning and continual learning strategies; comprehensive evaluation and benchmarking; efficient batching, scheduling, and caching of inference requests to serve a large number of users concurrently; strategies for prompt engineering and augmentation (e.g., RAG); and applications to domain sciences.
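To make one of the parallelization techniques above concrete, the following is a minimal, self-contained NumPy sketch of data parallelism on a toy linear model: the global batch is sharded across workers, each worker computes a local gradient, and the gradients are averaged, which is the role an allreduce plays on a real HPC system. The names and the toy model are illustrative assumptions, not any specific framework's API.

import numpy as np

def local_gradient(w, X, y):
    # Mean-squared-error gradient of a linear model y ~ X @ w on one data shard
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 4)), rng.normal(size=64)       # global batch
w = np.zeros(4)                                            # model parameters

num_workers = 4
shards = zip(np.array_split(X, num_workers), np.array_split(y, num_workers))
grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]   # one gradient per "worker"
w -= 0.1 * np.mean(grads, axis=0)                          # averaged (allreduce-style) update
print(w)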
Abstract. High-performance storage and data access are crucial to achieving high-performance AI. This talk will focus on I/O optimizations aimed at improving AI, as well as on how AI is helping researchers understand I/O performance bottlenecks. The talk will cover the I/O requirements and patterns of AI applications and the current state of the art. The second part of the talk will go into the details of using AI technologies to understand I/O performance bottlenecks and to help fix them.
Bio. Suren Byna is a Professor in the Department of Computer Science and Engineering (CSE) at The Ohio State University (OSU). He was a Senior Computer Scientist at Lawrence Berkeley National Laboratory (LBNL), where he is now a Visiting Faculty Scientist. He leads the Innovative Data Technologies Lab at OSU. His research interests span many topics in scientific data management and analysis, including parallel I/O, file and data management systems, I/O libraries, file formats, and metadata management. He also leads projects in the areas of data quality, cybersecurity and trustworthiness of data, and AI readiness of data.
Presenters are requested to share their slides using this Google Drive link.
The workshop will be located in Aula De Donato of Building 3.
Full papers (8 pages): 25 min presentation + 5 min Q/A
Short papers (6 or fewer pages): 17 min presentation + 3 min Q/A
🥇Best Paper: Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications
(Andre Merzky, Rutgers, the State University of New Jersey; Mikhail Titov, Brookhaven National Laboratory; Matteo Turilli, Rutgers, the State University of New Jersey; Ozgur Kilic, Brookhaven National Laboratory; Tianle Wang, Brookhaven National Laboratory; Shantenu Jha, Princeton Plasma Physics Laboratory)
🥈Best Paper Runner Up: Evaluating Expansion Memory for Optimizer State Offloading for Large Transformer Models
(Moiz Arif, Micron Technology Inc.; Avinash Maurya, Argonne National Laboratory; Sudharshan Vazhkudai, Micron Technology Inc.; Bogdan Nicolae, Argonne National Laboratory)
Is In-Context Learning Feasible for HPC Performance Autotuning?
(Thomas Randall, Clemson University; Akhilesh Bondapalli, Clemson University; Rong Ge, Clemson University; Prasanna Balaprakash, Oak Ridge National Laboratory)
Exploration of LLM Lossless Compression on Scientific Data
(Max Faykus, Clemson University; Luanzheng Guo, Pacific Northwest National Laboratory; Rizwan Ashraf, Pacific Northwest National Laboratory; Jan Strube, Pacific Northwest National Laboratory; Jon Calhoun, Clemson University; Nathan Tallent, Pacific Northwest National Laboratory)
Breaking Down LLM inference: A preliminary performance analysis of sparsified transformers
(Ioanna Tasou, NTUA; Petros Anastasiadis, NTUA; Panagiotis Mpakos, NTUA; Dimitrios Galanopoulos, NTUA; Nectarios Koziris, NTUA; Georgios Goumas, NTUA)
Imperfect Recognition: A Study of OCR Limitations in the Context of Scientific Documents
(Chinmay Sahasrabudhe, Sandia National Laboratories; Yang Ho, Sandia National Laboratories; Nick Winovich, Sandia National Laboratories; Sivasankaran Rajamanickam, Sandia National Laboratories)
Towards Orchestrating Agentic Applications as FaaS Workflows
(Shiva Sai Krishna Anand Tokal, Indian Institute of Science, Bangalore, India; Vaibhav Jha, Indian Institute of Science, Bangalore, India; Anand Eswaran, IBM Research Bangalore, India; Praveen Jayachandran, IBM Research Bangalore, India; Yogesh Simmhan, Indian Institute of Science, Bangalore, India)
Adaptive Protein Design Protocols and Middleware
(Aymen Alsaadi, Rutgers/Dept. of Electrical and Computer Engineering; Jonathan Ash, Rutgers/Institute for Quantitative Biomedicine; Mikhail Titov, Brookhaven National Laboratory; Matteo Turilli, Rutgers/Dept. of Electrical and Computer Engineering; Andre Merzky, RADICAL-Computing Inc; Shantenu Jha, Rutgers/Dept. of Electrical and Computer Engineering; Sagar Khare, Rutgers/Department of Chemistry and Chemical Biology)
We seek contributions that are related to the following topics. Papers that intersect with two or more topics are particularly encouraged.
Model Exploration: Model distillation, ablation, and compression methods to experiment with new model architectures and multi-modal data at scale for path-finding or edge deployment.
Data: Preparing scientific data for use in AI models, including but not limited to data reduction, sampling, filtering, curation, and deduplication.
Pretraining: Efficient stall/failure detection and recovery mechanisms such as checkpoint and restart, novel parallelism approaches, and efficient data pipelines to be used in pre-training.
Alignment, Fine-tuning, and Continual Fine-tuning: Reinforcement learning with human feedback and strategies for mitigating catastrophic forgetting.
Evaluation: Tools and techniques for scaling human and automated evaluation for AI in science using HPC and cloud at scale.
Inference: Multi-tenancy and KV cache management, model instance management, batching of queries, and balancing latency and throughput.
Multi-modality: Software/Hardware techniques to combine text with multi-modal science data, especially from large-scale scientific simulations and instruments to enable reasoning between domains.
Reproducibility, Provenance, and Traceability: Tools and approaches that enable tracking and reproducibility of experiments for AI.
Systems software (e.g., compilers, schedulers, drivers, and core libraries), hardware (e.g., networks, accelerators, GPUs), and theory, modeling, and algorithms used in networking, computing, and storage for AI for science during inference, pretraining, fine-tuning, alignment, continual learning, and evaluation.
Other examples of AI for HPC and HPC for AI in scientific applications at scale.
Paper submission deadline: February 17th, 2025 AoE (extended from February 6th, 2025)
Final notification: March 10th, 2025 AoE (extended from March 3rd, 2025 and February 20th, 2025)
Camera-ready papers: March 13th, 2025 AoE (extended from March 6th, 2025)
Authors are invited to submit papers describing unpublished, original research. The workshop accepts full 8-page papers and short/work-in-progress 5-page papers, including references, figures, and tables. All manuscripts should be formatted with the IEEE conference-style template, using a 10-point font size on 8.5x11-inch pages. All papers must be in English. We use a single-blind reviewing process, so please keep the authors' names, publications, etc., in the text. Papers will be peer-reviewed, and accepted papers will be published in the IPDPS workshop proceedings.
All papers should be uploaded to the IPDPS submission portal.
General Chair: Franck Cappello (Argonne National Laboratory)
General Vice-chair: Bogdan Nicolae (Argonne National Laboratory)
Program Committee Chair: Robert Underwood (Argonne National Laboratory)
Organizing Chair: Avinash Maurya (Argonne National Laboratory)
Moiz Arif (Micron Technology)
Javier Aula-Blasco (Barcelona Supercomputing Center)
Bibrak Qamar Chandio (Intel Corporation)
Nicholas Lee-Ping Chia (Argonne National Laboratory)
Neil Getty (Argonne National Laboratory)
Sandeep Madireddy (Argonne National Laboratory)
Robert Underwood (Argonne National Laboratory)
Rio Yokota (Tokyo Institute of Technology)
Xiaodong Yu (Stevens Institute of Technology)
Kevin Assogba (Rochester Institute of Technology)