15th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures
FLEXSCIENCE 2025
Notre Dame, IN, USA, July 20, 2025
This year's edition of FlexScience is organized jointly with the FRAME workshop.
Venue: Morris Inn, Hesburgh room
Date: July 20th, 13h30 - 17h00
Papers have 20 min slots (15 min presentation + 5 min Q&A).
Preliminary program:
13h30 Opening - chair Alexandru Costan
13h40 Keynote - chair Bogdan Nicolae
Michael E. Papka, "Rethinking Specialization: A Path Toward Agile and Unified HPC Facilities"
14h30 Session 1 - FRAME - chair Alexandru Costan
Rusty-Cracker: A Multi-core Connected Components Library in Rust, Davide Rucci, Daniele Sanpietro, Emanuele Carlini, Matteo Mordacchini, Patrizio Dazzi
15h00 Coffee Break
15h30 Session 2 - FlexScience - chair Kento Sato
Towards a Federated Approach to Complex Digital Twins, H. Ahmed, D. Crawl, I. Altintas
RAPTOR: Reconfigurable Advanced Platform for Transdisciplinary Open Research, H. Najafi, P. Poudel, K. Bahreini, J. Ibarra, F. Saeed, Y. Li, J. Obeysekera, J. Liu
Building Flexible Physics-Informed Neural Networks with Fast Fourier Transform Analysis, R. Shehayib, J. Vap, P. Kogge
Efficient and Cost-Effective HPC on the Cloud, A. Bhosale, L. Kale, S. Kokkila-Schumacher
16h50 Closing Remarks
Title: Rethinking Specialization: A Path Toward Agile and Unified HPC Facilities
Abstract:
As high-performance computing (HPC) enters a period of rapid transformation—driven by data-centric science, AI integration, and growing demand for computational access—the traditional facility model of rigid specialization between capability jobs (large, long-running tasks) and capacity jobs (numerous short, quick tasks) is increasingly misaligned with emerging needs. While such specialization has historically supported performance optimization, it has also led to infrastructure silos, underutilized resources, and operational inefficiencies. To explore the consequences of this fragmentation, we present a concrete study using production workloads representative of capability- and capacity-oriented computing. We investigate two integration strategies—workload fusion, where diverse workloads share a common platform, and workload injection, where capacity jobs opportunistically utilize idle capability resources—using trace-based, event-driven simulations to quantify impacts on utilization and efficiency. These results reveal both the limitations of siloed systems and the untapped potential of more unified operational models. Building from this analysis, we then shift focus to consider a broader question: what must future HPC facilities look like to meet the challenges ahead? Beyond architectural integration, the next generation of HPC must undergo a deeper evolution—incorporating higher-level services, rethinking policies to promote flexibility and shared use, expanding accessibility to a broader range of users, and prioritizing usability as a primary goal. In doing so, HPC can evolve from a domain of elite, narrowly tuned systems into a more agile, inclusive, and strategically aligned computational ecosystem.
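The abstract does not detail the simulator itself, so the following is only a minimal, hypothetical sketch of what a trace-driven, event-driven model of workload injection could look like: the 128-node cluster, the toy job traces, and the first-fit backfill policy are invented for illustration and are not the production workloads or policies studied in the keynote.

# Hypothetical sketch: event-driven replay of a capability trace, with an
# optional stream of small capacity jobs injected onto idle nodes.
import heapq
import itertools
from dataclasses import dataclass

@dataclass
class Job:
    arrival: float  # submission time (hours)
    nodes: int      # nodes requested
    runtime: float  # runtime (hours)

def simulate(total_nodes, capability_jobs, injected_jobs=()):
    """Replay job traces and return average node utilization."""
    tick = itertools.count()  # tie-breaker so the heap never compares Job objects
    events = []
    for job in capability_jobs:
        heapq.heappush(events, (job.arrival, next(tick), "cap", job))
    for job in injected_jobs:
        heapq.heappush(events, (job.arrival, next(tick), "inj", job))
    free = total_nodes
    queue_cap, queue_inj = [], []
    busy_node_hours = 0.0
    makespan = 0.0

    while events:
        now, _, kind, job = heapq.heappop(events)
        if kind == "cap":
            queue_cap.append(job)
        elif kind == "inj":
            queue_inj.append(job)
        else:  # "finish": release the nodes held by a completed job
            free += job.nodes
        # Capability jobs get priority; injected capacity jobs backfill
        # whatever is left (simple first-fit within each queue).
        for queue in (queue_cap, queue_inj):
            i = 0
            while i < len(queue):
                if queue[i].nodes <= free:
                    j = queue.pop(i)
                    free -= j.nodes
                    busy_node_hours += j.nodes * j.runtime
                    makespan = max(makespan, now + j.runtime)
                    heapq.heappush(events, (now + j.runtime, next(tick), "finish", j))
                else:
                    i += 1
    return busy_node_hours / (total_nodes * makespan)

# Toy comparison: a siloed capability system vs. the same system with injection.
capability = [Job(0.0, 96, 6.0), Job(2.0, 64, 4.0)]
capacity = [Job(float(t), 4, 0.5) for t in range(10)]
print("siloed utilization  :", round(simulate(128, capability), 2))
print("with injection      :", round(simulate(128, capability, capacity), 2))

On this toy trace the injected capacity jobs raise average utilization slightly without delaying the capability jobs, which is the kind of effect the keynote quantifies on real production traces.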
Biography:
Michael E. Papka is an Argonne Senior Scientist and Distinguished Fellow. He serves as the deputy associate laboratory director for Computing, Environment, and Life Sciences (CELS) and the division director of the Argonne Leadership Computing Facility (ALCF). His leadership focuses on leveraging high-performance computing to advance scientific discovery and innovation. In addition to his roles at Argonne, Mike is the Warren S. McCulloch Professor of Computer Science at UIC. He directs the Electronic Visualization Laboratory (EVL) and is the computer science lead for the interdisciplinary CS+Design program. Mike earned a B.S. in physics from Northern Illinois University, an M.S. in computer science and electrical engineering from UIC, and an M.S. and Ph.D. in computer science from the University of Chicago.
Scientific computing applications generate enormous datasets that are growing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of modern big data analytics. Supported by the rise of artificial intelligence and deep learning, such enormous datasets are becoming valuable resources even beyond their original scope, opening new opportunities to learn patterns and extract new knowledge at large scale, potentially without human intervention. However, this leads to increasingly complex workflows that combine traditional HPC simulations with big data analytics and AI applications. An initial wave that opened this direction was the shift from compute-intensive to data-intensive computing, which fused several ideas from big data analytics (in-situ processing, shipping computations close to the data, complex and dynamic workflows) with the tightly coupled patterns addressed by the AI and high-performance computing ecosystems. In a quest to keep up with the complexity of these workflows, the design and operation of the infrastructures capable of running them efficiently at scale have evolved accordingly. Extreme heterogeneity at all levels (combinations of CPUs and accelerators, various types of memories, local storage and network links, parallel file systems and object stores, etc.) is now the norm. Ideas pioneered by cloud and edge computing (elasticity, multi-tenancy, geo-distributed processing, stream computing) are also beginning to be adopted in the HPC ecosystem (containerized workflows, on-demand jobs to complement batch jobs, streaming of experimental data from instruments directly to supercomputers, etc.). Thus, modern scientific applications need to be integrated into an entire Compute Continuum, from the edge all the way to supercomputers and large data centers, using flexible infrastructures and middleware.
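As a concrete illustration of the streaming and in-situ ideas mentioned above, the toy Python sketch below reduces "instrument" frames on the fly so that only compact summaries travel downstream for archival or training; the frame sizes, the min/mean/max reduction, and the in-process queue are placeholders, not any particular facility's pipeline.

# Toy sketch of in-situ reduction of a streaming instrument feed.
import queue
import threading
import statistics

frames = queue.Queue(maxsize=8)  # bounded buffer: backpressure on the "instrument"

def instrument(n_frames=100, frame_len=1024):
    """Simulated detector: emits raw frames, then a sentinel."""
    for i in range(n_frames):
        frames.put([float((i * j) % 251) for j in range(frame_len)])
    frames.put(None)

def in_situ_analysis():
    """Reduce each frame to a small summary instead of storing it raw."""
    summaries = []
    while (frame := frames.get()) is not None:
        summaries.append((min(frame), statistics.fmean(frame), max(frame)))
    print(f"kept {len(summaries)} summaries instead of {len(summaries)} raw frames")

producer = threading.Thread(target=instrument)
producer.start()
in_situ_analysis()
producer.join()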
The 15th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures (FlexScience) will provide the scientific community with a dedicated forum for discussing new research, development, and deployment efforts in running scientific computing workloads in such flexible ecosystems, across the Computing Continuum, focusing on emerging technologies and new convergence challenges that are not sufficiently addressed by the current generation of supercomputers and dedicated data centers. The workshop aims to address questions such as: What architectural changes to existing frameworks (hardware, operating systems, networking, and/or programming models) are needed to support flexible computing? Dynamic information derived from remote instruments, coupled simulations, and sensor ensembles that stream data for real-time analysis and machine learning is an important emerging trend: how can we leverage and adapt to these patterns? What scientific workloads are suitable candidates to take advantage of heterogeneity, elasticity, and/or on-demand resources? What factors are limiting the adoption of flexible designs?
The workshop encourages interaction and cross-pollination between participants who are developing applications, algorithms, middleware, and infrastructure and who are facing new challenges and opportunities in taking advantage of flexible computing. The workshop will be an excellent place to help the community define the current state, determine future goals, and discuss promising technologies and techniques.
Complex workflows at the intersection of HPC, Big Data and AI
Experimental evaluations of porting HPC/AI applications to clouds
Hybrid clouds that combine HPC data centers with public clouds in various scenarios (bursting, data sharing)
Interplay between Edge, Fog and Hybrid Clouds
Performance portability and related abstractions to hide the heterogeneity of resources
RRR (Robustness, Reconfigurability, Reproducibility) of complex workflows
Implementation and fine-tuning of high-performance AI and deep learning frameworks for clouds (e.g., TensorFlow, PyTorch, Horovod)
Scalability and cost-effective elasticity of AI and deep learning (e.g. data-parallel training) for cloud infrastructures
Virtualization, containers, and dynamic provisioning
Scalable and elastic cloud/HPC storage and I/O data management services and architectures
Data-intensive workloads and tools (e.g. caching) in clouds
Use of popular cloud building blocks (e.g., NoSQL databases) for scientific applications
Fault tolerance and reliability in cloud systems
Analysis of management complexity, cost, and variability of cloud and IoT environments
Paper submission deadline: April 22, 2025 AoE (extended from April 1, 2025)
Paper notification: May 6, 2025
Camera ready papers: May 30, 2025
Workshop: July 20, 2025
Authors are invited to submit:
Full 5-page papers
Authors are invited to submit papers describing unpublished, original research. All submitted manuscripts should be formatted using the ACM Master Template with the sigconf format (please be sure to use the current version). All necessary documentation can be found at: https://www.acm.org/publications/proceedings-template. Workshop papers should have a maximum of 5 pages. All papers must be in English. We use a single-blind reviewing process, so please keep the authors' names, publications, etc., in the text.
Papers will be peer-reviewed, and accepted papers will be published in the workshop proceedings as part of the ACM Digital Library.
Papers conforming to these guidelines should be submitted through HotCRP.
Alexandru Costan, IRISA / INSA Rennes, France (alexandru.costan@irisa.fr)
Bogdan Nicolae, Argonne National Laboratory, USA (bogdan.nicolae@acm.org)
Kento Sato, RIKEN Center for Computational Science (R-CCS), Japan (kento.sato@riken.jp)
Michael Sevilla, University of California, Santa Cruz, USA
Dongfang Zhao, University of Nevada, USA
Elena Apostol, Universitatea Politehnica Bucharest, Romania
Kevin Brown, Argonne National Laboratory, USA
Anthony Kougkas, Illinois Institute of Technology, USA
Ryan Chard, Argonne National Laboratory, USA
Teng Wang, Florida State University, USA
Takaaki Fukai, RIKEN, Japan
Radu Prodan, University of Klagenfurt, Austria
Mustafa Rafique, Rochester Institute of Technology, USA
Michael Schoettner, University of Duesseldorf, Germany