15th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures
FLEXSCIENCE 2025
Notre Dame, IN, USA, July 20, 2025
This year's edition of FlexScience is organized jointly with the FRAME workshop.
Venue: Morris Inn, Hesburgh room
Date: July 20th, 13h30 - 17h00
Papers have 20 min slots (15 min presentation + 5 min Q&A).
Preliminary program:
13h30 Opening - chair Alexandru Costan
13h40 Keynote - chair Bogdan Nicolae
Michael E. Papka, "Rethinking Specialization: A Path Toward Agile and Unified HPC Facilities"
14h30 Session 1 - FRAME - chair Alexandru Costan
Rusty-Cracker: A Multi-core Connected Components Library in Rust, Davide Rucci, Daniele Sanpietro, Emanuele Carlini, Matteo Mordacchini, Patrizio Dazzi
15h00 Coffee Break
15h30 Session 2 - FlexScience - chair Kento Sato
Towards a Federated Approach to Complex Digital Twins, H. Ahmed, D. Crawl, I. Altintas
RAPTOR: Reconfigurable Advanced Platform for Transdisciplinary Open Research, H. Najafi, P. Poudel, K. Bahreini, J. Ibarra, F. Saeed, Y. Li, J. Obeysekera, J. Liu
Building Flexible Physics-Informed Neural Networks with Fast Fourier Transform Analysis, R. Shehayib, J. Vap, P. Kogge
Efficient and Cost-Effective HPC on the Cloud, A. Bhosale, L. Kale, S. Kokkila-Schumacher
16h50 Closing Remarks
Title: Rethinking Specialization: A Path Toward Agile and Unified HPC Facilities
Abstract:
As high-performance computing (HPC) enters a period of rapid transformation—driven by data-centric science, AI integration, and growing demand for computational access—the traditional facility model of rigid specialization between capability jobs (large, long-running tasks) and capacity jobs (numerous short, quick tasks) is increasingly misaligned with emerging needs. While such specialization has historically supported performance optimization, it has also led to infrastructure silos, underutilized resources, and operational inefficiencies. To explore the consequences of this fragmentation, we present a concrete study using production workloads representative of capability- and capacity-oriented computing. We investigate two integration strategies—workload fusion, where diverse workloads share a common platform, and workload injection, where capacity jobs opportunistically utilize idle capability resources—using trace-based, event-driven simulations to quantify impacts on utilization and efficiency. These results reveal both the limitations of siloed systems and the untapped potential of more unified operational models. Building from this analysis, we then shift focus to consider a broader question: what must future HPC facilities look like to meet the challenges ahead? Beyond architectural integration, the next generation of HPC must undergo a deeper evolution—incorporating higher-level services, rethinking policies to promote flexibility and shared use, expanding accessibility to a broader range of users, and prioritizing usability as a primary goal. In doing so, HPC can evolve from a domain of elite, narrowly tuned systems into a more agile, inclusive, and strategically aligned computational ecosystem.
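The abstract does not detail the simulator itself, so the following is only a minimal, hypothetical sketch of what a trace-driven, event-driven model of workload injection could look like: the 128-node cluster, the toy job traces, and the first-fit backfill policy are invented for illustration and are not the production workloads or policies studied in the keynote.

# Hypothetical sketch: event-driven replay of a capability trace, with an
# optional stream of small capacity jobs injected onto idle nodes.
import heapq
import itertools
from dataclasses import dataclass

@dataclass
class Job:
    arrival: float  # submission time (hours)
    nodes: int      # nodes requested
    runtime: float  # runtime (hours)

def simulate(total_nodes, capability_jobs, injected_jobs=()):
    """Replay job traces and return average node utilization."""
    tick = itertools.count()  # tie-breaker so the heap never compares Job objects
    events = []
    for job in capability_jobs:
        heapq.heappush(events, (job.arrival, next(tick), "cap", job))
    for job in injected_jobs:
        heapq.heappush(events, (job.arrival, next(tick), "inj", job))
    free = total_nodes
    queue_cap, queue_inj = [], []
    busy_node_hours = 0.0
    makespan = 0.0

    while events:
        now, _, kind, job = heapq.heappop(events)
        if kind == "cap":
            queue_cap.append(job)
        elif kind == "inj":
            queue_inj.append(job)
        else:  # "finish": release the nodes held by a completed job
            free += job.nodes
        # Capability jobs get priority; injected capacity jobs backfill
        # whatever is left (simple first-fit within each queue).
        for queue in (queue_cap, queue_inj):
            i = 0
            while i < len(queue):
                if queue[i].nodes <= free:
                    j = queue.pop(i)
                    free -= j.nodes
                    busy_node_hours += j.nodes * j.runtime
                    makespan = max(makespan, now + j.runtime)
                    heapq.heappush(events, (now + j.runtime, next(tick), "finish", j))
                else:
                    i += 1
    return busy_node_hours / (total_nodes * makespan)

# Toy comparison: a siloed capability system vs. the same system with injection.
capability = [Job(0.0, 96, 6.0), Job(2.0, 64, 4.0)]
capacity = [Job(float(t), 4, 0.5) for t in range(10)]
print("siloed utilization  :", round(simulate(128, capability), 2))
print("with injection      :", round(simulate(128, capability, capacity), 2))

On this toy trace the injected capacity jobs raise average utilization slightly without delaying the capability jobs, which is the kind of effect the keynote quantifies on real production traces.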
Biography:
Michael E. Papka is an Argonne Senior Scientist and Distinguished Fellow. He serves as the deputy associate laboratory director for Computing, Environment, and Life Sciences (CELS) and the division director of the Argonne Leadership Computing Facility (ALCF). His leadership focuses on leveraging high-performance computing to advance scientific discovery and innovation. In addition to his roles at Argonne, Mike is the Warren S. McCulloch Professor of Computer Science at UIC. He directs the Electronic Visualization Laboratory (EVL) and is the computer science lead for the interdisciplinary CS+Design program. Mike earned a B.S. in physics from Northern Illinois University, an M.S. in computer science and electrical engineering from UIC, and an M.S. and Ph.D. in computer science from the University of Chicago.
Scientific computing applications generate enormous datasets that are growing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of modern big data analytics. Supported by the rise of artificial intelligence and deep learning, such enormous datasets are becoming valuable resources even beyond their original scope, opening new opportunities to learn patterns and extract new knowledge at large scale, potentially without human intervention. However, this leads to increasingly complex workflows that combine traditional HPC simulations with big data analytics and AI applications. An initial wave that opened this direction was the shift from compute-intensive to data-intensive computing, which fused several ideas from big data analytics (in-situ processing, shipping computations close to the data, complex and dynamic workflows) with the tightly coupled patterns addressed by the AI and high-performance computing ecosystems. In a quest to keep up with the complexity of these workflows, the design and operation of the infrastructures capable of running them efficiently at scale have evolved accordingly. Extreme heterogeneity at all levels (combinations of CPUs and accelerators, various types of memories, local storage and network links, parallel file systems and object stores, etc.) is now the norm. Ideas pioneered by cloud and edge computing (elasticity, multi-tenancy, geo-distributed processing, stream computing) are also beginning to be adopted in the HPC ecosystem (containerized workflows, on-demand jobs to complement batch jobs, streaming of experimental data from instruments directly to supercomputers, etc.). Thus, modern scientific applications need to be integrated into an entire Compute Continuum, from the edge all the way to supercomputers and large data centers, using flexible infrastructures and middleware.
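As a concrete illustration of the streaming and in-situ ideas mentioned above, the toy Python sketch below reduces "instrument" frames on the fly so that only compact summaries travel downstream for archival or training; the frame sizes, the min/mean/max reduction, and the in-process queue are placeholders, not any particular facility's pipeline.

# Toy sketch of in-situ reduction of a streaming instrument feed.
import queue
import threading
import statistics

frames = queue.Queue(maxsize=8)  # bounded buffer: backpressure on the "instrument"

def instrument(n_frames=100, frame_len=1024):
    """Simulated detector: emits raw frames, then a sentinel."""
    for i in range(n_frames):
        frames.put([float((i * j) % 251) for j in range(frame_len)])
    frames.put(None)

def in_situ_analysis():
    """Reduce each frame to a small summary instead of storing it raw."""
    summaries = []
    while (frame := frames.get()) is not None:
        summaries.append((min(frame), statistics.fmean(frame), max(frame)))
    print(f"kept {len(summaries)} summaries instead of {len(summaries)} raw frames")

producer = threading.Thread(target=instrument)
producer.start()
in_situ_analysis()
producer.join()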
The 15th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures (FlexScience) will provide the scientific community with a dedicated forum for discussing new research, development, and deployment efforts in running scientific computing workloads in such flexible ecosystems, across the Computing Continuum, focusing on emerging technologies and new convergence challenges that are not sufficiently addressed by the current generation of supercomputers and dedicated data centers. The workshop aims to address questions such as: What architectural changes to existing frameworks (hardware, operating systems, networking, and/or programming models) are needed to support flexible computing? Dynamic information derived from remote instruments, coupled simulations, and sensor ensembles that stream data for real-time analysis and machine learning is an important emerging trend: how can we leverage and adapt to these patterns? What scientific workloads are suitable candidates to take advantage of heterogeneity, elasticity, and/or on-demand resources? What factors are limiting the adoption of flexible designs?
The workshop encourages interaction and cross-pollination between participants who are developing applications, algorithms, middleware, and infrastructure and who are facing new challenges and opportunities in taking advantage of flexible computing. The workshop will be an excellent place to help the community define the current state, determine future goals, and discuss promising technologies and techniques.
Complex workflows at the intersection of HPC, Big Data and AI
Experimental evaluations of porting HPC/AI applications to clouds
Hybrid clouds that combine HPC data centers with public clouds in various scenarios (bursting, data sharing)
Interplay between Edge, Fog and Hybrid Clouds
Performance portability and related abstractions to hide the heterogeneity of resources
RRR (Robustness, Reconfigurability, Reproducibility) of complex workflows
Implementation and fine-tuning of high-performance AI and deep learning frameworks for clouds (e.g., TensorFlow, PyTorch, Horovod)
Scalability and cost-effective elasticity of AI and deep learning (e.g. data-parallel training) for cloud infrastructures
Virtualization, containers, and dynamic provisioning
Scalable and elastic cloud/HPC storage and I/O data management services and architectures
Data-intensive workloads and tools (e.g. caching) in clouds
Use of popular cloud building blocks (e.g., NoSQL databases) for scientific applications
Fault tolerance and reliability in cloud systems
Analysis of management complexity, cost, and variability of cloud and IoT environments
Paper submission deadline: April 22, 2025 AoE (extended from April 1, 2025)
Paper notification: May 6, 2025
Camera ready papers: May 30, 2025
Workshop: July 20, 2025
Authors are invited to submit:
Full 5-page papers
Authors are invited to submit papers describing unpublished, original research. All submitted manuscripts should be formatted using the ACM Master Template with the sigconf format (please be sure to use the current version). All necessary documentation can be found at: https://www.acm.org/publications/proceedings-template. Workshop papers should have a maximum of 5 pages. All papers must be in English. We use a single-blind reviewing process, so please keep the authors' names, publications, etc., in the text.
Papers will be peer-reviewed, and accepted papers will be published in the workshop proceedings as part of the ACM Digital Library.
Papers conforming to these guidelines should be submitted through HotCRP.
Alexandru Costan, IRISA / INSA Rennes, France (alexandru.costan@irisa.fr)
Bogdan Nicolae, Argonne National Laboratory, USA (bogdan.nicolae@acm.org)
Kento Sato, RIKEN Center for Computational Science (R-CCS), Japan (kento.sato@riken.jp)
Michael Sevilla, University of California, Santa Cruz, USA
Dongfang Zhao, University of Nevada, USA
Elena Apostol, Universitatea Politehnica Bucharest, Romania
Kevin Brown, Argonne National Laboratory, USA
Anthony Kougkas, Illinois Institute of Technology, USA
Ryan Chard, Argonne National Laboratory, USA
Teng Wang, Florida State University, USA
Takaaki Fukai, RIKEN, Japan
Radu Prodan, University of Klagenfurt, Austria
Mustafa Rafique, Rochester Institute of Technology, USA
Michael Schoettner, University of Duesseldorf, Germany