June 3, 2024

AI4Sys '24

At HPDC 2024

University of Pisa 

Pisa, Italy





9:00 - 9:05: Welcome/Opening Remarks

9:05 - 10:05: Keynote: The Hitchhiker's Guide to Using Machine Learning in System-level Resource Management. Thaleia Doudali (IMDEA Software Institute)

10:05 - 10:25: MPIrigen: MPI Code Generation through Domain-Specific Language Models

10:25 - 11:00: Morning Coffee Break

11:00 - 11:20: ECO-LLM: LLM-based Edge Cloud Optimization

11:20 - 11:40: StreamingRAG: Real-time Contextual Retrieval and Generation Framework

11:40 - 12:00: Toward Using Representation Learning for Cloud Resource Usage Forecasting


Keynote: The Hitchhiker's Guide to Using Machine Learning in System-level Resource Management.

Abstract: This talk will take you on a journey through best practices, things to avoid, and unconventional approaches for integrating machine learning methods into system-level resource management for cloud and high-performance computing environments. These environments suffer from low resource utilization due to the significant gap between the resources allocated to users and those actually used in practice. While machine learning can improve resource management and efficiency, its production-level use comes with significant overheads, engineering effort, and interpretability concerns. This talk will inspire you to think outside the box and lead you to an existential question: is machine learning even necessary in certain aspects of system-level resource management?

Speaker Bio: Thaleia Dimitra Doudali is an Assistant Research Professor at the IMDEA Software Institute in Madrid, Spain. She received her PhD from the Georgia Institute of Technology (Georgia Tech) in the United States, advised by Ada Gavrilovska. Prior to that, she earned an undergraduate diploma in Electrical and Computer Engineering from the National Technical University of Athens in Greece. Thaleia’s research lies at the intersection of Systems and Machine Learning, where she explores novel methodologies, such as machine learning and computer vision, to improve system-level resource management of emerging hardware technologies. In 2021, Thaleia received the Juan de la Cierva post-doctoral fellowship. In 2020, Thaleia was selected to attend the prestigious Rising Stars in EECS academic workshop. Aside from research, Thaleia actively strives to improve mental health awareness in academia and to foster diversity and inclusion.


ABOUT The 2nd AI4Sys


AI/ML are being incorporated into all aspects of the scientific and engineering process. One early area of effort has been augmenting existing autonomic system components with AI models that can offer finer-grained and continuously updated automation behavior. Prior work has sought to predict I/O behavior to enable more efficient machine throughput, to monitor logs for patterns that may reveal either security concerns or faulty components that fail in consistent but unusual ways, and to manage applications and caches so as to address the system as a whole rather than at the level of individual components. All of these system-related tasks, and many more, address complex and sometimes intractable problems, and seek to use AI tools to offer better solutions than the heuristics or point solutions that existed previously.

This workshop solicits novel work that explores how to effectively incorporate AI into system management and monitoring, particularly for complex systems that support scientific and engineering workloads (i.e., cloud and HPC). Areas of interest and domains of work include, but are not limited to:

1) Tools and runtimes for incorporating AI into systems

2) Privacy and security concerns for managing system data used for model creation

3) Continuous model evolution and the impacts of chasing current workloads on a dynamic system

4) AI algorithms for systems problems

5) Subsystem-related optimizations, including operating systems, data migration, storage, job management, resource allocation, and related topics

6) Position and experience papers on using AI in systems

Papers must use the ACM conference format and be no more than 5 pages long, including everything except references. Submissions are single-anonymized: include author information in the paper; reviewers will remain anonymous.

Submission Link: https://ai4sys24.hotcrp.com/




• Submission Deadline (firm): March 30, 2024 AoE

• Responses to Authors: April 13, 2024 

• Camera Ready due: TBD (in line with HPDC Camera Ready deadline)

• Workshop: June 3, 2024 (HPDC is June 3-7)




General Chairs:

• Jay Lofstead (Sandia National Laboratories)

• Jai Dayal (Samsung Advanced Institute of Technology)

Program Committee: TBD