3rd AIPerf and Optimization in the LLM World
ICPE 2025 Workshop
OVERVIEW
Artificial Intelligence (AI) has been widely adopted across mainstream domains such as computer vision, natural language processing, and even reliability analysis. However, its use for performance modeling and evaluation remains limited, and its benefits to the performance engineering field are still unclear. AI tools are often employed as black-box models that are not specifically designed for performance evaluation or control, producing models that require substantial time and data to develop and are not easily understood by domain experts. Researchers and practitioners have recently begun exploring explainable/white-box AI-based solutions in performance engineering to address these challenges. Unfortunately, the tools, methodologies, and datasets that would enable wider adoption are still lacking.
Moreover, the rapid rise in popularity and adoption of large language models (LLMs) like ChatGPT has further complicated performance modeling, optimization, and control. Training and operating LLMs is expensive: running ChatGPT is estimated to cost over $700,000 per day, and using GPT-4 for customer service can cost a small business over $21,000 a month. High infrastructure and financial costs, together with the need for specialized talent, make LLM technology inaccessible to most organizations. The up-front costs also include the emissions generated to manufacture the necessary hardware and the cost of running that hardware during training, both when the machines are operating at full capacity and when they are idle. The best estimate of the dynamic computing cost of training GPT-3, the model behind the original ChatGPT, is approximately 1,287,000 kWh, corresponding to roughly 552 tons of carbon dioxide.
The goal of this workshop is to bridge this gap by promoting the dissemination of research that utilizes or studies AI techniques for the quantitative analysis of modern ICT systems, such as LLM applications, to optimize performance while reducing energy consumption and cost. To address this urgent need, the workshop brings together researchers from academia and industry to share their experiences and insights in performance engineering within the LLM domain and AI-based applications in general.
GOALS
The workshop will consist of invited talks, work-in-progress and fully refereed papers, and a panel.
Topics of interest include, but are not limited to:
1. Optimizing LLM workloads on traditional and new architectures
2. Hardware-assisted LLM systems
3. LLM optimization at scale
4. Code generation optimization for modern hardware
5. Data-driven model identification for performance evaluation of ICT systems
6. White-box performance modeling
7. Datasets and benchmarks for training and validating AI performance models
8. Explainability and robustness assessment of AI systems in performance engineering
9. AI models for performance anomaly detection, classification, root cause analysis, and remediation
10. AI models for performance tasks automation, including auto-scaling and self-optimization
Panel Discussion (speakers from the industry and academia)
The target audience includes:
Researchers advocating new software- or hardware-based approaches to optimizing LLM applications
Practitioners who need to solve runtime performance problems in their LLM deployments
Researchers and practitioners interested in performance optimization, modeling, and control of modern ICT applications
CALL FOR PAPERS
Submission Guidelines:
A variety of contribution styles are solicited, including:
Regular research papers (max 8 pages): novel contributions that are fully validated and well positioned in the state of the art
Empirical, experience, reproduction, or case study papers (max 6 pages): work-in-progress, vision papers, new ideas that still need validation, industrial case studies, and experiences (positive or negative) of using AI for performance engineering
Please submit the paper through HotCRP.
IMPORTANT DATES
PROGRAM COMMITTEE
Kingsum Chow
Professor, Zhejiang University
kingsum.chow@gmail.com
Emilio Incerto
Assistant Professor, IMT School for Advanced Studies Lucca
emilio.incerto@imtlucca.it
Marin Litoiu
Professor, York University
mlitoiu@yorku.ca
Zhihao Chang
Assistant Professor, Zhejiang University
changzhihao@zju.edu.cn
Anil Rajput
AMD Fellow
Anil_Rajput@yahoo.com
Khun Ban
Cloud Performance Architect, Intel
khunban@gmail.com
Daniele Masti
PostDoc, Gran Sasso Science Institute
daniele.masti@gssi.it
Zhiheng Lyu
University of Waterloo
z63lyu@uwaterloo.ca