ICPE 2026 Workshop
The AIPerfLLM @ ICPE 2026 workshop will be a half-day event held on the afternoon of May 5, 2026, at a venue to be announced. The workshop will feature invited talks, research presentations, and an expert panel, fostering discussion of AI-driven performance optimization, large language models (LLMs), and workload-aware system tuning.
"Beyond the Dashboard: How AI is Turning Performance Engineers into System Architects"
Abstract: As cloud architectures transition from microservices to hyper-scale distributed systems, the "human-in-the-loop" model for performance management has reached its breaking point. We are drowning in dashboards but starving for actionable insights. This keynote charts a roadmap for the future of the discipline, tracing the journey from manual performance testing to the deployment of autonomous, multi-agentic ecosystems.
We will explore this evolution through three phases:
The Tactical Foundation: Using AI to automate the "drudgery" of performance work — generating complex telemetry parsers, synthetic test suites, and load-test scripts in seconds rather than days.
The Diagnostic Leap: Moving beyond simple alerts to AI-driven root-cause analysis, where models navigate high-cardinality data to diagnose performance regressions and "silent" bottlenecks.
The Multi-Agentic Future: Architecting a "mesh" of specialized AI agents that collaborate to monitor, analyze, and optimize cloud infrastructure in real time, shifting from reactive troubleshooting to predictive, self-healing systems.
Central to this transformation is the evolution of the Performance Engineer. We will discuss how our roles are shifting from manual "firefighters" to System Architects — designing the high-level objectives, guardrails, and intent that govern intelligent agents.
Bio: Andrea Pellegrini is a Principal Engineer at Microsoft, where he leads performance efforts for Azure's General Purpose VM products. Previously, he was a Distinguished Engineer at Arm, serving as the technical lead for performance and workloads for Arm's server IPs. He holds a PhD in Computer Engineering from the University of Michigan and M.E./B.E. degrees in Computer Engineering from the Università di Bologna, Italy.
"MCE: A Three-Dimensional Efficiency Evaluation Framework for LLM Inference Systems"
Abstract: MCE (Model Computational Efficiency) is a performance evaluation framework designed for Large Language Model (LLM) inference systems. It establishes a three-dimensional efficiency evaluation architecture consisting of "Model, Framework, and Bare Metal". By defining representative metrics for each dimension, it normalizes raw performance data into dimensionless efficiency scores, and ultimately calculates a comprehensive MCE score through weighted integration.
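To make the scoring scheme concrete, here is a minimal Python sketch of weighted integration over normalized dimension scores. The metric names, baselines, and weights are illustrative assumptions, not the published MCE definitions.

```python
# A minimal sketch, assuming hypothetical metrics and weights; the actual
# MCE framework defines its own representative metrics per dimension.

def normalize(raw: float, baseline: float) -> float:
    """Turn a raw, higher-is-better metric into a dimensionless efficiency
    score relative to a reference baseline."""
    return raw / baseline

def mce_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted integration of per-dimension efficiency scores."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[dim] * score for dim, score in scores.items())

# Hypothetical per-dimension scores for the "Model, Framework, Bare Metal"
# architecture, each normalized against a reference system.
scores = {
    "model": normalize(raw=142.0, baseline=120.0),      # e.g. tokens/s per GPU
    "framework": normalize(raw=0.82, baseline=0.75),    # e.g. batching efficiency
    "bare_metal": normalize(raw=310.0, baseline=350.0), # e.g. achieved TFLOPS
}
weights = {"model": 0.4, "framework": 0.3, "bare_metal": 0.3}

print(f"composite MCE score: {mce_score(scores, weights):.3f}")
```

The key design point is that each dimension is first made dimensionless against a reference baseline, so heterogeneous raw metrics (throughput, utilization, FLOPS) can be combined in a single weighted sum.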
Bio: Arthur (Zhenjian) Kang is the Chair of the SPEC OSG Machine Learning Committee and Manager of Standards & Certification in the Solution Build & Empower Department at IEIT Systems Co., Ltd. His work focuses on establishing industry-standard benchmarks and evaluation methodologies for machine learning workloads, with a particular emphasis on LLM inference performance characterization and efficiency measurement.
Artificial Intelligence (AI) has been widely adopted across domains such as computer vision, natural language processing, and reliability analysis. However, its application to performance modeling and evaluation remains limited. While 78% of organizations now use AI in at least one business function, only 16% have fully implemented AI in their performance testing processes. AI tools are often deployed as black-box models — such as DeepPerf for configuration tuning or Datadog Watchdog for anomaly detection — that achieve high accuracy but provide little insight into why a system behaves as it does. Performance debugging fundamentally requires causal reasoning, not just correlation: engineers need to know which code path is affected and whether the root cause is cache contention, thread starvation, or a memory allocation pattern. Researchers have begun exploring explainable and white-box approaches — including Comprex for configuration-dependent performance modeling, Opal for belief-traced LLM reasoning over profiling data, and GPTuner for knowledge-grounded database tuning — but standardized tools, benchmarks, and datasets for broader adoption are still lacking.
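To make the correlation-versus-causation distinction concrete, the following toy sketch (all metric names and numbers are invented) shows a hidden confounder producing a strong observational correlation between cache misses and latency, which a simple interventional check then refutes:

```python
# Toy illustration of why performance debugging needs causal reasoning, not
# just correlation. A hidden confounder (offered load) drives both a
# cache-miss metric and latency, so they correlate strongly under passive
# observation; forcing the miss rate at fixed load reveals that misses are
# not the root cause. All numbers and metric names are made up.
import random
import statistics

random.seed(0)

def observe(n: int = 500) -> tuple[list[float], list[float]]:
    """Passive monitoring: load drives both cache misses and latency."""
    misses, latencies = [], []
    for _ in range(n):
        load = random.uniform(0.1, 1.0)  # hidden confounder
        misses.append(0.2 + 0.6 * load + random.gauss(0, 0.02))
        latencies.append(5.0 + 40.0 * load + random.gauss(0, 1.0))
    return misses, latencies

def pearson(xs: list[float], ys: list[float]) -> float:
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

def intervene(miss_rate: float, n: int = 500) -> float:
    """do(cache_miss := miss_rate) while holding load fixed. The forced
    miss rate has no effect on latency in this system, exposing the
    observational correlation as confounded."""
    load = 0.5  # held fixed by the experiment
    return statistics.mean(
        5.0 + 40.0 * load + random.gauss(0, 1.0) for _ in range(n)
    )

misses, latencies = observe()
print(f"observed corr(misses, latency) = {pearson(misses, latencies):.2f}")  # high
print(f"latency | do(miss=0.2) = {intervene(0.2):.1f} ms")
print(f"latency | do(miss=0.8) = {intervene(0.8):.1f} ms")  # ~unchanged
```

A black-box model trained on the observational data would rank cache misses as a top predictor of latency; only the interventional step, which an engineer (or a white-box reasoning system) must design, distinguishes predictor from cause.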
The rapid rise of large language models (LLMs) has made performance optimization both more urgent and more challenging. Training a single frontier model now costs $100M–$500M in compute alone, and next-generation runs are projected to exceed $1B. Global data center electricity consumption reached ~415 TWh in 2024 — roughly 1.5% of global electricity — and is projected to more than double to ~945 TWh by 2030. The four largest cloud providers alone plan to spend over $630B on AI infrastructure in 2026. Meanwhile, inference costs are falling dramatically — equivalent model performance costs roughly 10–50× less each year — creating a paradox where the industry spends more to train frontier models while end-user costs decline. Reasoning-enabled models further complicate the picture, consuming up to 50× more energy per query than standard models due to extended chain-of-thought generation.
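As a back-of-envelope sanity check on these figures, using only the numbers quoted above:

```python
# The endpoints (415 TWh in 2024, 945 TWh in 2030; a 10-50x annual cost
# decline) come from the text; the derived rates are plain arithmetic.
implied_growth = (945 / 415) ** (1 / 6) - 1
print(f"implied annual growth in data-center electricity: {implied_growth:.1%}")  # ~14.7%

# Even at the conservative end (10x cheaper per year), three years of
# compounding makes equivalent inference roughly 1000x cheaper:
print(f"three-year inference cost decline: {10 ** 3:,}x")
```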
This workshop bridges the gap between AI capabilities and performance engineering needs by promoting research that applies AI techniques to the quantitative evaluation and optimization of modern ICT systems, including LLM workloads themselves. Key open problems include end-to-end causal performance models that connect profiling diagnostics to actionable code changes, cross-layer reasoning spanning application code through hardware, validated AI-generated optimization recommendations, and transfer learning across workloads and environments. The workshop brings together researchers from academia and industry to share experiences and advances in AI-driven performance engineering for the LLM era.
The workshop will be composed of invited talks, presentations of work-in-progress and fully refereed papers, and a panel discussion.
Topics of interest include, but are not limited to:
1. Optimizing LLM workloads on traditional and new architectures
2. Hardware-assisted LLM systems
3. LLM optimization at scale
4. Code-generation optimization for modern hardware
5. Data-driven model identification for performance evaluation of ICT systems
6. White-box performance modeling
7. Datasets and benchmarks for training and validating AI performance models
8. Explainability and robustness assessment of AI systems in performance engineering
9. AI models for performance anomaly detection, classification, root-cause analysis, and remediation
10. AI models for performance task automation, including auto-scaling and self-optimization
Panel Discussion (speakers from industry and academia)
The target audience:
Researchers advocating new software or hardware approaches to optimizing LLM applications
Practitioners who need to solve runtime performance problems in their LLM deployments
Researchers and practitioners interested in performance optimization, modeling, and control of modern ICT applications
A variety of contribution styles are solicited, including:
Regular research papers (up to 10 pages, including references): fully validated contributions with strong methodological foundations and clear positioning within the state of the art.
Empirical, experience, reproduction, or case-study papers (up to 10 pages, including references): work-in-progress results, vision or position papers, early-stage ideas, industrial case studies, and positive or negative experiences applying AI or analytical methods to performance engineering.
Please submit papers through HotCRP.
Kingsum Chow
Professor, Zhejiang University
k i n g s u m . c h o w [at] g m a i l . c o m
Emilio Incerto
Assistant Professor, IMT School for Advanced Studies Lucca
e m i l i o . i n c e r t o [at] i m t l u c c a . i t
Marin Litoiu
Professor, York University
m l i t o i u [at] y o r k u . c a
Zhihao Chang
Assistant Professor, Zhejiang University
c h a n g z h i h a o [at] z j u . e d u . c n
Anil Rajput
AMD Fellow
A n i l _ R a j p u t [at] y a h o o . c o m
Khun Ban
Cloud Performance Architect, Intel
k h u n b a n [at] g m a i l . c o m
Daniele Masti
Postdoctoral Researcher, Gran Sasso Science Institute
d a n i e l e . m a s t i [at] g s s i . i t
Zhiheng Lyu
University of Waterloo
z 6 3 l y u [at] u w a t e r l o o . c a
Roberto Pizziol
IMT School for Advanced Studies Lucca
r o b e r t o . p i z z i o l [at] i m t l u c c a . i t
Marco Zamponi
IMT School for Advanced Studies Lucca
m a r c o . z a m p o n i [at] i m t l u c c a . i t