About the Tutorial
The current landscape of Machine Learning (ML) and Deep Learning (DL) is fragmented across models, frameworks, and system stacks, and it lacks standard tools and methodologies for evaluating and profiling models and systems. In the absence of such tools, the state of the practice for evaluating and comparing the benefits of proposed AI innovations (whether in hardware or software) on end-to-end AI pipelines is both arduous and error-prone, stifling the adoption of these innovations in a rapidly moving field.
The goal of the tutorial is to bring together experts from industry and academia to foster the systematic development, reproducible evaluation, and performance analysis of deep learning artifacts. It seeks to address the following questions:
What are the benchmarks that can effectively capture the scope of the ML/DL domain?
Are the existing frameworks sufficient for this purpose?
What are some of the industry-standard evaluation platforms or harnesses?
What are the metrics for carrying out an effective comparative evaluation?
Registration: 2026.hpca-conf.org/attending/registration
Schedule

8:45 AM - 9:00 AM: Introduction
9:00 AM - 9:30 AM: Growing Markets with Technical Standards for AI Reliability - Peter Mattson (Google) <slides>
9:30 AM - 10:00 AM: Data Preprocessing Challenges and Opportunities in ML Pipelines - Oana Balmau (McGill) <slides>
10:00 AM - 10:30 AM: MoE-Inference-Bench: Performance Evaluation of Mixture of Experts Large Language and Vision Models - Murali Emani (Argonne National Laboratory) <slides>
10:30 AM - 11:00 AM: Break
11:00 AM - 11:30 AM: SWE-fficiency - Can Language Models Optimize Real-World Repositories on Real Workloads? - Jeffrey Ma (Harvard) <slides>
11:30 AM - 12:00 PM: Portable Compilation for Future AI - Fredrik Kjolstad (Stanford) <slides>
12:00 PM - 12:30 PM: AI's Memory Challenges - Jae W. Lee (Seoul National University) <slides>
12:30 PM - 12:45 PM: Conclusion
Organizers

Tom St. John (Gimlet Labs)
Carole-Jean Wu (Meta)
Vijay Janapa Reddi (Harvard)