Cluster Director: Unleashing Performance with Managed Supercomputing Infrastructure
May 12, 2026
9:00 AM - 10:00 AM Pacific Time
Online
About the Session
The complexity of orchestrating large-scale high-performance computing (HPC) simulations and AI training often creates a "management tax" that slows down innovation. To stay competitive in 2026, researchers and developers need to spend their time on code and models, not debugging infrastructure.
Join the Google Cloud Advanced Computing Community for a deep dive into Cluster Director, Google’s managed infrastructure service designed to simplify the entire lifecycle of HPC and AI clusters. We will explore how Cluster Director replaces manual, error-prone setup with automated, topology-aware orchestration for both Slurm and Kubernetes environments.
In this session, we will demonstrate how Cluster Director accelerates your productivity by:
Automating the "Day 0" to "Day 2" Lifecycle: From rapid deployment using validated reference architectures to automated health checks and one-click node remediation.
Optimizing for Performance: Learn how topology-aware placement ensures your GPUs and TPUs are physically co-located to minimize latency and maximize scaling efficiency.
Supporting Flexible Consumption Models: Discover how to combine the latest consumption options, including Reservations for guaranteed capacity, Dynamic Workload Scheduler (Flex-start) for cost-optimized scaling, and Spot VMs, in a single, unified environment.
Enhancing Observability: Take a tour of the integrated dashboard, which provides real-time visibility into cluster health and performance and helps you identify and resolve stragglers before they impact your job.
Whether you are scaling a trillion-parameter LLM or running petascale scientific simulations, this session will show you how Cluster Director provides a simple, efficient path to the most powerful computing technologies on Google Cloud.
Speaker
Ilias Katsardis
Senior Product Manager - AI Infrastructure, Google Cloud
Ilias Katsardis is a Senior Product Manager based in Sunnyvale, CA, driving the future of AI infrastructure at Google Cloud. He is responsible for Cluster Director and the Cluster Toolkit, two key components of Google's supercomputing architecture. Passionate about making large-scale AI and HPC more accessible, Ilias focuses on creating solutions that automate complex configurations and provide a seamless user experience. His work enables researchers and developers to spend less time on infrastructure management and more time on scientific breakthroughs. With a rich background that includes roles at Cray Inc. and ClusterVision, along with founding two tech startups, Ilias brings over 15 years of deep industry expertise to his role.