Email: whucspyh AT gmail.com
I am a research scientist and engineering lead at Seed Infra, ByteDance, leading the development of large-scale AI infrastructure for training foundation and multimodal models, including the Seed, Seedream, and Seedance series.
My current focus is on optimizing exascale training performance for LLMs, multimodal LLMs, and image/video generation models running across superclusters with tens of thousands of accelerators.
Over the past nine years, I’ve worked at the intersection of machine learning systems, distributed training, and GPU performance engineering, bridging cutting-edge research and real-world large-scale deployment.
Large-Scale Multimodal & Video Generation Training: parallelism design, computation and communication optimization, and throughput scaling on GPU superclusters.
System Reliability & Resilience: fast checkpointing, fault tolerance, and systematic failure diagnosis for long-running jobs.
Network & Scheduling Optimization: GPU memory efficiency, dynamic scheduling, and cross-stack performance tuning.
AI Infrastructure for Generative Models: supporting foundation, multimodal, and image/video generation models at production scale.
[MLSys'26] veScale-FSDP: Flexible and High Performance FSDP at Scale
[EuroSys'26] MegaScale-Omni: Large-Scale Workload-Resilient Training of MultiModal LLM in Production
[EuroSys'26] MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
[EuroSys'25] HybridFlow/veRL: A Flexible and Efficient RLHF Framework
[ATC'25] Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
[NSDI'25] ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development
[NSDI'23] BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
[SIGCOMM'22] Multi-Resource Interleaving for Deep Learning Training
[Communications of the CCF'21] Communication Acceleration for Distributed Deep Learning Training
[SoCC'20] Elastic Parameter Server Load Distribution in Deep Learning Clusters
[SOSP'19] A Generic Communication Scheduler for Distributed DNN Training Acceleration
[EuroSys'18] Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters
See more in my Google Scholar profile.
I currently lead a team of 40+ engineers and researchers developing cutting-edge training infrastructure for multimodal LLMs and video generation at ByteDance.
We collaborate closely with research and production teams to ensure scalability, reliability, and efficiency at exascale.
We’re actively hiring exceptional engineers and researchers passionate about large-model systems, GPU optimization, and distributed AI infrastructure.
If you’re interested, please reach out via email or LinkedIn to learn more.
Ph.D. in Computer Science, The University of Hong Kong (advised by Prof. Chuan Wu) 2020
B.Eng. in Computer Science, Wuhan University 2015
AliStar, Huawei Top Minds, SenseTime AI Pioneer 2019
Lee Shau Kee Postgraduate Fellowship 2016
Outstanding Undergraduate Scholarship 2015
Google Excellence Scholarship 2014
National First Prize in the Mathematical Contest in Modeling 2013
First Prize, New Enrollment Scholarship 2011