Email: whucspyh AT gmail.com
I am a research scientist and engineering lead at Seed Infra, ByteDance, where I build large-scale AI infrastructure for training foundation and multimodal models.
My current focus is on optimizing exascale training performance for LLMs, multimodal LLMs, and image/video generation models running across superclusters with tens of thousands of accelerators.
Over the past nine years, I’ve worked at the intersection of machine learning systems, distributed training, and GPU performance engineering, bridging cutting-edge research and real-world large-scale deployment.
Large-Scale LLM Training: parallelism design, computation and communication optimization, and throughput scaling on ultra-large-scale GPU clusters.
System Reliability & Resilience: fast checkpointing, fault tolerance, and failure diagnosis for long-running jobs.
Network & Scheduling Optimization: GPU memory management, dynamic scheduling, and cross-stack performance tuning.
AI Infrastructure for Generative Models: supporting foundation, multimodal, and image/video generation models at production scale.
[EuroSys'26] MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
[EuroSys'25] HybridFlow/veRL: A Flexible and Efficient RLHF Framework
[ATC'25] Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
[NSDI'25] ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development
[NSDI'23] BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
[SIGCOMM'22] Multi-Resource Interleaving for Deep Learning Training
[Communications of the CCF'21] Communication Acceleration for Distributed Deep Learning Training
[SoCC'20] Elastic Parameter Server Load Distribution in Deep Learning Clusters
[SOSP'19] A Generic Communication Scheduler for Distributed DNN Training Acceleration
[EuroSys'18] Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters
See more in my Google Scholar profile.
I currently lead a team of 30+ engineers and researchers focused on building next-generation LLM training infrastructure at ByteDance.
We collaborate closely with research and production teams to ensure scalability, reliability, and efficiency at exascale.
We’re actively hiring exceptional engineers and researchers passionate about large-model systems, GPU optimization, and distributed AI infrastructure.
If you’re interested, please reach out via email or LinkedIn to learn more.
Ph.D. in Computer Science, The University of Hong Kong (advised by Prof. Chuan Wu) 2020
B.Eng. in Computer Science, Wuhan University 2015
AliStar, Huawei Top Minds, SenseTime AI Pioneer 2019
Lee Shau Kee Postgraduate Fellowship 2016
Outstanding Undergraduate Scholarship 2015
Google Excellence Scholarship 2014
National First Prize in the Mathematical Contest in Modeling 2013
First Prize, New Enrollment Scholarship 2011