Email: whucspyh AT gmail.com
I am a research scientist and engineering lead at Seed Infra, ByteDance, leading the development of large-scale AI infrastructure for training foundation and multimodal models, including the Seed, Seedream, and Seedance series.
My current focus is on optimizing exascale training performance for LLMs, multimodal LLMs, and image/video generation models running across superclusters with tens of thousands of accelerators.
Over the past nine years, I’ve worked at the intersection of machine learning systems, distributed training, and GPU performance engineering, bridging cutting-edge research and real-world large-scale deployment.
Large-Scale Multimodal & Video Generation Training: parallelism design, computation and communication optimization, and throughput scaling on GPU superclusters.
System Reliability & Resilience: fast checkpointing, fault tolerance, and systematic failure diagnosis for long-running jobs.
Network & Scheduling Optimization: GPU memory efficiency, dynamic scheduling, and cross-stack performance tuning.
AI Infrastructure for Generative Models: supporting foundation, multimodal, and image/video generation models at production scale.
[MLSys'26] veScale-FSDP: Flexible and High Performance FSDP at Scale
[EuroSys'26] MegaScale-Omni: Large-Scale Workload-Resilient Training of MultiModal LLM in Production
[EuroSys'26] MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
[EuroSys'25] HybridFlow/veRL: A Flexible and Efficient RLHF Framework
[ATC'25] Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
[NSDI'25] ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development
[NSDI'23] BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
[SIGCOMM'22] Multi-Resource Interleaving for Deep Learning Training
[Communications of the CCF'21] Communication Acceleration for Distributed Deep Learning Training
[SoCC'20] Elastic Parameter Server Load Distribution in Deep Learning Clusters
[SOSP'19] A Generic Communication Scheduler for Distributed DNN Training Acceleration
[EuroSys'18] Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters
See more in my Google Scholar profile.
I currently lead a team of 40+ engineers and researchers developing cutting-edge training infrastructure for multimodal LLMs and video generation at ByteDance.
We collaborate closely with research and production teams to ensure scalability, reliability, and efficiency at exascale.
We’re actively hiring exceptional engineers and researchers passionate about large-model systems, GPU optimization, and distributed AI infrastructure.
If you’re interested, please reach out via email or LinkedIn to learn more.
Ph.D. in Computer Science, The University of Hong Kong (advised by Prof. Chuan Wu) 2020
B.Eng. in Computer Science, Wuhan University 2015
AliStar, Huawei Top Minds, SenseTime AI Pioneer 2019
Lee Shau Kee Postgraduate Fellowship 2016
Outstanding Undergraduate Scholarship 2015
Google Excellence Scholarship 2014
National First Prize in the Mathematical Contest in Modeling 2013
First Prize, New Enrollment Scholarship 2011