on the Path to AGI


Haibin Lin

GitHub | Google Scholar | LinkedIn | Twitter

Last updated: 07/31/2025

Haibin works on LLM infrastructure at ByteDance Seed, focusing on optimizing training frameworks for LLMs and multimodal models, from pre-training (MegaScale, with over 10,000 GPUs per run) to post-training (reinforcement learning infrastructure for reasoning with verl). Before the LLM era, he worked on collective communication libraries for recommendation systems at ByteDance, and on Apache MXNet at Amazon (training, inference, runtime, and recipes such as gluon-nlp).

Open-Source Software (Python, C++, CUDA)

  • verl, a reinforcement learning framework for LLMs | #1 contributor, 186 commits

  • BytePS / ps-lite / horovod, distributed training libraries | #3 contributor, 38 commits

  • GluonNLP, a toolkit for natural language processing | #2 contributor, 314 commits

  • Apache MXNet, a deep learning framework | #9 contributor, 215 commits

Papers

Large-scale distributed ML & HPC

  • MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | EuroSys, 2026

  • Robust LLM Training Infrastructure | SOSP, 2025 (to appear)

  • Understanding Stragglers in Large Model Training using What-if Analysis | OSDI, 2025

  • ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs | SIGCOMM, 2025

  • Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation | ATC, 2025

  • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | NSDI, 2024

  • Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes | arXiv, 2020

ML toolkits and frameworks

  • HybridFlow: A Flexible and Efficient RLHF Framework | EuroSys, 2025

  • dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training | MLSys, 2022

  • GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing | JMLR, 2019

Efficient algorithms 

  • Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression | EuroSys, 2023

  • Temporal-contextual Recommendation in Real Time | KDD (Best Paper Award), 2020

Presentations

  • verl: Flexible and Scalable Reinforcement Learning Library for LLM Reasoning | @PyTorch 2025, @Stanford CS Seminar

  • Building LLM Training Systems at Scale | CMU 15-642 Machine Learning Systems Lecture 2025

  • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | Systems@Scale 2024, @Databricks, NSDI

  • Accelerating recommendation model training using ByteCCL and UCX | UCF 2021

Tutorials

  • Post-training LLMs: From Algorithms to Infrastructure | NeurIPS 2024, ICLR 2025

  • Dive into Deep Learning & From Shallow to Deep Language Representations | EMNLP & JSALT 2019, KDD 2019, KDD 2018

Blogs & Press

  • Up to 20x Throughput Improvement! The Doubao LLM Team Releases a New RLHF Framework, Now Open Source! | 2024

  • BERT Inference on G4 Instances using Apache MXNet and GluonNLP: 1 Million Requests for 20 Cents | 2020

  • GluonNLP 0.6: Closing the Gap in Reproducible Research with BERT & GluonNLP 0.7.1 — BERT Reloaded | 2019

  • Introducing Dynamic Training for Deep Learning with Amazon EC2 | 2018

Awards & Services

  • Reviewer: AISTATS 2021, VLDB 2023, ICLR 2025 (SCI-FM), COLM 2025, NeurIPS 2025; Area Chair: MLSys 2025

  • KDD Best Paper Award (Applied Science) | 2020

  • Soong Ching Ling Scholarships & Dean's Honors List | 2012 - 2015

  • HKUEAA Scholarships (Top 0.1%) | 2014
