on the Path to AGI


Haibin Lin

GitHub | Google Scholar | LinkedIn | Twitter

Last updated: 07/31/2025

Haibin works on LLM infrastructure at ByteDance Seed, focusing on optimizing training frameworks for LLMs and multimodal models, from pre-training (MegaScale, with over 10,000 GPUs per run) to post-training (reinforcement learning infrastructure for reasoning with verl). Before the LLM era, he worked on collective communication libraries for recommendation systems at ByteDance, and on Apache MXNet at Amazon (training, inference, runtime, and recipes such as gluon-nlp).

Open-Source Software (Python, C++, CUDA)

  • verl, a reinforcement learning framework for LLMs | #1 contributor, 186 commits

  • BytePS / ps-lite / horovod, distributed training libraries | #3 contributor, 38 commits

  • GluonNLP, a toolkit for natural language processing | #2 contributor, 314 commits

  • Apache MXNet, a deep learning framework | #9 contributor, 215 commits

Papers

Large-scale distributed ML & HPC

  • MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | EuroSys, 2026

  • Robust LLM Training Infrastructure | SOSP, 2025 (to appear)

  • Understanding Stragglers in Large Model Training using What-if Analysis | OSDI, 2025

  • ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs | SIGCOMM, 2025

  • Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation | ATC, 2025

  • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | NSDI, 2024

  • Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes | arXiv, 2020

ML toolkits and frameworks

  • HybridFlow: A Flexible and Efficient RLHF Framework | EuroSys, 2025

  • dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training | MLSys, 2022

  • GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing | JMLR, 2019

Efficient algorithms 

  • Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression | EuroSys, 2023

  • Temporal-contextual Recommendation in Real Time | KDD (Best Paper Award), 2020

Presentations

  • verl: Flexible and Scalable Reinforcement Learning Library for LLM Reasoning | @PyTorch 2025, @Stanford CS Seminar

  • Building LLM Training Systems at Scale | CMU 15-642 Machine Learning Systems Lecture 2025

  • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | Systems@Scale 2024, @Databricks, NSDI

  • Accelerating recommendation model training using ByteCCL and UCX | UCF 2021

Tutorials

  • Post-training LLMs: From Algorithms to Infrastructure | NeurIPS 2024, ICLR 2025

  • Dive into Deep Learning & From Shallow to Deep Language Representations | EMNLP & JSALT 2019, KDD 2019, KDD 2018

Blogs & Press

  • Up to 20x Throughput Improvement! The Doubao LLM Team Releases a New RLHF Framework, Now Open Source! | 2024

  • BERT Inference on G4 Instances using Apache MXNet and GluonNLP: 1 Million Requests for 20 Cents | 2020

  • GluonNLP 0.6: Closing the Gap in Reproducible Research with BERT & GluonNLP 0.7.1 — BERT Reloaded | 2019

  • Introducing Dynamic Training for Deep Learning with Amazon EC2 | 2018

Awards & Services

  • Reviewer: AISTATS 2021, VLDB 2023, ICLR 2025 (SCI-FM), COLM 2025, NeurIPS 2025; Area Chair: MLSys 2025

  • KDD Best Paper Award (Applied Science) | 2020

  • Soong Ching Ling Scholarships & Dean's Honors List | 2012 - 2015

  • HKUEAA Scholarships (Top 0.1%) | 2014
