Haibin works on machine learning systems at ByteDance, focusing on optimizing deep learning frameworks for large-scale training and LLMs. He previously worked with Yibo Zhu and Chuanxiong Guo. Prior to ByteDance, he worked on ML systems and natural language processing at Amazon Web Services, on a team led by Mu Li and Alex Smola. He received his M.S. in Computer Science from Carnegie Mellon University, advised by Andy Pavlo. Prior to that, he received a joint B.Eng. in Computer Science from the University of Hong Kong and Shanghai Jiao Tong University.
Papers
Peer-Reviewed Papers
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs | EuroSys 2024
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies | EuroSys 2023
SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training | NeurIPS 2022
dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training | MLSys 2022
ResNeSt: Split-Attention Networks | CVPR (Efficient Deep Learning for CV) 2022
Temporal-contextual Recommendation in Real Time | KDD (Best Paper Award) 2020
CSER: Communication-efficient SGD with Error Reset | NeurIPS 2020
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates | NeurIPS (Optimization for ML) 2020
Is Network the Bottleneck for Distributed Training? | SIGCOMM (Network Meets AI & ML) 2020
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing | JMLR 2019
Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs | ICLR (Representation Learning for Graph and Manifolds) 2019
Just-in-Time Dynamic-Batching | NeurIPS (Systems for Machine Learning) 2018
Self-Driving Database Management Systems | CIDR 2017
Preprints
Software
veGiantModel, a library for giant model training with 3-D parallelism | author, 2020
GluonNLP, a toolkit for natural language processing | co-author, 2018
ps-lite & BytePS, a distributed training library for deep learning | core dev, 2017
Apache MXNet, a deep learning framework | PPMC & committer, 2016
Peloton, a research prototype of a self-driving database management system | main contributor, 2015
Presentations
BytePS and ByteCCL for distributed training | Invited talk @ Meta 2022
Accelerating recommendation model training using ByteCCL and UCX | UCF 2021
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet | NVIDIA GTC 2020
Amazon SageMaker and Apache MXNet: Tips & Tricks | AWS re:Invent 2019
Build State-of-the-art NLP Models with Amazon SageMaker and GluonNLP | AWS re:Invent 2019
Sparse Tensor for Large-scale Recommendation Systems and Natural Language Processing | Apache MXNet Summit 2018
Tutorials
Dive into Deep Learning for Natural Language Processing | EMNLP 2019
Everything You Need to Know to Reproduce SOTA Deep Learning Models from Hands-on Tutorial | ICCV 2019
From Shallow to Deep Language Representations: Pre-training, Fine-tuning, and Beyond | KDD 2019
Dive into Deep Learning for Natural Language Processing | JSALT 2019
Deep Learning and Natural Language Processing with Apache MXNet Gluon | KDD 2018
Blogs & Press
ByteDance Open-Sources veGiantModel, a Framework for Large Model Training | 2021
BERT Inference on G4 Instances using Apache MXNet and GluonNLP: 1 Million Requests for 20 Cents | 2020
Amazon Scientists Help SK Telecom Create Korean-based Natural Language Processor | 2020
GluonNLP 0.6: Closing the Gap in Reproducible Research with BERT | 2019
Introducing Dynamic Training for Deep Learning with Amazon EC2 | 2018
Apache MXNet Release Adds Support for New NVIDIA Volta GPUs and Sparse Tensor | 2017
Patents
A general analysis and optimization system for accelerating distributed DNN training
A staleness-aware data-parallel deep neural network training pipeline
A cross-model, cross-device performance predictor for tensor programs
Selected Awards
KDD Best Paper Award | 2020
Soong Ching Ling Scholarships | 2011 - 2015
Dean's Honors List | 2012 - 2015
HKUEAA Scholarships (Top 0.1%) | 2014