Training machine learning models can be highly memory-intensive, whether on a high-end server or on a mobile device. Optimizing the performance of machine learning training (in terms of execution time, memory consumption, or energy consumption) enables training larger models or converging faster. We optimize systems from the memory perspective for high-performance machine learning training. Beyond training, we also optimize the inference performance of common machine learning algorithms (e.g., KNN and decision trees) on heterogeneous hardware.
Research Outcomes:
[HPCA'24] Jie Ren, Dong Xu, Shuangyan Yang, Jiacheng Zhao, Zhicheng Li, Christian Navasca, Chenxi Wang, Harry Xu, and Dong Li. "Enabling Large Dynamic Neural Network Training with Learning-based Memory Management". In 30th IEEE International Symposium on High-Performance Computer Architecture.
[ASPLOS'23] Shuangyan Yang, Minjia Zhang, Wenqian Dong, and Dong Li. "Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning". In 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems.
[ATC'22] Xin He, Jianhua Sun, Hao Chen, and Dong Li. "Campo: A Cost-Aware and High-Performance Mixed Precision Optimizer for Neural Network Training". In USENIX Annual Technical Conference.
[SEC'21] Jie Liu, Jiawen Liu, Zhen Xie, Xia Ning, and Dong Li. "Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors". In 6th ACM/IEEE Symposium on Edge Computing.
[VLDB'21] Jie Liu, Wenqian Dong, Qingqing Zhou, and Dong Li. "Fauce: Fast and Accurate Deep Ensembles with Uncertainty for Cardinality Estimation". In 47th International Conference on Very Large Data Bases.
[ATC'21] Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, and Yuxiong He. "ZeRO-Offload: Democratizing Billion-Scale Model Training". In 27th USENIX Annual Technical Conference (acceptance rate: 18.8%).
[ICS'21] Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, and Dong Li. "Enabling Energy-Efficient DNN Training on Hybrid GPU-FPGA Accelerators". In 35th International Conference on Supercomputing.
[EuroSys'21] Zhen Xie, Wenqian Dong, Jiawen Liu, Hang Liu, and Dong Li. "Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU". In European Conference on Computer Systems.
[HPCA'21] Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, Hyeran Jeon, and Dong Li. "Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning". In 27th IEEE International Symposium on High-Performance Computer Architecture.
[NeurIPS'20] Jie Ren, Minjia Zhang, and Dong Li. "HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory". In 34th Conference on Neural Information Processing Systems.
[USENIX OpML'20] Jiawen Liu, Zhen Xie, Dimitrios Nikolopoulos, and Dong Li. "RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices". In USENIX Conference on Operational Machine Learning.
[MLSys-W'20] Jie Liu, Jiawen Liu, Zhen Xie, and Dong Li. "Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors". In On-Device Intelligence Workshop at Machine Learning and Systems Conference.
[IPDPS'19] Jiawen Liu, Dong Li, Gokcen Kestor, and Jeffrey Vetter. "Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training". In 33rd IEEE International Parallel and Distributed Processing Symposium (acceptance rate: 27.7%).
[ICPADS'19] Jie Liu, Jiawen Liu, Wan Du, and Dong Li. "Performance Analysis and Characterization of Training Deep Learning Models on NVIDIA TX2". In 25th IEEE International Conference on Parallel and Distributed Systems (acceptance rate: 28.0%).
[MICRO'18] Jiawen Liu, Hengyu Zhao, Matheus Ogleari, Dong Li, and Jishen Zhao. "Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach". In 51st IEEE/ACM International Symposium on Microarchitecture (acceptance rate: 21.3%).
This research is supported by, or conducted in collaboration with: