[SEC'25] FedDES: Discrete Event Based Performance Simulation for Federated Learning Systems
Zhonghao Chen, Weicong Chen, Duo Zhang, Kibaek Kim, Guanpeng Li, Sheng Di, and Xiaoyi Lu
Proceedings of the ACM/IEEE Symposium on Edge Computing, 2025.
[Paper]
[TPDS'25] FedEFsz: Fair Cross-Silo Federated Learning System with Error-Bounded Lossy Compression
Zhaorui Zhang, Sheng Di, Benben Liu, Zhuoran Ji, Guanpeng Li, Xiaoyi Lu, Amelie Chi Zhou, Khalid Ayed Alharthi, Jiannong Cao
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2025.
[Paper]
[SC'25] HPC-R1: Characterizing R1-like Large Reasoning Models on HPC
Adam Weingram*, Duo Zhang*, Zhonghao Chen, Hao Qi, and Xiaoyi Lu
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2025. (* equal contribution)
[Paper]
[SC'25] DPAR: High-Performance, Secure, and Scalable Differential Privacy-based AllReduce
Hao Qi, Weicong Chen, Chenghong Wang, and Xiaoyi Lu
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2025.
[Paper]
[SC'25] GPU Lossy Compression for HPC Can Be Versatile and Ultra-Fast
Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2025.
[Paper]
[SC'25] lsCOMP: Efficient Light Source Compression
Yafan Huang, Sheng Di, Robert Underwood, Peco Myint, Miaoqi Chu, Guanpeng Li, Nicholas Schwarz, Franck Cappello
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2025.
[Paper]
[HPDC'25] DPU-KV: On the Benefits of DPU Offloading for In-Memory Key-Value Stores at the Edge
Arjun Kashyap, Yuke Li, and Xiaoyi Lu
Proceedings of the ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2025.
[Paper]
[ICS'25] Understanding the Idiosyncrasies of Emerging BlueField DPUs
Arjun Kashyap, Yuke Li, Darren Ng, and Xiaoyi Lu
Proceedings of the 39th International Conference on Supercomputing (ICS), 2025.
[Paper]
[ICS'25] GHCL: Advancing GPU-aware Collective Communications with Homomorphic Compression
Jiajun Huang, Sheng Di, Yafan Huang, Zizhong Chen, Franck Cappello, Yanfei Guo, and Rajeev Thakur
Proceedings of the 39th International Conference on Supercomputing (ICS), 2025.
[Paper]
[FHPC'25] A Definition and Taxonomy of Digital Twins: Case Studies with Machine Learning and Scientific Applications
Adam Weingram, Carolyn Cui, Stephanie Lin, Samuel Munoz, Toby Jacob, Joshua Viers, and Xiaoyi Lu
Frontiers in High Performance Computing, 2025.
[Paper]
[SC'24] hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Zizhe Jian, Xin Liang, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2024
[Paper]
[SC'24] Versatile Datapath Soft Error Detection on the Cheap for HPC Applications
Yafan Huang, Sheng Di, Zhaorui Zhang, Xiaoyi Lu, Guanpeng Li
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2024
[Paper]
[SC'24] cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio
Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2024 (Best Paper finalist)
[Paper]
[ICS'24] gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, and Rajeev Thakur
Proceedings of the 38th International Conference on Supercomputing (ICS), 2024
[Paper]
[IJCAI'24] FedFa: A Fully Asynchronous Training Paradigm for Federated Learning
Haotian Xu, Zhaorui Zhang, Sheng Di, Benben Liu, and Jiannong Cao
Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024
[Paper]
[IPDPS'24] NVMe-oPF: Designing Efficient Priority Schemes for NVMe-over-Fabrics with Multi-Tenancy Support
Darren Ng, Andrew Lin, Arjun Kashyap, Guanpeng Li, Xiaoyi Lu
Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024
[Paper]
[IPDPS'24] Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures
Yuke Li, Arjun Kashyap, Weicong Chen, Yanfei Guo, Xiaoyi Lu
Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024 (Best Paper Award Nomination)
[Paper]
[IPDPS'24] An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, and Rajeev Thakur
Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024
[Paper]
[IPDPS'24] DRUTO: Upper-Bounding Silent Data Corruption Vulnerability in GPU Applications
Md Hasanur Rahman, Sheng Di, Shengjian Guo, Xiaoyi Lu, Guanpeng Li, and Franck Cappello
Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024
[Paper]
[IPDPS'24 Poster & ICDCS'24] Efficient Communication in Federated Learning Using Floating-Point Lossy Compression
Grant Wilkins, Sheng Di, Jon Calhoun, Zilinghan Li, Kibaek Kim, Robert Underwood, Richard Mortier, and Franck Cappello
Proceedings of the 44th IEEE International Conference on Distributed Computing Systems (ICDCS), 2024
[Paper]
[TPDS'24] ZCCL: Significantly Improving Collective Communication With Error-Bounded Lossy Compression
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Khalid Alharthi, Zizhong Chen, Franck Cappello, Yanfei Guo, and Rajeev Thakur
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2024 (Under Revision)
[Paper]
[IEEE Micro'24] High-Speed Data Communication with Advanced Networks in Large Language Model Training
Liuyao Dai, Hao Qi, Weicong Chen, Xiaoyi Lu
IEEE Micro, 2024
[Paper]
[IEEE Micro'23] Compression Analysis for BlueField-2/3 DPUs: Lossy and Lossless Perspectives
Yuke Li, Arjun Kashyap, Yanfei Guo, and Xiaoyi Lu
IEEE Micro, 2023
[Paper]
[SC'23] Characterizing One-/Two-sided Designs in OpenSHMEM Collectives
Yuke Li, Yanfei Guo, Xiaoyi Lu
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023 (Research Poster Paper)
[Paper]
[SC'23] Early Experience in Characterizing Training Large Language Models on Modern HPC Clusters
Hao Qi, Liuyao Dai, Weicong Chen, Xiaoyi Lu
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023 (Research Poster Paper)
[Paper]
[SC'23] An Early Case Study with Multi-Tenancy Support in SPDK’s NVMe-over-Fabric Designs
Darren Ng, Charles Parkinson, Andrew Lin, Arjun Kashyap, Xiaoyi Lu
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023 (Research Poster Paper)
[Paper]