We design system and network architectures tailored to the unique demands of AI workloads. Large-scale model training introduces highly dynamic computation–communication patterns that strain conventional system software. Our research develops workload-aware scheduling policies, optimized GPU placement strategies, and high-performance network designs that accelerate distributed AI training and ensure efficient use of costly hardware resources.
Research highlights
Revisiting Traffic Splitting for Software Switch in Datacenter, ACM SIGMETRICS 2025
Intelligent Packet Processing for Performant Containers in IoT, IEEE Internet of Things Journal
TeaVisor: network hypervisor for bandwidth isolation in SDN-NV, IEEE Transactions on Cloud Computing
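To make the placement idea above concrete, here is a minimal illustrative sketch (not our actual system): a greedy heuristic that assigns GPUs to jobs so that communication-heavy jobs are packed onto as few nodes as possible, keeping their traffic on fast intra-node links. All names and the scoring rule are hypothetical.

```python
# Hypothetical sketch of workload-aware GPU placement: pack the most
# communication-intensive jobs first, each onto as few nodes as possible.

def place_jobs(jobs, gpus_per_node, num_nodes):
    """jobs: list of (job_id, num_gpus, comm_intensity in [0, 1]).
    Returns {job_id: list of node indices, one per assigned GPU}."""
    free = [gpus_per_node] * num_nodes
    placement = {}
    # Communication-heavy jobs get first pick of contiguous capacity.
    for job_id, need, _ in sorted(jobs, key=lambda j: -j[2]):
        assigned = []
        # Prefer the emptiest nodes so a job spans as few nodes as possible.
        for n in sorted(range(num_nodes), key=lambda n: free[n], reverse=True):
            if need == 0:
                break
            take = min(free[n], need)
            if take:
                free[n] -= take
                need -= take
                assigned.extend([n] * take)
        if need:
            raise RuntimeError(f"not enough free GPUs for {job_id}")
        placement[job_id] = assigned
    return placement

# Two 8-GPU nodes; the allreduce-heavy job lands entirely on one node.
jobs = [("allreduce-heavy", 8, 0.9), ("light", 4, 0.1), ("medium", 4, 0.5)]
print(place_jobs(jobs, gpus_per_node=8, num_nodes=2))
```

Real schedulers must also weigh topology (NVLink vs. PCIe vs. inter-node fabric) and job arrival dynamics; this sketch shows only the core packing intuition.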
We harness AI techniques (e.g., LLMs, DNNs, GNNs) to improve the efficiency and resilience of datacenter systems. By learning from system traces, workload patterns, and network telemetry, our models enable intelligent scheduling, adaptive traffic control, and resource orchestration. This AI-driven approach addresses inefficiencies such as GPU underutilization and communication bottlenecks, paving the way for more sustainable and scalable infrastructure.
Research highlights
Machine Learning-Based Prediction Models for Control Traffic in SDN Systems, IEEE Transactions on Services Computing
Control Channel Isolation in SDN Virtualization: A Machine Learning Approach, IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2023)
Predictive Placement of Geo-distributed Blockchain Nodes for Performance Guarantee, IEEE International Conference on Cloud Computing (CLOUD 2024)
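As a toy illustration of traffic prediction from telemetry (a stand-in for the learned models in the papers above, not their actual method), the sketch below fits a least-squares linear trend over a sliding window of past measurements and extrapolates one step ahead. The function name and data are hypothetical.

```python
# Hypothetical sketch: forecast near-future control-traffic volume
# from a sliding window of recent measurements via a linear trend fit.

def predict_next(window):
    """Fit y = a*t + b over the window and extrapolate one step ahead."""
    n = len(window)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(window) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, window))
    a = cov / var_t if var_t else 0.0   # slope of the traffic trend
    b = mean_y - a * mean_t             # intercept
    return a * n + b                    # forecast for the next time step

# e.g., control messages per second over the last 5 intervals
print(predict_next([100, 110, 120, 130, 140]))  # linear trend -> 150.0
```

A controller could use such a forecast to provision control-channel bandwidth before a spike arrives; the learned models in our work replace the linear fit with richer features and architectures.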