Paper accepted in ISCA 2026: "ParetoES: Hardware-Accelerated Sparse Embedding Similarity via Pareto-Optimal Pruning".
Paper accepted in OSDI 2026: "Tessera: A Holistic Pipeline Parallelism Framework for Trillion-Parameter Heterogeneous MoE Training (Operational Systems)".
Paper accepted in FSE 2026: "Hallucinations in LLM-based Code Summarization: Unveiling, Detection, and Mitigation".
Paper accepted in ACM TACO: "gECC: A GPU-based high-throughput framework for Elliptic Curve Cryptography".
Paper accepted in HPCA 2025: "AccelES: Accelerating Top-K SpMV for Embedding Similarity via Low-bit Pruning".
Paper accepted in IEEE TC: "RuYi: Optimizing Burst Buffer through Automated, Fine-Grained Process-to-BB Mapping".
One paper accepted in SOSP 2024: "Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with Fractal Tensor".
One paper accepted in ASE 2024: "Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation".
Two papers accepted in ACL Findings 2024: "Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback", "Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning".
Paper accepted in IEEE TC: "MMDataLoader: Reusing Preprocessed Data among Concurrent Model Training Tasks".
Paper accepted in EMNLP Findings 2023: "SiMFy: A Simple Yet Effective Approach for Temporal Knowledge Graph Reasoning".
Paper accepted in IEEE TC: "Waterwave: A GPU Memory Flow Engine for Concurrent DNN Training".
Two papers accepted in IJCAI 2023: "Scalable Optimal Margin Distribution Machine", "Incremental and Decremental Optimal Margin Distribution Learning".
Paper accepted in IEEE TPDS: "TurboMGNN: Improving concurrent GNN training tasks on GPU with fine-grained kernel fusion".
Paper accepted in SIGMOD 2023: "Data Stream Clustering: An In-depth Empirical Study".
Paper accepted in IEEE TC: "TurboGNN: Improving the End-to-end Performance for Sampling-based GNN Training on GPUs".
Paper accepted in ASPLOS 2023: "GZKP: A GPU Accelerated Zero-Knowledge Proof System".
Won first prize in the CCF Natural Science Awards, 2022: "Basic Theory and Approaches for High-Performance Big Data Processing".
Paper accepted in IEEE TBD: "Parallel Overlapping Community Detection Algorithm on GPU".
Paper accepted in IEEE TPDS: "LoomIO: Object-Level Coordination in Distributed File Systems".
Paper accepted in SoCC 2020: "ByteSeries: An In-Memory Time Series Database for Large-Scale Monitoring Systems".
Paper accepted in ACM TKDD: "Multi-Stage Network Embedding for Exploring Heterogeneous Edges".
Paper accepted in IEEE TPDS: "Feluca: A Two-Stage Graph Coloring Algorithm with Color-centric Paradigm on GPU".
Paper accepted in IEEE TC: "TurboDL: Improving CNN Training on GPU with Fine-grained Multi-streaming Scheduling".
Paper accepted in ACM TACO: "Optimizing the SSD Burst Buffer by Traffic Detection".
Paper accepted in ICDE 2020: "Maxson: Reduce Duplicate Parsing Overhead on Raw Data".
Paper accepted in ASPLOS 2020: "Capuchin: Tensor-based GPU Memory Management for Deep Learning".
Paper accepted in ACM TOCS: "Deca: A Garbage Collection Optimizer for In-memory Data Processing".