Computer architecture and systems
AI accelerators (GPU, NPU, and PIM)
Hardware-software co-design for AI/ML
Large language models (LLMs) have advanced significantly, but their scale imposes substantial memory demands during inference. To address this, we propose an algorithm-system co-design that reduces GPU memory usage while improving inference performance and maintaining model accuracy, making it feasible to deploy large-scale LLMs efficiently on a single GPU (an illustrative sketch follows the related work below).
Related work: [ASPLOS'24], [ISCA'24]
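The summary above does not name the specific memory-reduction mechanism, so the following is only an illustrative sketch of one common algorithm-level ingredient for fitting large LLMs on a single GPU: group-wise low-bit weight quantization. All function names, parameters, and sizes here are hypothetical, not the method of the papers above.

```python
import numpy as np

def quantize_groupwise(w: np.ndarray, group_size: int = 128, bits: int = 4):
    """Symmetric group-wise quantization: each group of `group_size` weights
    shares one floating-point scale, so 4-bit storage cuts weight memory
    roughly 8x versus fp32 while bounding the error per group."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit
    groups = w.reshape(-1, group_size)               # assumes size % group_size == 0
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_groupwise(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Recover an approximate fp32 weight matrix for (or during) inference."""
    return (q.astype(np.float32) * scales).reshape(shape)

# Hypothetical usage: quantize one 4096x4096 projection matrix.
w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, s, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```

Because each group carries its own scale, the quantization error is bounded locally, which is why schemes of this family can shrink the resident weight footprint without sacrificing much accuracy.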
Graph neural networks (GNNs) have emerged as a key technology in application domains where the input data is relational. However, their reliance on sparse matrix multiplication causes inefficient data movement and creates a significant performance bottleneck. To address this, we present an NPU accelerator built around a row-wise product dataflow, co-designing hardware and software to balance locality and parallelism in GNN workloads. This approach achieves substantial energy-efficiency improvements over state-of-the-art NPU accelerators.
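To make the row-wise product concrete, below is a minimal software analogue (Gustavson-style SpMM over a CSR adjacency matrix). It only illustrates why this ordering keeps one output row local until it is complete; the accelerator's actual buffering and scheduling are hardware-specific, and all names here are illustrative.

```python
import numpy as np

def rowwise_spmm(indptr, indices, values, x):
    """Row-wise product SpMM: out[i, :] = sum over nonzeros A[i, k] * x[k, :].
    Each output row is produced by streaming one sparse row of A and
    accumulating scaled dense rows of x, so the partial result for row i
    stays resident until finished (locality) while rows remain independent
    (parallelism)."""
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, x.shape[1]), dtype=x.dtype)
    for i in range(n_rows):                          # rows are independent
        for p in range(indptr[i], indptr[i + 1]):
            out[i] += values[p] * x[indices[p]]
    return out

# Toy CSR graph: 3 nodes, edges 0->1, 1->0, 1->2, 2->2.
indptr  = np.array([0, 1, 3, 4])
indices = np.array([1, 0, 2, 2])
values  = np.ones(4, dtype=np.float32)
x = np.arange(6, dtype=np.float32).reshape(3, 2)     # node features
print(rowwise_spmm(indptr, indices, values, x))
```

Row-wise ordering trades the output reuse of inner-product schemes for streaming access to the sparse operand, which is the locality-versus-parallelism balance the paragraph above refers to.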
Personalized recommendation powers major applications such as ads, video, and e-commerce. However, recommendation models face two key performance bottlenecks: memory-intensive embedding layers and compute-intensive multi-layer perceptron (MLP) layers. To address both, we propose a chiplet-based hybrid accelerator that tackles the memory-throughput limits of the embedding lookups and the compute demands of the MLP layers. We implement and evaluate our design on Intel HARPv2, a package-integrated CPU+FPGA device, achieving significant speedups and energy-efficiency improvements.
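The two bottlenecks map onto the two stages of a DLRM-style model. The sketch below is a generic forward pass (not our accelerator's design) that makes the memory-bound versus compute-bound split concrete; table shapes and function names are hypothetical.

```python
import numpy as np

def recommend_forward(tables, sparse_ids, dense_x, mlp_weights):
    """DLRM-style forward pass split into its two characteristic phases."""
    # Memory-intensive phase: random-access gathers into large embedding
    # tables, then pooling -- dominated by memory throughput, not FLOPs.
    pooled = [tables[t][ids].mean(axis=0) for t, ids in enumerate(sparse_ids)]
    h = np.concatenate([dense_x] + pooled)
    # Compute-intensive phase: dense matrix-vector products through the MLP.
    for w in mlp_weights[:-1]:
        h = np.maximum(h @ w, 0.0)                   # ReLU hidden layers
    return h @ mlp_weights[-1]                       # final score, no activation

# Hypothetical sizes: two embedding tables, 16-dim embeddings, 8 dense features.
rng = np.random.default_rng(0)
tables = [rng.standard_normal((1000, 16)), rng.standard_normal((500, 16))]
sparse_ids = [np.array([3, 42, 7]), np.array([10])]
dense_x = rng.standard_normal(8)
mlp_weights = [rng.standard_normal((40, 64)), rng.standard_normal((64, 1))]
print(recommend_forward(tables, sparse_ids, dense_x, mlp_weights))
```

This split, bandwidth-bound gathers feeding compute-bound dense layers, is presumably why a package-integrated CPU+FPGA platform such as HARPv2 is a natural evaluation vehicle for a hybrid design.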