Gwangsun Kim
Assistant Professor @ POSTECH
About me
I am an assistant professor in the Department of Computer Science and Engineering at POSTECH. Before joining POSTECH, I worked at Arm on improving Arm processor IPs for server systems. I earned my Ph.D. (2016) and M.S. (2012) degrees in Computer Science from KAIST under Prof. John Kim, and my B.S. degree (2010) from POSTECH in Computer Science and Engineering and Electronic and Electrical Engineering (double major).
I am looking for ambitious graduate students and undergraduate interns who want to do innovative research on computer systems (see Research Areas and On-going projects below). If you are interested in working with me, please contact me at g.kim at postech dot ac dot kr.
Employment
Assistant Professor, POSTECH, Nov. 2018 - Present
Senior Research Engineer, Arm Inc., Mar. 2018 - Oct. 2018
Senior Performance Engineer, Arm Inc., Sep. 2016 - Mar. 2018
Research Intern, NVIDIA, Jun. 2015 - Sep. 2015
Research Intern, Samsung Electronics, Jul. 2014 - Sep. 2014
Research Areas
I am interested in a broad range of topics in computer architecture and its interaction with other layers of the system stack, including algorithms, operating systems, and programming models. Below are some of the topics I have been working on:
Domain-specific Accelerators for Machine Learning
Near-Data Processing / Processing-In-Memory
HW/SW Co-design
Memory system with Storage-Class Memory
Massively parallel architectures (e.g., GPU)
Large-scale systems (Data centers and supercomputers)
Interconnection networks
On-going Projects
General-purpose Near-Data Processing (NDP) in CXL memory expander for datacenters
Directly accessing CXL memory from the host CPU incurs high latency and low bandwidth; NDP within a CXL memory expander can overcome this challenge. However, application-specific NDP in CXL memory is not suitable for datacenter servers because of the large number of applications that must be supported, while using conventional CPU and GPU cores for NDP provides limited efficiency. In this work, we propose low-overhead, general-purpose NDP in CXL memory to address these challenges.
Architecture for efficient inference of LLM (Large Language Model) and RAG (Retrieval Augmented Generation)
Efficiently serving LLMs on servers and edge devices poses enormous challenges. In particular, very long contexts of several million tokens require huge memory capacity and high bandwidth. Efficiently supporting RAG similarly requires memory system support for large vector databases and knowledge graphs. We build on our general-purpose NDP architecture for CXL memory to efficiently support these applications and overcome the limitations of current systems based on GPUs and NPUs.
Scalable Neural Processing Unit (NPU) system architecture
Datacenters that serve massive numbers of machine learning service requests require NPUs that scale well at the chip level (with many NPU cores), package level (with multiple dies in a package), node level (with multiple NPU cards), and rack level (with multiple NPU nodes in a rack). This project explores hardware/software co-design approaches to enable highly scalable NPU systems for large-scale deep learning.
Lossless tensor compression for high-performance DNN inference/training
DNN inference and training require very high memory capacity and bandwidth. Meanwhile, the inherent redundancy and sparsity in DNN tensors present an opportunity to significantly reduce tensor size and thereby increase effective memory capacity and bandwidth. In this project, we are developing an effective hardware-based tensor compression algorithm to improve overall system performance and energy efficiency for DNN inference and training.
Students
Ph.D./Integrated MS-PhD Students
Hyungkyu Ham
Jeongmin Hong
M.S. Students
Wonhyuk Yang
Geonwoo Park
Yunseon Shin
Jinhoon Bae
Okkyun Woo
Researcher
Heeeon Lee
Alumni
Junkyung Choi (M.S., 2021, first employment: Nota AI)
Junho Lee (M.S., 2022, first employment: OPENEDGES Technology, Inc.)
Publications
2024
Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders
Hyungkyu Ham*, Jeongmin Hong*, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim
Preprint [*: Equal contribution]
[ arXiv ]
NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing
Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory
Jeongmin Hong, Sungjun Cho, Geonwoo Park, Wonhyuk Yang, Young-Ho Gong, Gwangsun Kim
The 30th International Symposium on High-Performance Computer Architecture (HPCA) (Accept. rate: 18.3%)
[ IEEE Xplore ] [ arXiv ] [ Slides ] [ Github ]
2022
Overcoming Memory Capacity Wall of GPUs with Heterogeneous Memory Stack
Jeongmin Hong*, Sungjun Cho*, Gwangsun Kim
IEEE Computer Architecture Letters [*: Equal contribution]
[ Paper ]
Dynamic Global Adaptive Routing in High-radix Networks
Hans Kasan, Gwangsun Kim, Yung Yi, John Kim
The 49th International Symposium on Computer Architecture (ISCA) (Accept. rate: 16.8%)
[ Paper ]
2021
Near-Data Processing in Memory Expander for DNN Acceleration on GPUs
Hyungkyu Ham*, Hyunuk Cho*, Minjae Kim, Jueon Park, Jeongmin Hong, Hyojin Sung, Eunhyeok Park, Euicheol Lim, Gwangsun Kim
IEEE Computer Architecture Letters [*: Equal contribution]
[ Paper ]
2018
TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks
2017
Toward Standardized Near-Data Processing with Unrestricted Data Placement for GPUs
History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers
Wonjun Song, Gwangsun Kim, Hyungjoon Jung, Jongwook Chung, Jung Ho Ahn, Jae W. Lee, and John Kim
The 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (Accept. rate: 17.4%)
[ Paper ]
2016
Contention-based Congestion Management in Large-Scale Networks
Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs
Accelerating Linked-list Traversal through Near-Data Processing
High-Throughput System Design with Memory Networks
Gwangsun Kim
Ph.D. Thesis, School of Computing, KAIST
[ Paper ]
Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems
iPAWS: Instruction-Issue Pattern-based Adaptive Warp Scheduling for GPGPUs
Minseok Lee, Gwangsun Kim, John Kim, Woong Seo, Yeongon Cho, and Soojung Ryu
The 22nd IEEE International Symposium on High Performance Computer Architecture (HPCA) (Accept. rate: 22.1%)
[ Paper ]
Design and Analysis of Hybrid Flow Control for Hierarchical Ring Network-on-Chip
Hanjoon Kim, Gwangsun Kim, Hwasoo Yeo, John Kim, and Seungryoul Maeng
IEEE Transactions on Computers, vol. 65, no. 2, pp. 480-494, 1 Feb. 2016
[ Paper ]
2015
Overcoming Far-end Congestion in Large-Scale Networks
Jongmin Won, Gwangsun Kim, John Kim, Ted Jiang, Mike Parker, and Steve Scott
The 21st IEEE International Symposium on High Performance Computer Architecture (HPCA) (Accept. rate: 22.1%)
[ Paper ]
2014
Multi-GPU System Design with Memory Networks
Memory Network: Enabling Technology for Scalable Near-Data Computing
Transportation-Network Inspired Network-on-Chip
Hanjoon Kim, Gwangsun Kim, Hwasoo Yeo, Seungryoul Maeng, and John Kim
The 20th International Symposium on High Performance Computer Architecture (HPCA) (Accept. rate: 25.6%)
[ Paper ]
Low-overhead Network-on-Chip Support for Location-oblivious Task Placement
Gwangsun Kim, Michael M. Lee, John Kim, Jae W. Lee, Dennis Abts, and Michael Marty
IEEE Transactions on Computers, vol. 63, no. 6, pp. 1487-1500, June 2014
[ Paper ]
2013
Memory-centric System Interconnect Design with Hybrid Memory Cubes
2012
Scalable On-chip Network in Power Constrained Manycore Processors
Hanjoon Kim, Gwangsun Kim, and John Kim
The 3rd International Green Computing Conference (IGCC)
[ Paper ]
Teaching
CSED503: Advanced Computer Architecture, POSTECH, Fall 2020, 2021, 2023
CSED311: Computer Architecture, POSTECH, Spring 2019, 2020, 2021, 2022, 2023
CSED490V: Parallel Architecture and Programming, POSTECH, Fall 2019, 2022
CSED499: Research Project, POSTECH, Spring 2019
CSED199: Freshman Research Participation, Fall 2021
Contact
Email: g.kim at postech dot ac dot kr
Phone: +82-54-279-2260
Office: POSTECH R4 (연구4동) #4411, 67 Cheongam-ro, Nam-gu, Pohang, Gyeongbuk, Korea 37673