Gwangsun Kim
Assistant Professor @ POSTECH
About me
I am an assistant professor in the Department of Computer Science and Engineering at POSTECH. Before joining POSTECH, I worked at Arm on improving Arm processor IPs for server systems. I earned my Ph.D. (2016) and M.S. (2012) degrees in Computer Science from KAIST under Prof. John Kim, and my B.S. degree (2010) from POSTECH in Computer Science and Engineering and Electronic and Electrical Engineering (double major).
I am looking for ambitious graduate students and undergraduate interns who want to do innovative research on computer systems (see Research Areas and On-going projects below). If you are interested in working with me, please contact me at g.kim at postech dot ac dot kr.
Employment
Assistant Professor, POSTECH, Nov. 2018 - Present
Senior Research Engineer, Arm Inc., Mar. 2018 - Oct. 2018
Senior Performance Engineer, Arm Inc., Sep. 2016 - Mar. 2018
Research Intern, NVIDIA, Jun. 2015 - Sep. 2015
Research Intern, Samsung Electronics, Jul. 2014 - Sep. 2014
Research Areas
I am interested in a broad range of topics in computer architecture and its interaction with other layers of the system stack, including algorithms, operating systems, and programming models. Below are some of the topics I have been working on:
Domain-specific Accelerators for Machine Learning
Near-Data Processing / Processing-In-Memory
HW/SW Co-design
Memory system with Storage-Class Memory
Massively parallel architectures (e.g., GPU)
Large-scale systems (Data centers and supercomputers)
Interconnection networks
On-going Projects
General-purpose Near-Data Processing (NDP) in CXL memory expander for datacenters
Directly accessing CXL memory from the host CPU incurs high latency and low bandwidth; NDP within a CXL memory expander can overcome this challenge. However, application-specific NDP in CXL memory is not suitable for datacenter servers because of the large number of applications that must be supported, while using conventional CPU and GPU cores for NDP provides limited efficiency. In this work, we propose low-overhead, general-purpose NDP in CXL memory to address these challenges.
Architecture for efficient inference of LLM (Large Language Model) and RAG (Retrieval Augmented Generation)
Efficiently serving LLMs on servers and edge devices poses enormous challenges. In particular, very long contexts of several million tokens require huge memory capacity and high bandwidth. Efficiently supporting RAG similarly requires memory system support for large vector databases and knowledge graphs. We build on our general-purpose NDP architecture for CXL memory to efficiently support these applications and overcome the limitations of current systems based on GPUs and NPUs.
Scalable Neural Processing Unit (NPU) system architecture
Datacenters that serve massive numbers of machine learning service requests require NPUs that scale well at the chip level (with many NPU cores), package level (with multiple dies in a package), node level (with multiple NPU cards), and rack level (with multiple NPU nodes in a rack). This project explores hardware/software co-design approaches to enable highly scalable NPU systems for large-scale deep learning.
Lossless tensor compression for high-performance DNN inference/training
DNN inference and training require very high memory capacity and bandwidth. Meanwhile, the inherent redundancy and sparsity in DNN tensors present an opportunity to significantly reduce tensor size and thereby increase effective memory capacity and bandwidth. In this project, we are developing an effective hardware-based tensor compression algorithm to improve overall system performance and energy efficiency for DNN inference and training.
Students
Ph.D./Integrated MS-PhD Students
Hyungkyu Ham
Jeongmin Hong
M.S. Students
Wonhyuk Yang
Geonwoo Park
Yunseon Shin
Jinhoon Bae
Okkyun Woo
Researcher
Heeeon Lee
Alumni
Junkyung Choi (M.S., 2021, first employment: Nota AI)
Junho Lee (M.S., 2022, first employment: OPENEDGES Technology, Inc.)
Publications
2024
Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders
Hyungkyu Ham*, Jeongmin Hong*, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim
Preprint [*: Equal contribution]
[ arXiv ]
NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing
Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory
Jeongmin Hong, Sungjun Cho, Geonwoo Park, Wonhyuk Yang, Young-Ho Gong, Gwangsun Kim
The 30th International Symposium on High-Performance Computer Architecture (HPCA) (Accept. rate: 18.3%)
[ IEEE Xplore ] [ arXiv ] [ Slides ] [ Github ]
2022
Overcoming Memory Capacity Wall of GPUs with Heterogeneous Memory Stack
Jeongmin Hong*, Sungjun Cho*, Gwangsun Kim
IEEE Computer Architecture Letters [*: Equal contribution]
[ Paper ]
Dynamic Global Adaptive Routing in High-radix Networks
Hans Kasan, Gwangsun Kim, Yung Yi, John Kim
The 49th International Symposium on Computer Architecture (ISCA) (Accept. rate: 16.8%)
[ Paper ]
2021
Near-Data Processing in Memory Expander for DNN Acceleration on GPUs
Hyungkyu Ham*, Hyunuk Cho*, Minjae Kim, Jueon Park, Jeongmin Hong, Hyojin Sung, Eunhyeok Park, Euicheol Lim, Gwangsun Kim
IEEE Computer Architecture Letters [*: Equal contribution]
[ Paper ]
2018
TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks
2017
Toward Standardized Near-Data Processing with Unrestricted Data Placement for GPUs
History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers
Wonjun Song, Gwangsun Kim, Hyungjoon Jung, Jongwook Chung, Jung Ho Ahn, Jae W. Lee, and John Kim
The 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (Accept. rate: 17.4%)
[ Paper ]
2016
Contention-based Congestion Management in Large-Scale Networks
Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs
Accelerating Linked-list Traversal through Near-Data Processing
High-Throughput System Design with Memory Networks
Gwangsun Kim
Ph.D. Thesis, School of Computing, KAIST
[ Paper ]
Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems
iPAWS: Instruction-Issue Pattern-based Adaptive Warp Scheduling for GPGPUs
Minseok Lee, Gwangsun Kim, John Kim, Woong Seo, Yeongon Cho, and Soojung Ryu
The 22nd IEEE International Symposium on High Performance Computer Architecture (HPCA) (Accept. rate: 22.1%)
[ Paper ]
Design and Analysis of Hybrid Flow Control for Hierarchical Ring Network-on-Chip
Hanjoon Kim, Gwangsun Kim, Hwasoo Yeo, John Kim, and Seungryoul Maeng
IEEE Transactions on Computers, vol. 65, no. 2, pp. 480-494, 1 Feb. 2016
[ Paper ]
2015
Overcoming Far-end Congestion in Large-Scale Networks
Jongmin Won, Gwangsun Kim, John Kim, Ted Jiang, Mike Parker, and Steve Scott
The 21st IEEE International Symposium on High Performance Computer Architecture (HPCA) (Accept. rate: 22.1%)
[ Paper ]
2014
Multi-GPU System Design with Memory Networks
Memory Network: Enabling Technology for Scalable Near-Data Computing
Transportation-Network Inspired Network-on-Chip
Hanjoon Kim, Gwangsun Kim, Hwasoo Yeo, Seungryoul Maeng, and John Kim
The 20th International Symposium on High Performance Computer Architecture (HPCA) (Accept. rate: 25.6%)
[ Paper ]
Low-overhead Network-on-Chip Support for Location-oblivious Task Placement
Gwangsun Kim, Michael M. Lee, John Kim, Jae W. Lee, Dennis Abts, and Michael Marty
IEEE Transactions on Computers, vol. 63, no. 6, pp. 1487-1500, June 2014
[ Paper ]
2013
Memory-centric System Interconnect Design with Hybrid Memory Cubes
2012
Scalable On-chip Network in Power Constrained Manycore Processors
Hanjoon Kim, Gwangsun Kim, and John Kim
The 3rd International Green Computing Conference (IGCC)
[ Paper ]
Teaching
CSED503: Advanced Computer Architecture, POSTECH, Fall 2020, 2021, 2023
CSED311: Computer Architecture, POSTECH, Spring 2019, 2020, 2021, 2022, 2023
CSED490V: Parallel Architecture and Programming, POSTECH, Fall 2019, 2022
CSED499: Research Project, POSTECH, Spring 2019
CSED199: Freshman Research Participation, Fall 2021
Contact
Email: g.kim at postech dot ac dot kr
Phone: +82-54-279-2260
Office: POSTECH R4 (연구4동) #4411, 67 Cheongam-ro, Nam-gu, Pohang, Gyeongbuk, Korea 37673