Associate Professor
Department of Computer Science
Graduate School of Artificial Intelligence
Yonsei University, Seoul, Korea
EDUCATION
05/2009 – 08/2013 Ph.D. in Electrical Engineering, University of Michigan, Ann Arbor, MI
(Advised by Prof. Scott Mahlke)
09/2007 – 04/2009 M.S.E. in Electrical Engineering, University of Michigan, Ann Arbor, MI
03/1999 – 02/2007 B.S. in Electronic and Electrical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, Korea
EXPERIENCE
09/2022 - Present Associate Professor at Yonsei University, Seoul, Korea
03/2021 - 08/2022 Associate Professor at Hanyang University, Seoul, Korea
03/2017 - 02/2021 Assistant Professor at Hanyang University, Seoul, Korea
09/2014 - 02/2017 Assistant Professor at Hongik University, Seoul, Korea
05/2013 - 08/2014 Software Architect at Intel, Santa Clara, CA
01/2002 - 01/2005 Researcher and Engineer at ALTECH (Airlink Technology), Seoul, Korea
SERVICES
Organizing Committee
2022 Treasurer & Finance Co-Chair: CGO’22
2021 Treasurer & Finance Co-Chair: CGO’21
Program Committee Chair
2025 LCTES’25
Program Committee
09/2014 - Present CGO’25/24/22/21, PACT’25, HPCA’25, LCTES’25/24/23/20, ICCQ’22, ICCD’17
External Reviewer
09/2014 - Present MICRO’25, KDD’23, ASPLOS’20, DAC’19, DAC’18
SELECTED PUBLICATIONS (see full publications: dblp, Google Scholar)
PIM-CCA: An Efficient PIM Architecture with Optimized Integration of Configurable Functional Units, MICRO 2025
Efficient Image Super-Resolution Using Dynamic Quality Control with Recursive Model Structures, IEEE Access 2025
SortingHat: System Topology-aware Scheduling of Deep Neural Network Models on Multi-GPU Systems, ICS 2025
PIM-CARE: A Compiler-Assisted Dynamic Resource Allocation Framework for Real-world DRAM PIM, ICS 2025
CUrator: An Efficient LLM Execution Engine with Optimized Integration of CUDA Libraries, CGO 2025
Accelerating LLMs using an Efficient GEMM Library and Target-Aware Optimizations on Real-World PIM Devices, CGO 2025
Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU, LCTES 2024
Discovering Efficient Fused Layer Configurations for Executing Multi-Workloads on Multi-core NPUs, DATE 2024
ISP Agent: A Generalized In-Storage-Processing Workload Offloading Framework by Providing Multiple Optimization Opportunities, ACM TACO 2023
Tailoring Tiling-based GEMM Performance using Supervised Learning, ICCD 2023
Virtual PIM: Resource-aware Dynamic DPU Allocation and Workload Scheduling Framework on Multi-DPU PIM Architecture, PACT 2023
SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication, CIKM 2023
Synchronization-aware NAS for an Efficient Collaborative Inference on Mobile Platforms, LCTES 2023
Orchestrating Large-Scale SpGEMMs using Dynamic Block Distribution and Data Transfer Minimization on Heterogeneous Systems, ICDE 2023
Block Group Scheduling: A General Precision-scalable NPU Scheduling Technique with Capacity-aware Memory Allocation, DATE 2023
Dynamic Rate Neural Acceleration Using Multiprocessing Mode Support, IEEE TVLSI 2022
Legion: Tailoring Grouped Neural Execution Considering Heterogeneity on Multiple Edge Devices, ICCD 2021
Convergence-Aware Neural Network Training, DAC 2020
Navigator: Dynamic Multi-kernel Scheduling to Improve GPU Performance, DAC 2020
Optimization of a GPU-based Sparse Matrix Multiplication for Large Sparse Networks, ICDE 2020
PreScaler: An Efficient System-aware Precision Scaling Framework on Heterogeneous Systems, CGO 2020
GATE: A Generalized Dataflow-level Approximation Tuning Engine For Data Parallel Architectures, DAC 2019
Improving GPU Multitasking Efficiency using Dynamic Resource Sharing, IEEE CAL 2018
NN Compactor: Minimizing Memory and Logic Resources for Small Neural Networks, DATE 2018
Dynamic Resource Management for Efficient Utilization of Multitasking GPUs, ASPLOS 2017
A Bypass First Policy for Energy-Efficient Last Level Caches, SAMOS 2016
APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs, ISCA 2016
ELF: Maximizing Memory-level Parallelism for GPUs with Coordinated Warp and Fetch Scheduling, SC 2015
Fine Grain Cache Partitioning using Per-Instruction Working Blocks, PACT 2015
Chimera: Collaborative Preemption for Multitasking on a Shared GPU, ASPLOS 2015
Enabling Efficient Alias Speculation, LCTES 2015
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems, PACT 2013
SIMD defragmenter: efficient ILP realization on data-parallel architectures, ASPLOS 2012
Process variation in near-threshold wide SIMD architectures, DAC 2012
Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability, MICRO 2012
Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications, MICRO 2009