Publications-International

2026

Conference/Symposium/Workshop Proceedings

Efficient Data Processing using On-the-Fly Host-PIM Interactions in a Commodity PIM System

Hyojune Kim, Jeonghyeon Joo, Taehyeong Park, Yongjun Park, Hyuck Han, and Sooyong Kang

IEEE International Conference on Data Engineering (ICDE), May 2026 (BK21+ IF 3) (To appear)

Compiler and System Optimizations for Gem5 Simulator

Haneul Park*, Siddharth Agarwal*, Pradyun Narkadamilli, Kiung Jung, Yongjun Park, Ipoom Jeong, and Nam Sung Kim

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Apr. 2026 (BK21+ IF 1) (To appear)

(*Equal contribution)

FlexiQ: Adaptive Mixed-Precision Quantization for Latency/Accuracy Trade-Offs in Deep Neural Networks

Jaemin Kim, Hongjun Um, Sungkyun Kim, Yongjun Park, Jiwon Seo

Proc. 2026 European Conference on Computer Systems (EuroSys), April 2026 (BK21+ IF 2) (To appear)

Flow-Graph-Aware Tiling and Rescheduling for Memory-Efficient On-Device Inference (web, paper)

Yeonoh Jeong, Taehyeong Park, Yongjun Park

Proc. 2026 Intl. Symposium on Code Generation and Optimization (CGO), Feb. 2026 (BK21+ IF 2)

Journal/Magazine Articles

Peak-Memory-aware Partitioning and Scheduling for Multi-tenant DNN Model Inference

Jaeho Lee, Ju Min Lee, Haeeun Jeong, Hyunho Kwon, Youngsok Kim, Yongjun Park, and Hanjun Kim

Journal of Systems Architecture, Mar. 2026

2025

Conference/Symposium/Workshop Proceedings

An Efficient PIM-Based Graph Engine on a Single Machine

Myung-Hwan Jang, Min-Kyeong Shin, Taehyeong Park, Yongjun Park, and Sang-Wook Kim

2025 ACM International Conference on Information and Knowledge Management (CIKM) (Short paper), November 2025 (BK21+ IF 3)

PIM-CCA: An Efficient PIM Architecture with Optimized Integration of Configurable Functional Units (web, paper)

Jeehyun Kim, Donghyeon Kim, Seokwon Kang, Bongjoon Hyun, Inho Lee, Yongjun Park

Proc. 2025 Intl. Symposium on Microarchitecture (MICRO), Oct. 2025 (BK21+ IF 4)

SortingHat: System Topology-aware Scheduling of Deep Neural Network Models on Multi-GPU Systems (web, web(tentative), paper)

Seok Namkoong, Taehyeong Park, Kiung Jung, Jinyoung Kim and Yongjun Park

39th ACM International Conference on Supercomputing (ICS), June 2025 (BK21+ IF 2)

PIM-CARE: A Compiler-Assisted Dynamic Resource Allocation Framework for Real-world DRAM PIM (web, web(tentative), paper)

Inyong Hwang*, Donghyeon Kim*, Seokwon Kang, Taehyeong Park, Taehoon Kim, Jiwon Seo, Hanjun Kim, Youngsok Kim, and Yongjun Park

39th ACM International Conference on Supercomputing (ICS), June 2025 (BK21+ IF 2)

(*Equal contribution)

Supporting Register-based Addressing Modes for in-DRAM PIM ISAs

Seok Young Kim, Byung Ho Choi, Seokwon Kang, Yongjun Park and Seon Wook Kim

Proc. 62nd Design Automation Conference (DAC), June 2025 (BK21+ IF 3)

SPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting the Bank- and Rank-level Parallelisms of DIMMs

Suhyun Lee, Chaemin Lim, Jinwoo Choi, Heelim Choi, Chan Lee, Yongjun Park, Kwanghyun Park, Hanjun Kim, and Youngsok Kim

2025 ACM International Conference on Management of Data (SIGMOD), June 2025 (BK21+ IF 4)

CUrator: An Efficient LLM Execution Engine with Optimized Integration of CUDA Libraries (web, paper) (Artifacts Evaluated)

Yoon Noh Lee, Yongseung Yu, and Yongjun Park

Proc. 2025 Intl. Symposium on Code Generation and Optimization (CGO), March 2025 (BK21+ IF 2)

Accelerating LLMs using an Efficient GEMM Library and Target-Aware Optimizations on Real-World PIM Devices (web, paper)

Hyeoncheol Kim*, Taehoon Kim*, Taehyeong Park, Donghyeon Kim, Yongseung Yu, Hanjun Kim, and Yongjun Park

Proc. 2025 Intl. Symposium on Code Generation and Optimization (CGO), March 2025 (BK21+ IF 2)

(*Equal contribution)

Journal/Magazine Articles

Efficient Image Super-Resolution Using Dynamic Quality Control with Recursive Model Structures (web, paper)

Inho Lee, Jaemin Park, Seunghwan Lee, Tae Hyun Kim, Jiwon Seo, Hunjun Lee, Yongjun Park

IEEE Access, Volume: 13, June, 2025

2024

Conference/Symposium/Workshop Proceedings

Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU (web, paper)

Kiung Jung, Seok Namkoong, Hongjun Um, Hyejun Kim, Youngsok Kim, and Yongjun Park

25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), June 2024 (BK21+ IF 2)

Discovering Efficient Fused Layer Configurations for Executing Multi-Workloads on Multi-core NPUs (web, paper)

Younghyun Lee, Hyejun Kim, Yongseung Yu, Myeongjin Cho, Jiwon Seo and Yongjun Park

Proc. Design Automation and Test in Europe (DATE), March, 2024 (BK21+ IF 2)

Journal/Magazine Articles

ISP Agent: A Generalized In-Storage-Processing Workload Offloading Framework by Providing Multiple Optimization Opportunities (web, paper)

Seokwon Kang, Jongbin Kim, Gyeongyong Lee, Jeongmyung Lee, Jiwon Seo, Hyungsoo Jung, Yong Ho Song, Yongjun Park

ACM Transactions on Architecture and Code Optimization (Volume 21, Issue 1), March, 2024

2023

Conference/Symposium/Workshop Proceedings

Tailoring Tiling-based GEMM Performance using Supervised Learning (web, paper)

Yongseung Yu, Donghyun Son, Yeonghyun Lee, Sunghyun Park, Giha Ryu, Myeongjin Cho, Jiwon Seo* and Yongjun Park*

(*Corresponding authors)

The 41st IEEE International Conference on Computer Design (ICCD), Nov. 2023 (BK21+ IF 1)

Virtual PIM: Resource-aware Dynamic DPU Allocation and Workload Scheduling Framework on Multi-DPU PIM Architecture (web, paper)

Donghyeon Kim, Taehoon Kim, Inyong Hwang, Taehyeong Park, Hanjun Kim, Youngsok Kim, Yongjun Park

32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2023 (BK21+ IF 3)

SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication (web, web2, paper)

Myung-Hwan Jang, Yunyong Ko, Hyuck-Moo Gwon, Ikhyeon Jo, Yongjun Park*, and Sang-Wook Kim*

In Proc. of the 32nd ACM International Conference on Information and Knowledge Management (CIKM), Oct. 2023 (BK21+ IF 3)

(*Corresponding authors)

Synchronization-aware NAS for an Efficient Collaborative Inference on Mobile Platforms (web, paper)

Beom Woo Kang, Junho Wohn*, Seongju Lee, Sunghyun Park, Yung-Kyun Noh, and Yongjun Park

24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), June 2023 (BK21+ IF 2)

(*Conference speaker)

Block Group Scheduling: A General Precision-scalable NPU Scheduling Technique with Capacity-aware Memory Allocation (web, paper)

Seokho Lee, Younghyun Lee, Hyejun Kim, Taehoon Kim and Yongjun Park

Proc. Design Automation and Test in Europe (DATE), April, 2023 (BK21+ IF 2)

Orchestrating Large-Scale SpGEMMs using Dynamic Block Distribution and Data Transfer Minimization on Heterogeneous Systems (web, paper)

Taehyeong Park, Seokwon Kang, Myung-Hwan Jang, Sang-Wook Kim, Yongjun Park

IEEE International Conference on Data Engineering (ICDE), April, 2023 (BK21+ IF 3)

Journal/Magazine Articles

2022

Conference/Symposium/Workshop Proceedings

Networked SSD: The Flash Memory Interconnection Network for High-Bandwidth SSD

Jiho Kim, Seokwon Kang, Yongjun Park, John Kim

Proc. 2022 Intl. Symposium on Microarchitecture (MICRO), Oct. 2022 (BK21+ IF 4)

SRTuner: Effective Compiler Optimization Customization By Exposing Synergistic Relations

Sunghyun Park, Seyyed Salar Latifi Oskouei, Yongjun Park, Armand Behroozi, Byungsoo Jeon, Scott Mahlke

Proc. 2022 Intl. Symposium on Code Generation and Optimization (CGO), April 2022 (BK21+ IF 2)

Journal/Magazine Articles

Dynamic Rate Neural Acceleration Using Multiprocessing Mode Support (web, paper)

Inho Lee, Yangki Lee, Hongjun Um, Seongmin Hong, Yongjun Park

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, June 2022

MaPHeA: A Framework for Lightweight Memory Hierarchy-Aware Profile-Guided Heap Allocation

Deok-Jae Oh, Yaebin Moon, Do Kyu Ham, Tae Jun Ham, Yongjun Park, Jae W. Lee, Jung Ho Ahn, Eojin Lee

ACM Transactions on Embedded Computing Systems, Mar. 2022

2021

Conference/Symposium/Workshop Proceedings

MASCOT: A Quantization Framework for Efficient Matrix Factorization in Recommender Systems

Yunyong Ko, Jae-Seo Yu, Hong-Kyun Bae, Yongjun Park, Dongwon Lee, and Sang-Wook Kim

21st IEEE International Conference on Data Mining (ICDM), New Zealand, Dec. 2021 (BK21+ IF 3)

Legion: Tailoring Grouped Neural Execution Considering Heterogeneity on Multiple Edge Devices (web, paper)

Kyunghwan Choi*, Seongju Lee*, Beom Woo Kang, and Yongjun Park

The 39th IEEE International Conference on Computer Design (ICCD), Oct. 2021 (BK21+ IF 1)

(*Equal contribution)

MaPHeA: A Lightweight Memory Hierarchy-Aware Profile-Guided Heap Allocation Framework

Deok-Jae Oh, Yaebin Moon, Eojin Lee, Tae Jun Ham, Yongjun Park, Jae W. Lee, Jung Ho Ahn

22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), June 2021 (BK21+ IF 2)

Journal/Magazine Articles

A Collaborative CPU Vector Offloader: Putting Idle Vector Resources to Work on Commodity Processors (web, paper)

Youngbin Son, Seokwon Kang, Hongjun Um, Seokho Lee, Jonghyun Ham, Donghyeon Kim, and Yongjun Park

MDPI Electronics, Nov. 2021

2020

Conference/Symposium/Workshop Proceedings

LOCKED-Free Journaling: Improving the Coalescing Degree in EXT4 Journaling

Kyoungho Koo, Yongjun Park, Youjip Won

IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA), Aug. 2020

Convergence-Aware Neural Network Training (web, paper)

Hyungjun Oh, Yongseung Yu, Giha Ryu, Gunjoo Ahn, Yuri Jeong, Yongjun Park*, and Jiwon Seo*

57th Design Automation Conference (DAC), July 2020 (BK21+ IF 3)

(*Corresponding authors)

Navigator: Dynamic Multi-kernel Scheduling to Improve GPU Performance (web, paper)

Jiho Kim, John Kim, Yongjun Park

Proc. 57th Design Automation Conference (DAC), July 2020 (BK21+ IF 3)

Optimization of a GPU-based Sparse Matrix Multiplication for Large Sparse Networks (web, paper)

Jeongmyung Lee, Seokwon Kang, Yongseung Yu, Yong-Yeon Jo, Sang-Wook Kim, Yongjun Park

IEEE International Conference on Data Engineering (ICDE), April, 2020 (BK21+ IF 3)

Two-Tier Garbage Collection for Persistent Object

Dokeun Lee, Youjip Won, Yongjun Park, Seongjin Lee

ACM Symposium on Applied Computing (SAC), March, 2020 (BK21+ IF 1)

PreScaler: An Efficient System-aware Precision Scaling Framework on Heterogeneous Systems (web, paper) (Artifacts Evaluated)

Seokwon Kang, Kyunghwan Choi, Yongjun Park

Proc. 2020 Intl. Symposium on Code Generation and Optimization (CGO), Feb. 2020 (BK21+ IF 2)

Journal/Magazine Articles

Resource-Aware Device Allocation of Data-Parallel Applications on Heterogeneous Systems (web, paper)

Donghyeon Kim, Seokwon Kang, Junsu Lim, Sunwook Jung, Woosung Kim, Yongjun Park

MDPI Electronics, Nov. 2020

2019

Conference/Symposium/Workshop Proceedings

GATE: A Generalized Dataflow-level Approximation Tuning Engine For Data Parallel Architectures (web, paper)

Seokwon Kang, Yongseung Yu, Jiho Kim, Yongjun Park

Proc. 56th Design Automation Conference (DAC), June 2019 (BK21+ IF 3)

WIP: A Compiler-based Approach for GPGPU Performance Calibration using TLP Modulation (Work-In-Progress) (web, paper)

Yongseung Yu, Seokwon Kang, Yongjun Park

Proc. ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), June 2019 (BK21+ IF 2)

Journal/Magazine Articles

Microarchitecture-Aware Code Generation for Deep Learning on Single-ISA Heterogeneous Multi-Core Mobile Processors

Junmo Park, Yongin Kwon, Yongjun Park, Dongsuk Jeon

IEEE Access, Volume 7, April, 2019

Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs

Yunho Oh, Keunsoo Kim, Myung Kuk Yoon, Jong Hyun Park, Yongjun Park, Murali Annavaram, and Won Woo Ro

IEEE Transactions on Computers, 68(4), April, 2019

Improving GPU Multitasking Efficiency using Dynamic Resource Sharing (web, paper)

Jiho Kim, Jehee Cha, Jason Jong Kyu Park, Dongsuk Jeon, and Yongjun Park

IEEE Computer Architecture Letters, 18(1), 2019 (Date of Publication: 21 December 2018)

2018

Conference/Symposium/Workshop Proceedings

Automatic Code Conversion for Non-Volatile Memory

Jinsoo Yoo, Yongjun Park, Youjip Won, Seongjin Lee

ACM Symposium on Applied Computing (SAC), Pau, France, Apr, 2018 (BK21+ IF 1)

NN Compactor: Minimizing Memory and Logic Resources for Small Neural Networks (web, paper)

Seongmin Hong, Inho Lee, Yongjun Park

Proc. Design Automation and Test in Europe (DATE), Mar, 2018 (BK21+ IF 2)

Journal/Magazine Articles

WASP: Selective Data Prefetching with Monitoring Runtime Warp Progress on GPUs

Yunho Oh, Myung Kuk Yoon, Jong Hyun Park, Yongjun Park, and Won Woo Ro

IEEE Transactions on Computers, 67(9), Sep, 2018

2017

Conference/Symposium/Workshop Proceedings

A FPGA-based Neural Accelerator for Small IoT Devices

Seongmin Hong, Yongjun Park

The 14th International SoC Design Conference, Nov, 2017

FPGA Implementation of an Efficient Real-Time Digit Recognition System using Neural Networks

Seongmin Hong, Yongjun Park

SoC Conference, May, 2017

A FPGA-based Neural Network Accelerator for Handwritten Digit Recognition with On-chip Memory

Seongmin Hong, Inho Lee, Jehee Cha, Yongseung Yu, Yongjun Park

The 12th IEMEK Symposium on Embedded Technology, May, 2017

Journal/Magazine Articles

Enabling Energy Efficient Image Encryption using Approximate Memoization

Seongmin Hong, Jaehyung Im, SM Mazharul Islam, Jae-Hee You, and Yongjun Park

Journal of Semiconductor Technology & Science (JSTS), 17(3), June, 2017

Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems

Yuhwan Ro, Minchul Sung, Yongjun Park, Jung Ho Ahn

IEICE Electronics Express (ELEX), 14(11), June, 2017

A Comparative Study of Programming Environments Exploiting Heterogeneous Systems for Big Data Processing

Bongsuk Ko, Seunghun Han, Yongjun Park, Moongu Jeon, Byeongcheol Lee

IEEE Access, Volume 5, May, 2017

Efficient GPU Multitasking with Latency Minimization and Cache Boosting

Jiho Kim, Minsung Chu, and Yongjun Park

IEICE Electronics Express (ELEX), 14(7), April, 2017

Dynamic Resource Management for Efficient Utilization of Multitasking GPUs

Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke

ACM SIGPLAN Notices Volume 52, Issue 4, April 2017

Also published in Proc. 22nd Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2017 (BK21+ IF 4)

Before 2017

Conference/Symposium/Workshop Proceedings

Enhancing Energy-Efficiency using a Stream Buffer on a Memoization-based Image Encryption Module

Jaehyung Im, Seongmin Hong, Yongjun Park

Institute of Embedded Engineering of Korea (IEMEK) Fall Conference, Nov, 2016

A Bypass First Policy for Energy-Efficient Last Level Caches

Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke

Proc. International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), July 2016.

APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs

Yunho Oh, Keunsoo Kim, Myung Kuk Yoon, Jong Hyun Park, Yongjun Park, Won Woo Ro, and Murali Annavaram

The 43rd ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2016 (BK21+ IF 4)

Design of Energy Efficient Image Encryption Module using Hardware Memoization

Seongmin Hong, Jaehyung Im, Jaehee You, Yongjun Park

The 11th IEMEK Symposium on Embedded Technology, May, 2016.

ELF: Maximizing Memory-level Parallelism for GPUs with Coordinated Warp and Fetch Scheduling

Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke

Proc. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2015. (BK21+ IF 3)

Fine Grain Cache Partitioning using Per-Instruction Working Blocks

Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke

Proc. 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2015. (BK21+ IF 3)

Efficient Execution of Augmented Reality Applications on Mobile Programmable Accelerators

Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke

Proc. Intl. Conference on Field Programmable Technology (FPT), Dec. 2013

Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems

Janghaeng Lee, Mehrzad Samadi, Yongjun Park, and Scott Mahlke

Proc. 22nd Intl. Conference on Parallel Architectures and Compilation Techniques (PACT), Sep. 2013. (BK21+ IF 3)

Efficient Performance Scaling of Future CGRAs for Mobile Applications

Yongjun Park, Jason Jong Kyu Park, and Scott Mahlke

Proc. Intl. Conference on Field Programmable Technology (FPT), Dec. 2012

Libra: Tailoring SIMD Execution using Heterogeneous Hardware and Dynamic Configurability

Yongjun Park, Jason Jong Kyu Park, Hyunchul Park, and Scott Mahlke

Proc. 45th Intl. Symposium on Microarchitecture (MICRO), Dec. 2012 (BK21+ IF 4)

Process Variation in Near-Threshold Wide SIMD Architecture

Sangwon Seo, Ronald Dreslinski, Mark Woh, Yongjun Park, Scott Mahlke, David Blaauw, Chaitali Chakrabarti, and Trevor Mudge

Proc. 49th Design Automation Conference (DAC), June 2012 (BK21+ IF 3)

Resource Recycling: Putting Idle Resources to Work on a Composable Accelerator

Yongjun Park, Hyunchul Park, Scott Mahlke, and Sukjin Kim

Proc. 2010 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct. 2010 (BK21+ IF 2)

Polymorphic Pipeline Array: A Flexible Multicore Accelerator with Virtualized Execution for Mobile Multimedia Applications

Hyunchul Park, Yongjun Park, and Scott Mahlke

Proc. 42nd Intl. Symposium on Microarchitecture (MICRO), Dec. 2009, pp. 370-380. (BK21+ IF 4)

CGRA Express: Accelerating Execution using Dynamic Operation Fusion

Yongjun Park, Hyunchul Park, and Scott Mahlke

Proc. 2009 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct. 2009, pp. 271-280. (BK21+ IF 2)

A Dataflow-centric Approach to Design Low Power Control Paths in CGRAs

Hyunchul Park, Yongjun Park, and Scott Mahlke

Proc. 7th IEEE Symposium on Application Specific Processors (SASP), Jul. 2009, pp. 15-20.

Reducing Control Power in CGRAs with Token Flow

Hyunchul Park, Yongjun Park, and Scott Mahlke

7th Workshop on Optimizations for DSP and Embedded Systems (ODES-7), March, 2009

Journal/Magazine Articles

An eDRAM-Based Approximate Register File for GPUs

Donghwan Jeong, Young H. Oh, Yongjun Park, and Jae W. Lee

IEEE Design & Test, 33(1), February 2016.

SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration

Janghaeng Lee, Mehrzad Samadi, Yongjun Park, and Scott Mahlke

ACM Transactions on Computer Systems (TOCS), Aug. 2015

Enabling Efficient Alias Speculation

Soumyadeep Ghosh, Yongjun Park, Arun Raman

ACM SIGPLAN Notices Volume 50, Issue 5, May 2015

Also published in Proc. 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), June 2015 (BK21+ IF 2)

Chimera: Collaborative Preemption for Multitasking on a Shared GPU

Jason Jong Kyu Park, Yongjun Park*, and Scott Mahlke*

ACM SIGPLAN Notices Volume 50, Issue 4, April 2015

(*Corresponding authors)

Also published in Proc. 20th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2015 (BK21+ IF 4)

SIMD Defragmenter: Efficient ILP Realization on Data-parallel Architectures

Yongjun Park, Sangwon Seo, Hyunchul Park, Hyoun Kyu Cho, and Scott Mahlke

ACM SIGPLAN Notices Volume 47, Issue 4, April 2012

Also published in Proc. 17th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2012 (BK21+ IF 4)
