2025
Conference/Symposium/Workshop Proceedings
PIM-CCA: An Efficient PIM Architecture with Optimized Integration of Configurable Functional Units (web, paper)
Jeehyun Kim, Donghyeon Kim, Seokwon Kang, Bongjoon Hyun, Inho Lee, Yongjun Park
Proc. 2025 Intl. Symposium on Microarchitecture (MICRO), Oct. 2025 (BK21+ IF 4) (to appear)
SortingHat: System Topology-aware Scheduling of Deep Neural Network Models on Multi-GPU Systems (web(tentative), paper)
Seok Namkoong, Taehyeong Park, Kiung Jung, Jinyoung Kim and Yongjun Park
39th ACM International Conference on Supercomputing (ICS), June 2025 (BK21+ IF 2)
PIM-CARE: A Compiler-Assisted Dynamic Resource Allocation Framework for Real-world DRAM PIM (web(tentative), paper)
Inyong Hwang*, Donghyeon Kim*, Seokwon Kang, Taehyeong Park, Taehoon Kim, Jiwon Seo, Hanjun Kim, Youngsok Kim, and Yongjun Park
39th ACM International Conference on Supercomputing (ICS), June 2025 (BK21+ IF 2)
(*Equal contribution)
Supporting Register-based Addressing Modes for in-DRAM PIM ISAs
Seok Young Kim, Byung Ho Choi, Seokwon Kang, Yongjun Park and Seon Wook Kim
Proc. 62nd Design Automation Conference (DAC), June 2025 (BK21+ IF 3)
SPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting the Bank- and Rank-level Parallelisms of DIMMs
Suhyun Lee, Chaemin Lim, Jinwoo Choi, Heelim Choi, Chan Lee, Yongjun Park, Kwanghyun Park, Hanjun Kim, and Youngsok Kim
2025 ACM International Conference on Management of Data (SIGMOD), June 2025 (BK21+ IF 4)
CUrator: An Efficient LLM Execution Engine with Optimized Integration of CUDA Libraries (web, paper) (Artifacts Evaluated)
Yoon Noh Lee, Yongseung Yu, and Yongjun Park
Proc. 2025 Intl. Symposium on Code Generation and Optimization (CGO), March 2025 (BK21+ IF 2)
Accelerating LLMs using an Efficient GEMM Library and Target-Aware Optimizations on Real-World PIM Devices (web, paper)
Hyeoncheol Kim*, Taehoon Kim*, Taehyeong Park, Donghyeon Kim, Yongseung Yu, Hanjun Kim, and Yongjun Park
Proc. 2025 Intl. Symposium on Code Generation and Optimization (CGO), March 2025 (BK21+ IF 2)
(*Equal contribution)
Journal/Magazine Articles
Efficient Image Super-Resolution Using Dynamic Quality Control with Recursive Model Structures (web, paper)
Inho Lee, Jaemin Park, Seunghwan Lee, Tae Hyun Kim, Jiwon Seo, Hunjun Lee, Yongjun Park
IEEE Access, Volume 13, June 2025
2024
Conference/Symposium/Workshop Proceedings
Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU (web, paper)
Kiung Jung, Seok Namkoong, Hongjun Um, Hyejun Kim, Youngsok Kim, and Yongjun Park
25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), June 2024 (BK21+ IF 2)
Discovering Efficient Fused Layer Configurations for Executing Multi-Workloads on Multi-core NPUs (web, paper)
Younghyun Lee, Hyejun Kim, Yongseung Yu, Myeongjin Cho, Jiwon Seo and Yongjun Park
Proc. Design Automation and Test in Europe (DATE), March, 2024 (BK21+ IF 2)
Journal/Magazine Articles
ISP Agent: A Generalized In-Storage-Processing Workload Offloading Framework by Providing Multiple Optimization Opportunities (web, paper)
Seokwon Kang, Jongbin Kim, Gyeongyong Lee, Jeongmyung Lee, Jiwon Seo, Hyungsoo Jung, Yong Ho Song, Yongjun Park
ACM Transactions on Architecture and Code Optimization (Volume 21, Issue 1), March, 2024
2023
Conference/Symposium/Workshop Proceedings
Tailoring Tiling-based GEMM Performance using Supervised Learning (web, paper)
Yongseung Yu, Donghyun Son, Yeonghyun Lee, Sunghyun Park, Giha Ryu, Myeongjin Cho, Jiwon Seo* and Yongjun Park*
(∗Corresponding authors)
The 41st IEEE International Conference on Computer Design (ICCD), Nov. 2023 (BK21+ IF 1)
Virtual PIM: Resource-aware Dynamic DPU Allocation and Workload Scheduling Framework on Multi-DPU PIM Architecture (web, paper)
Donghyeon Kim, Taehoon Kim, Inyong Hwang, Taehyeong Park, Hanjun Kim, Youngsok Kim, Yongjun Park
32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2023 (BK21+ IF 3)
SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication (web, web2, paper)
Myung-Hwan Jang, Yunyong Ko, Hyuck-Moo Gwon, Ikhyeon Jo, Yongjun Park*, and Sang-Wook Kim*
In Proc. of the 32nd ACM International Conference on Information and Knowledge Management (CIKM), Oct. 2023 (BK21+ IF 3)
(∗Corresponding authors)
Synchronization-aware NAS for an Efficient Collaborative Inference on Mobile Platforms (web, paper)
Beom Woo Kang, Junho Wohn*, Seongju Lee, Sunghyun Park, Yung-Kyun Noh, and Yongjun Park
24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), June 2023 (BK21+ IF 2)
(*Conference speaker)
Block Group Scheduling: A General Precision-scalable NPU Scheduling Technique with Capacity-aware Memory Allocation (web, paper)
Seokho Lee, Younghyun Lee, Hyejun Kim, Taehoon Kim and Yongjun Park
Proc. Design Automation and Test in Europe (DATE), April, 2023 (BK21+ IF 2)
Orchestrating Large-Scale SpGEMMs using Dynamic Block Distribution and Data Transfer Minimization on Heterogeneous Systems (web, paper)
Taehyeong Park, Seokwon Kang, Myung-Hwan Jang, Sang-Wook Kim, Yongjun Park
IEEE International Conference on Data Engineering (ICDE), April, 2023 (BK21+ IF 3)
Journal/Magazine Articles
2022
Conference/Symposium/Workshop Proceedings
Networked SSD: The Flash Memory Interconnection Network for High-Bandwidth SSD
Jiho Kim, Seokwon Kang, Yongjun Park, John Kim
Proc. 2022 Intl. Symposium on Microarchitecture (MICRO), Oct. 2022 (BK21+ IF 4)
SRTuner: Effective Compiler Optimization Customization By Exposing Synergistic Relations
Sunghyun Park, Seyyed Salar Latifi Oskouei, Yongjun Park, Armand Behroozi, Byungsoo Jeon, Scott Mahlke
Proc. 2022 Intl. Symposium on Code Generation and Optimization (CGO), April 2022 (BK21+ IF 2)
Journal/Magazine Articles
Dynamic Rate Neural Acceleration Using Multiprocessing Mode Support (web, paper)
Inho Lee, Yangki Lee, Hongjun Um, Seongmin Hong, Yongjun Park
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, June 2022
MaPHeA: A Framework for Lightweight Memory Hierarchy-Aware Profile-Guided Heap Allocation
Deok-Jae Oh, Yaebin Moon, Do Kyu Ham, Tae Jun Ham, Yongjun Park, Jae W. Lee, Jung Ho Ahn, Eojin Lee
ACM Transactions on Embedded Computing Systems, Mar. 2022
2021
Conference/Symposium/Workshop Proceedings
MASCOT: A Quantization Framework for Efficient Matrix Factorization in Recommender Systems
Yunyong Ko, Jae-Seo Yu, Hong-Kyun Bae, Yongjun Park, Dongwon Lee, and Sang-Wook Kim
21st IEEE International Conference on Data Mining (ICDM), New Zealand, Dec. 2021 (BK21+ IF 3)
Legion: Tailoring Grouped Neural Execution Considering Heterogeneity on Multiple Edge Devices (web, paper)
Kyunghwan Choi*, Seongju Lee*, Beom Woo Kang, and Yongjun Park
The 39th IEEE International Conference on Computer Design (ICCD), Oct. 2021 (BK21+ IF 1)
(*Equal contribution)
MaPHeA: A Lightweight Memory Hierarchy-Aware Profile-Guided Heap Allocation Framework
Deok-Jae Oh, Yaebin Moon, Eojin Lee, Tae Jun Ham, Yongjun Park, Jae W. Lee, Jung Ho Ahn
22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), June 2021 (BK21+ IF 2)
Journal/Magazine Articles
A Collaborative CPU Vector Offloader: Putting Idle Vector Resources to Work on Commodity Processors (web, paper)
Youngbin Son, Seokwon Kang, Hongjun Um, Seokho Lee, Jonghyun Ham, Donghyeon Kim, and Yongjun Park
MDPI Electronics, Nov. 2021
2020
Conference/Symposium/Workshop Proceedings
LOCKED-Free Journaling: Improving the Coalescing Degree in EXT4 Journaling
Kyoungho Koo, Yongjun Park, Youjip Won
IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA), Aug. 2020
Convergence-Aware Neural Network Training (web, paper)
Hyungjun Oh, Yongseung Yu, Giha Ryu, Gunjoo Ahn, Yuri Jeong, Yongjun Park*, and Jiwon Seo*
57th Design Automation Conference (DAC), July 2020 (BK21+ IF 3)
(∗Corresponding authors)
Navigator: Dynamic Multi-kernel Scheduling to Improve GPU Performance (web, paper)
Jiho Kim, John Kim, Yongjun Park
Proc. 57th Design Automation Conference (DAC), July 2020 (BK21+ IF 3)
Optimization of a GPU-based Sparse Matrix Multiplication for Large Sparse Networks (web, paper)
Jeongmyung Lee, Seokwon Kang, Yongseung Yu, Yong-Yeon Jo, Sang-Wook Kim, Yongjun Park
IEEE International Conference on Data Engineering (ICDE), April, 2020 (BK21+ IF 3)
Two-Tier Garbage Collection for Persistent Object
Dokeun Lee, Youjip Won, Yongjun Park, Seongjin Lee
ACM Symposium on Applied Computing (SAC), March, 2020 (BK21+ IF 1)
PreScaler: An Efficient System-aware Precision Scaling Framework on Heterogeneous Systems (web, paper) (Artifacts Evaluated)
Seokwon Kang, Kyunghwan Choi, Yongjun Park
Proc. 2020 Intl. Symposium on Code Generation and Optimization (CGO), Feb. 2020 (BK21+ IF 2)
Journal/Magazine Articles
Resource-Aware Device Allocation of Data-Parallel Applications on Heterogeneous Systems (web, paper)
Donghyeon Kim, Seokwon Kang, Junsu Lim, Sunwook Jung, Woosung Kim, Yongjun Park
MDPI Electronics, Nov. 2020
2019
Conference/Symposium/Workshop Proceedings
GATE: A Generalized Dataflow-level Approximation Tuning Engine For Data Parallel Architectures (web, paper)
Seokwon Kang, Yongseung Yu, Jiho Kim, Yongjun Park
Proc. 56th Design Automation Conference (DAC), June 2019 (BK21+ IF 3)
WIP: A Compiler-based Approach for GPGPU Performance Calibration using TLP Modulation (Work-In-Progress) (web, paper)
Yongseung Yu, Seokwon Kang, Yongjun Park
Proc. ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), June 2019 (BK21+ IF 2)
Journal/Magazine Articles
Microarchitecture-Aware Code Generation for Deep Learning on Single-ISA Heterogeneous Multi-Core Mobile Processors
Junmo Park, Yongin Kwon, Yongjun Park, Dongsuk Jeon
IEEE Access, Volume 7, April, 2019
Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs
Yunho Oh, Keunsoo Kim, Myung Kuk Yoon, Jong Hyun Park, Yongjun Park, Murali Annavaram, and Won Woo Ro
IEEE Transactions on Computers, 68(4), April, 2019
Improving GPU Multitasking Efficiency using Dynamic Resource Sharing (web, paper)
Jiho Kim, Jehee Cha, Jason Jong Kyu Park, Dongsuk Jeon, and Yongjun Park
IEEE Computer Architecture Letters, 18(1), 2019 (Date of Publication: 21 December 2018)
2018
Conference/Symposium/Workshop Proceedings
Automatic Code Conversion for Non-Volatile Memory
Jinsoo Yoo, Yongjun Park, Youjip Won, Seongjin Lee
ACM Symposium on Applied Computing (SAC), Pau, France, Apr. 2018 (BK21+ IF 1)
NN Compactor: Minimizing Memory and Logic Resources for Small Neural Networks (web, paper)
Seongmin Hong, Inho Lee, Yongjun Park
Proc. Design Automation and Test in Europe (DATE), Mar. 2018 (BK21+ IF 2)
Journal/Magazine Articles
WASP: Selective Data Prefetching with Monitoring Runtime Warp Progress on GPUs
Yunho Oh, Myung Kuk Yoon, Jong Hyun Park, Yongjun Park, and Won Woo Ro
IEEE Transactions on Computers, 67(9), Sep. 2018
2017
Conference/Symposium/Workshop Proceedings
A FPGA-based Neural Accelerator for Small IoT Devices
Seongmin Hong, Yongjun Park
The 14th International SoC Design Conference, Nov. 2017
FPGA Implementation of an Efficient Real-Time Digit Recognition System using Neural Networks
Seongmin Hong, Yongjun Park
SoC Conference, May 2017
A FPGA-based Neural Network Accelerator for Handwritten Digit Recognition with On-chip Memory
Seongmin Hong, Inho Lee, Jehee Cha, Yongseung Yu, Yongjun Park
The 12th IEMEK Symposium on Embedded Technology, May, 2017
Journal/Magazine Articles
Enabling Energy Efficient Image Encryption using Approximate Memoization
Seongmin Hong, Jaehyung Im, SM Mazharul Islam, Jae-Hee You, and Yongjun Park
Journal of Semiconductor Technology & Science (JSTS), 17(3), June, 2017
Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems
Yuhwan Ro, Minchul Sung, Yongjun Park, Jung Ho Ahn
IEICE Electronics Express (ELEX), 14(11), June, 2017
A Comparative Study of Programming Environments Exploiting Heterogeneous Systems for Big Data Processing
Bongsuk Ko, Seunghun Han, Yongjun Park, Moongu Jeon, Byeongcheol Lee
IEEE Access, Volume 5, May, 2017
Efficient GPU Multitasking with Latency Minimization and Cache Boosting
Jiho Kim, Minsung Chu, and Yongjun Park
IEICE Electronics Express (ELEX), 14(7), April, 2017
Dynamic Resource Management for Efficient Utilization of Multitasking GPUs
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
ACM SIGPLAN Notices Volume 52, Issue 4, April 2017
Also published in Proc. 22nd Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2017 (BK21+ IF 4)
Before 2017
Conference/Symposium/Workshop Proceedings
Enhancing Energy-Efficiency using a Stream Buffer on a Memoization-based Image Encryption Module
Jaehyung Im, Seongmin Hong, Yongjun Park
IEMEK (Institute of Embedded Engineering of Korea) Fall Conference, Nov. 2016
A Bypass First Policy for Energy-Efficient Last Level Caches
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), July 2016
APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs
Yunho Oh, Keunsoo Kim, Myung Kuk Yoon, Jong Hyun Park, Yongjun Park, Won Woo Ro, and Murali Annavaram
The 43rd ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2016 (BK21+ IF 4)
Design of Energy Efficient Image Encryption Module using Hardware Memoization
Seongmin Hong, Jaehyung Im, Jaehee You, Yongjun Park
The 11th IEMEK Symposium on Embedded Technology, May 2016
ELF: Maximizing Memory-level Parallelism for GPUs with Coordinated Warp and Fetch Scheduling
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2015 (BK21+ IF 3)
Fine Grain Cache Partitioning using Per-Instruction Working Blocks
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2015 (BK21+ IF 3)
Efficient Execution of Augmented Reality Applications on Mobile Programmable Accelerators
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Proc. Intl. Conference on Field Programmable Technology (FPT), Dec. 2013
Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
Janghaeng Lee, Mehrzad Samadi, Yongjun Park, and Scott Mahlke
Proc. 22nd Intl. Conference on Parallel Architectures and Compilation Techniques (PACT), Sep. 2013 (BK21+ IF 3)
Efficient Performance Scaling of Future CGRAs for Mobile Applications
Yongjun Park, Jason Jong Kyu Park, and Scott Mahlke
Proc. Intl. Conference on Field Programmable Technology (FPT), Dec. 2012
Libra: Tailoring SIMD Execution using Heterogeneous Hardware and Dynamic Configurability
Yongjun Park, Jason Jong Kyu Park, Hyunchul Park, and Scott Mahlke
Proc. 45th Intl. Symposium on Microarchitecture (MICRO), Dec. 2012 (BK21+ IF 4)
Process Variation in Near-Threshold Wide SIMD Architecture
Sangwon Seo, Ronald Dreslinski, Mark Woh, Yongjun Park, Scott Mahlke, David Blaauw, Chaitali Chakrabarti, and Trevor Mudge
Proc. 49th Design Automation Conference (DAC), June 2012 (BK21+ IF 3)
Resource Recycling: Putting Idle Resources to Work on a Composable Accelerator
Yongjun Park, Hyunchul Park, Scott Mahlke, and Sukjin Kim
Proc. 2010 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct. 2010 (BK21+ IF 2)
Polymorphic Pipeline Array: A Flexible Multicore Accelerator with Virtualized Execution for Mobile Multimedia Applications
Hyunchul Park, Yongjun Park, and Scott Mahlke
Proc. 42nd Intl. Symposium on Microarchitecture (MICRO), Dec. 2009, pp. 370-380 (BK21+ IF 4)
CGRA Express: Accelerating Execution using Dynamic Operation Fusion
Yongjun Park, Hyunchul Park, and Scott Mahlke
Proc. 2009 Intl. Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct. 2009, pp. 271-280 (BK21+ IF 2)
A Dataflow-centric Approach to Design Low Power Control Paths in CGRAs
Hyunchul Park, Yongjun Park, and Scott Mahlke
Proc. 7th IEEE Symposium on Application Specific Processors (SASP), Jul. 2009, pp. 15-20.
Reducing Control Power in CGRAs with Token Flow
Hyunchul Park, Yongjun Park, and Scott Mahlke
7th Workshop on Optimizations for DSP and Embedded Systems (ODES-7), March, 2009
Journal/Magazine Articles
An eDRAM-Based Approximate Register File for GPUs
Donghwan Jeong, Young H. Oh, Yongjun Park, and Jae W. Lee
IEEE Design & Test, 33(1), February 2016.
SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration
Janghaeng Lee, Mehrzad Samadi, Yongjun Park, and Scott Mahlke
ACM Transactions on Computer Systems (TOCS), Aug. 2015
Enabling Efficient Alias Speculation
Soumyadeep Ghosh, Yongjun Park, Arun Raman
ACM SIGPLAN Notices Volume 50, Issue 5, May 2015
Also published in Proc. 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), June 2015 (BK21+ IF 2)
Chimera: Collaborative Preemption for Multitasking on a Shared GPU
Jason Jong Kyu Park, Yongjun Park*, and Scott Mahlke*
ACM SIGPLAN Notices Volume 50, Issue 4, April 2015
(∗Corresponding authors)
Also published in Proc. 20th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2015 (BK21+ IF 4)
SIMD Defragmenter: Efficient ILP Realization on Data-parallel Architectures
Yongjun Park, Sangwon Seo, Hyunchul Park, Hyoun Kyu Cho, and Scott Mahlke
ACM SIGPLAN Notices Volume 47, Issue 4, April 2012
Also published in Proc. 17th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2012 (BK21+ IF 4)