I am a GPU design engineer at Qualcomm. I pursued my Ph.D. in the Department of Electrical Engineering at Korea Advanced Institute of Science and Technology (KAIST), advised by Prof. Hoi-Jun Yoo. I was a research intern at Meta (Sunnyvale, CA). I am a recipient of the Best Research Achievement Award at KAIST. My research interests include energy-efficient multicore architectures, accelerator ASICs, and systems.
During my Ph.D., I participated in five silicon chip designs in Samsung 65nm, 28nm bulk, and 28nm FD-SOI technologies. My research experience spanned the end-to-end silicon design process, from low-level chip design (Verilog + Synopsys tools) to high-level system design (C++, Python + Altera FPGA tools). My dissertation research includes several hardware-software co-design methods for memory power and bandwidth optimization in mobile DNN accelerators.
Korea Advanced Institute of Science and Technology (KAIST)
Ph.D. in Electrical Engineering (Advisor: Hoi-Jun Yoo)
Korea Advanced Institute of Science and Technology (KAIST)
M.E. in Electrical Engineering (Advisor: Hoi-Jun Yoo)
Korea Advanced Institute of Science and Technology (KAIST)
B.E. in Electrical Engineering, Summa Cum Laude (GPA 4.05/4.3)
Mar. 2019 ~ Feb. 2023
Mar. 2017 ~ Feb. 2019
Mar. 2013 ~ Feb. 2017
Meta, Ph.D. Research Scientist Intern
LLM workload analysis for 3D stacked memory use case study for AR/VR device
Acknowledged in "Enabling On-Device Large Language Models with 3D-Stacked Memory", NIPSW 2025
(Manager: Huichu Liu)
Reported on national broadcast and government news (South Korea)
DRL accelerator project for mobile devices
Invited Talk: Electronic & Information Research Information Center (EIRIC)
A Deep Reinforcement Learning Accelerator Design with Dual-mode Weight Compression and Floating-point Computing-in-Memory Architecture
May 2022 ~ Sep. 2022
Jul. 2021
Nov. 2021
Scholarships and Awards
Best Research Achievement Award (Kim Choong-Ki Award)
Korea Foundation for Advanced Studies, Undergraduate Student Scholarship Program
Young-Han Kim Global Leader Scholarship
Dean’s List In Recognition of Outstanding Scholastic Achievement
Apr. 2022
Mar. 2013 ~ Feb. 2017
Mar. 2015 ~ Aug. 2016
Mar. 2014
Mobile DRL Accelerator (VLSI 2021)
Mobile Super-Resolution Accelerator (VLSI 2019)
Hardware Architect
Deep Neural Network (Deep RL, CNN) Accelerator
Programming Languages
Verilog HDL, C/C++, MATLAB, Python
Deep Learning Framework
PyTorch, TensorFlow, MatConvNet
EDA Tools
Synopsys Design Compiler, IC Compiler, IC Compiler II, Cadence, Intel FPGA Design Tool Chain
Technology Experience
Samsung 65nm CMOS, Samsung 28nm Bulk, Samsung 28nm FD-SOI
2022
OmniDRL: An Energy-Efficient Deep Reinforcement Learning Processor With Dual-Mode Weight Compression and Sparse Weight Transposer
Juhyoung Lee, Sangyeob Kim, Sangjin Kim, Wooyoung Jo, Jihoon Kim, Donghyeon Han, and Hoi-Jun Yoo
IEEE Journal of Solid-State Circuits (JSSC), 2022
ECIM: Exponent Computing in Memory for an Energy-Efficient Heterogeneous Floating-Point DNN Training Processor
Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, and Hoi-Jun Yoo
IEEE Micro, 2022
Low-power Autonomous Adaptation System with Deep Reinforcement Learning
Juhyoung Lee, Wooyoung Jo, Seong-Wook Park, and Hoi-Jun Yoo
IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2022
2021
OmniDRL: A 29.3 TFLOPS/W Deep Reinforcement Learning Processor with Dual-mode Weight Compression and On-chip Sparse Weight Transposer "Selected as Technical Highlighted Paper (Circuit Highlights)"
Juhyoung Lee, Sangyeob Kim, Sangjin Kim, Wooyoung Jo, Donghyeon Han, Jinsu Lee, and Hoi-Jun Yoo
IEEE Symposium on VLSI Circuits (VLSI), 2021
A 13.7 TFLOPS/W Floating-point DNN Processor using Heterogeneous Computing Architecture with Exponent-Computing-in-Memory "Selected as 20 most popular on-demand videos"
Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, Jinsu Lee, and Hoi-Jun Yoo
IEEE Symposium on VLSI Circuits (VLSI), 2021
OmniDRL: An Energy-Efficient Mobile Deep Reinforcement Learning Accelerators with Dual-mode Weight Compression and Direct Processing of Compressed Data
Juhyoung Lee, Sangyeob Kim, Jihoon Kim, Sangjin Kim, Wooyoung Jo, Donghyeon Han, and Hoi-Jun Yoo
IEEE HOT Chips: A Symposium on High Performance Chips, 2021
An Energy-efficient Floating-Point DNN Processor using Heterogeneous Computing Architecture with Exponent-Computing-in-Memory
Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, Donghyeon Han, Jinsu Lee, and Hoi-Jun Yoo
IEEE HOT Chips: A Symposium on High Performance Chips, 2021
Energy-Efficient Deep Reinforcement Learning Accelerator Designs for Mobile Autonomous Systems
Juhyoung Lee, Changhyeon Kim, Donghyeon Han, Sangyeob Kim, Sangjin Kim, and Hoi-Jun Yoo
IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2021
GST: Group-Sparse Training for Accelerating Deep Reinforcement Learning
Juhyoung Lee, Sangyeob Kim, Jihoon Kim, Sangjin Kim, Wooyoung Jo, and Hoi-Jun Yoo
arXiv, 2021
2020
SRNPU: An Energy-Efficient CNN-Based Super-Resolution Processor With Tile-Based Selective Super-Resolution in Mobile Devices
Juhyoung Lee, Jinsu Lee, and Hoi-Jun Yoo
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), 2020
2019
A Full HD 60 fps CNN Super Resolution Processor with Selective Caching based Layer Fusion for Mobile Devices
Juhyoung Lee, Dongjoo Shin, Jinsu Lee, Jinmook Lee, Sanghoon Kang, and Hoi-Jun Yoo
IEEE Symposium on VLSI Circuits (VLSI), 2019
A 99.4 fps Optical Flow Estimation Processor with Image Tiling for Action Recognition in Mobile Devices
Juhyoung Lee, Sungpill Choi, Jinmook Lee, Sanghoon Kang, and Hoi-Jun Yoo
Journal of Semiconductor Technology and Science (JSTS), 2019
2018
A 46.1 fps Global Matching Optical Flow Estimation Processor for Action Recognition in Mobile Devices
Juhyoung Lee, Changhyeon Kim, Sungpill Choi, Dongjoo Shin, Sanghoon Kang, and Hoi-Jun Yoo
IEEE International Symposium on Circuits and Systems (ISCAS), 2018
2022
A Low-Power Graph Convolutional Network Processor with Sparse Grouping for 3D Point Cloud Semantic Segmentation in Mobile Devices
Sangjin Kim, Sangyeob Kim, Juhyoung Lee, and Hoi-Jun Yoo
IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I), 2022
TSUNAMI: Triple Sparsity-Aware Ultra Energy-Efficient Neural Network Training Accelerator With Multi-Modal Iterative Pruning
Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Donghyeon Han, Wooyoung Jo, and Hoi-Jun Yoo
IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I), 2022
HNPU-V2: A 46.6 FPS DNN Training Processor for Real-World Environmental Adaptation based Robust Object Detection on Mobile Devices
Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo
IEEE HOT Chips: A Symposium on High Performance Chips, 2022
SNPU: Always-on 63.2μW Face Recognition Spike Domain Convolutional Neural Network Processor with Spike Train Decomposition and Shift-and-Accumulation Unit
Sangyeob Kim, Sangjin Kim, Soyeon Um, Soyeon Kim, Juhyoung Lee, and Hoi-Jun Yoo
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2022
A 0.95 mJ/frame DNN Training Processor for Robust Object Detection with Real-World Environmental Adaptation
Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo
IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2022
A 161.6 TOPS/W Mixed-Mode Computing-in-Memory Processor for Energy-Efficient Mixed-Precision Deep Neural Networks
Wooyoung Jo, Sangjin Kim, Juhyoung Lee, Soyeon Um, Zhiyong Li, and Hoi-Jun Yoo
IEEE International Symposium on Circuits and Systems (ISCAS), 2022
2021
HNPU: An Adaptive DNN Training Processor Utilizing Stochastic Dynamic Fixed-Point and Active Bit-Precision Searching
Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo
IEEE Journal of Solid-State Circuits (JSSC), 2021
GANPU: An Energy-Efficient Multi-DNN Training Processor for GANs With Speculative Dual-Sparsity Exploitation
Sanghoon Kang, Donghyeon Han, Juhyoung Lee, Dongseok Im, Sangyeob Kim, Soyeon Kim, Junha Ryu, and Hoi-Jun Yoo
IEEE Journal of Solid-State Circuits (JSSC), 2021
Z-PIM: A Sparsity-Aware Processing-in-Memory Architecture With Fully Variable Weight Bit-Precision for Energy-Efficient Deep Neural Networks
Ji-Hoon Kim, Juhyoung Lee, Jinsu Lee, Jaehoon Heo, and Joo-Young Kim
IEEE Journal of Solid-State Circuits (JSSC), 2021
A Mobile DNN Training Processor With Automatic Bit Precision Search and Fine-Grained Sparsity Exploitation
Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo
IEEE Micro, 2021
PNPU: An Energy-Efficient Deep-Neural-Network Learning Processor With Stochastic Coarse–Fine Level Weight Pruning and Adaptive Input/Output/Weight Zero Skipping
Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinmook Lee, Wooyoung Jo, and Hoi-Jun Yoo
IEEE Solid-State Circuits Letters (SSCL), 2021
PNNPU: A 11.9 TOPS/W High-speed 3D Point Cloud-based Neural Network Processor with Block-based Point Processing for Regular DRAM Access
Sangjin Kim, Juhyoung Lee, Dongseok Im, and Hoi-Jun Yoo
IEEE Symposium on VLSI Circuits (VLSI), 2021
PNNPU: A Fast and Efficient 3D Point Cloud-based Neural Network Processor with Block-based Point Processing for Regular DRAM Access
Sangjin Kim, Juhyoung Lee, Dongseok Im, and Hoi-Jun Yoo
IEEE HOT Chips: A Symposium on High Performance Chips, 2021
An Energy-Efficient Deep Reinforcement Learning FPGA Accelerator for Online Fast Adaptation with Selective Mixed-Precision Re-Training
Wooyoung Jo, Juhyoung Lee, Seunghyun Park, and Hoi-Jun Yoo
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2021
An Energy-Efficient Deep Neural Network Training Processor with Bit-Slice-Level Reconfigurability and Sparsity Exploitation
Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo
IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 2021
2020
A Power-Efficient CNN Accelerator With Similar Feature Skipping for Face Recognition in Mobile Devices
Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinsu Lee, and Hoi-Jun Yoo
IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I), 2020
A 1.02-μW STT-MRAM-Based DNN ECG Arrhythmia Monitoring SoC With Leakage-Based Delay MAC Unit
Kyoung-Rog Lee, Jihoon Kim, Changhyeon Kim, Donghyeon Han, Juhyoung Lee, Jinsu Lee, Hongsik Jeong, and Hoi-Jun Yoo
IEEE Solid-State Circuits Letters (SSCL), 2020
GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation
Sanghoon Kang, Donghyeon Han, Juhyoung Lee, Dongseok Im, Sangyeob Kim, Soyeon Kim, and Hoi-Jun Yoo
IEEE International Solid-State Circuits Conference (ISSCC), 2020
A 146.52 TOPS/W Deep-Neural-Network Learning Processor with Stochastic Coarse-Fine Pruning and Adaptive Input/Output/Weight Skipping
Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinmook Lee, and Hoi-Jun Yoo
IEEE Symposium on VLSI Circuits (VLSI), 2020
Z-PIM: An Energy-Efficient Sparsity Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision
Ji-Hoon Kim, Juhyoung Lee, Jinsu Lee, Hoi-Jun Yoo, and Joo-Young Kim
IEEE Symposium on VLSI Circuits (VLSI), 2020
GANPU: A Versatile Many-Core Processor for Training GAN on Mobile Devices with Speculative Dual-Sparsity Exploitation
Sanghoon Kang, Donghyeon Han, Juhyoung Lee, Dongseok Im, Sangyeob Kim, Soyeon Kim, Junha Ryu, and Hoi-Jun Yoo
IEEE HOT Chips: A Symposium on High Performance Chips, 2020
A 54.7 fps 3D Point Cloud Semantic Segmentation Processor with Sparse Grouping Based Dilated Graph Convolutional Network for Mobile Devices
Sangjin Kim, Sangyeob Kim, Juhyoung Lee, and Hoi-Jun Yoo
IEEE International Symposium on Circuits and Systems (ISCAS), 2020
2019
An Energy-Efficient Sparse Deep-Neural-Network Learning Accelerator with Fine-grained Mixed Precision of FP8-FP16
Jinsu Lee, Juhyoung Lee, Donghyeon Han, Jinmook Lee, Gwangtae Park, and Hoi-Jun Yoo
IEEE Solid-State Circuits Letters (SSCL), 2019
LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16
Jinsu Lee, Juhyoung Lee, Donghyeon Han, and Hoi-Jun Yoo
IEEE International Solid-State Circuits Conference (ISSCC), 2019
LNPU: An Energy-Efficient Deep-Neural-Network Training Processor with Fine-Grained Mixed Precision
Jinsu Lee, Juhyoung Lee, Donghyeon Han, and Hoi-Jun Yoo
IEEE HOT Chips: A Symposium on High Performance Chips, 2019
A 15.2 TOPS/W CNN Accelerator with Similar Feature Skipping for Face Recognition in Mobile Devices
Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinsu Lee, and Hoi-Jun Yoo
IEEE International Symposium on Circuits and Systems (ISCAS), 2019
2018
DNPU: An Energy-Efficient Deep Learning Processor with Heterogeneous Multi-Core Architecture
Dongjoo Shin, Jinmook Lee, Jinsu Lee, Juhyoung Lee, and Hoi-Jun Yoo
IEEE Micro, 2018
2017
An Energy-Efficient Deep Learning Processor with Heterogeneous Multi-Core Architecture for Convolutional Neural Networks and Recurrent Neural Networks
Dongjoo Shin, Jinmook Lee, Jinsu Lee, Juhyoung Lee, and Hoi-Jun Yoo
IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 2017
Journal Reviewer
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS)
IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I)
Proceedings of the IEEE
IEEE Open Journal of the Solid-State Circuits Society (OJ-SSCS)
IEEE Solid-State Circuits Letters (SSCL)