Juhyoung Lee

I am a GPU design engineer at Qualcomm. I pursued my Ph.D. in the Department of Electrical Engineering at Korea Advanced Institute of Science and Technology (KAIST), advised by Prof. Hoi-Jun Yoo. I was a research intern at Meta (Sunnyvale, CA). I am a recipient of the Best Research Achievement Award at KAIST. My research interests include energy-efficient multicore architectures, accelerator ASICs, and systems.

During my Ph.D., I participated in 5 silicon chip designs using Samsung 65nm, 28nm, and 28nm FD-SOI technologies. My research experience spanned the end-to-end silicon design process, from low-level chip design (Verilog + Synopsys tools) to high-level system design (C++, Python + Altera FPGA tools). My dissertation research includes several hardware-software co-design methods for memory power and bandwidth optimization in mobile DNN accelerators.

Resume | Google Scholar | LinkedIn

Education

Korea Advanced Institute of Science and Technology (KAIST)

Ph.D. in Electrical Engineering (Advisor: Hoi-Jun Yoo), Mar. 2019 ~ Feb. 2023

Korea Advanced Institute of Science and Technology (KAIST)

M.E. in Electrical Engineering (Advisor: Hoi-Jun Yoo), Mar. 2017 ~ Feb. 2019

Korea Advanced Institute of Science and Technology (KAIST)

B.E. in Electrical Engineering with Summa Cum Laude (GPA 4.05/4.3), Mar. 2013 ~ Feb. 2017

Research Experience & Activities

Meta, Ph.D. Research Scientist Intern, May 2022 ~ Sep. 2022

LLM workload analysis for a 3D-stacked memory use-case study for AR/VR devices

Acknowledged in "Enabling On-Device Large Language Models with 3D-Stacked Memory", NIPSW 2025

(Manager: Huichu Liu)

Reported on national broadcast and government news (South Korea), Jul. 2021

DRL accelerator project for mobile devices

Invited Talk: Electronic & Information Research Information Center (EIRIC), Nov. 2021

A Deep Reinforcement Learning Accelerator Design with Dual-mode Weight Compression and Floating-point Computing-in-Memory Architecture

Honors & Awards

Scholarships and Awards

Best Research Achievement Award (Kim Choong-Ki Award), Apr. 2022
Korea Foundation for Advanced Studies, Undergraduate Student Scholarship Program, Mar. 2013 ~ Feb. 2017
Young-Han Kim Global Leader Scholarship, Mar. 2015 ~ Aug. 2016
Dean’s List in Recognition of Outstanding Scholastic Achievement, Mar. 2014

Research Demonstrations

Mobile DRL Accelerator (VLSI 2021)

Mobile Super-Resolution Accelerator (VLSI 2019)

Skills

Hardware Architect

Deep Neural Network (Deep RL, CNN) Accelerator

Programming Languages

Verilog HDL, C/C++, MATLAB, Python

Deep Learning Framework

PyTorch, TensorFlow, MatConvNet

EDA Tools

Synopsys Design Compiler, IC Compiler, IC Compiler II, Cadence, Intel FPGA Design Tool Chain

Technology Experience

Samsung 65nm CMOS, Samsung 28nm Bulk, Samsung 28nm FD-SOI

13 Publications (As primary author)

2022

OmniDRL: An Energy-Efficient Deep Reinforcement Learning Processor With Dual-Mode Weight Compression and Sparse Weight Transposer

Juhyoung Lee, Sangyeob Kim, Sangjin Kim, Wooyoung Jo, Jihoon Kim, Donghyeon Han, and Hoi-Jun Yoo

IEEE Journal of Solid-State Circuits (JSSC), 2022

ECIM: Exponent Computing in Memory for an Energy-Efficient Heterogeneous Floating-Point DNN Training Processor

Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, and Hoi-Jun Yoo

IEEE Micro, 2022

Low-power Autonomous Adaptation System with Deep Reinforcement Learning

Juhyoung Lee, Wooyoung Jo, Seong-Wook Park, and Hoi-Jun Yoo

IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2022

2021

OmniDRL: A 29.3 TFLOPS/W Deep Reinforcement Learning Processor with Dual-mode Weight Compression and On-chip Sparse Weight Transposer "Selected as Technical Highlighted Paper (Circuit Highlights)"

Juhyoung Lee, Sangyeob Kim, Sangjin Kim, Wooyoung Jo, Donghyeon Han, Jinsu Lee, and Hoi-Jun Yoo

IEEE Symposium on VLSI Circuits (VLSI), 2021

A 13.7 TFLOPS/W Floating-point DNN Processor using Heterogeneous Computing Architecture with Exponent-Computing-in-Memory "Selected as 20 most popular on-demand videos"

Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, Jinsu Lee, and Hoi-Jun Yoo

IEEE Symposium on VLSI Circuits (VLSI), 2021

OmniDRL: An Energy-Efficient Mobile Deep Reinforcement Learning Accelerators with Dual-mode Weight Compression and Direct Processing of Compressed Data

Juhyoung Lee, Sangyeob Kim, Jihoon Kim, Sangjin Kim, Wooyoung Jo, Donghyeon Han, and Hoi-Jun Yoo

IEEE HOT Chips: A Symposium on High Performance Chips, 2021

An Energy-efficient Floating-Point DNN Processor using Heterogeneous Computing Architecture with Exponent-Computing-in-Memory

Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, Donghyeon Han, Jinsu Lee, and Hoi-Jun Yoo

IEEE HOT Chips: A Symposium on High Performance Chips, 2021

Energy-Efficient Deep Reinforcement Learning Accelerator Designs for Mobile Autonomous Systems

Juhyoung Lee, Changhyeon Kim, Donghyeon Han, Sangyeob Kim, Sangjin Kim, and Hoi-Jun Yoo

IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2021

GST: Group-Sparse Training for Accelerating Deep Reinforcement Learning

Juhyoung Lee, Sangyeob Kim, Jihoon Kim, Sangjin Kim, Wooyoung Jo, and Hoi-Jun Yoo

arXiv, 2021

2020

SRNPU: An Energy-Efficient CNN-Based Super-Resolution Processor With Tile-Based Selective Super-Resolution in Mobile Devices

Juhyoung Lee, Jinsu Lee, and Hoi-Jun Yoo

IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), 2020

2019

A Full HD 60 fps CNN Super Resolution Processor with Selective Caching based Layer Fusion for Mobile Devices

Juhyoung Lee, Dongjoo Shin, Jinsu Lee, Jinmook Lee, Sanghoon Kang, and Hoi-Jun Yoo

IEEE Symposium on VLSI Circuits (VLSI), 2019

A 99.4 fps Optical Flow Estimation Processor with Image Tiling for Action Recognition in Mobile Devices

Juhyoung Lee, Sungpill Choi, Jinmook Lee, Sanghoon Kang, and Hoi-Jun Yoo

Journal of Semiconductor Technology and Science (JSTS), 2019

2018

A 46.1 fps Global Matching Optical Flow Estimation Processor for Action Recognition in Mobile Devices

Juhyoung Lee, Changhyeon Kim, Sungpill Choi, Dongjoo Shin, Sanghoon Kang, and Hoi-Jun Yoo

IEEE International Symposium on Circuits and Systems (ISCAS), 2018

28 Publications (As collaborator)

2022

A Low-Power Graph Convolutional Network Processor with Sparse Grouping for 3D Point Cloud Semantic Segmentation in Mobile Devices

Sangjin Kim, Sangyeob Kim, Juhyoung Lee, and Hoi-Jun Yoo

IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I), 2022

TSUNAMI: Triple Sparsity-Aware Ultra Energy-Efficient Neural Network Training Accelerator With Multi-Modal Iterative Pruning

Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Donghyeon Han, Wooyoung Jo, and Hoi-Jun Yoo

IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I), 2022

HNPU-V2: A 46.6 FPS DNN Training Processor for Real-World Environmental Adaptation based Robust Object Detection on Mobile Devices

Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo

IEEE HOT Chips: A Symposium on High Performance Chips, 2022

SNPU: Always-on 63.2μW Face Recognition Spike Domain Convolutional Neural Network Processor with Spike Train Decomposition and Shift-and-Accumulation Unit

Sangyeob Kim, Sangjin Kim, Soyeon Um, Soyeon Kim, Juhyoung Lee, and Hoi-Jun Yoo

IEEE Asian Solid-State Circuits Conference (A-SSCC), 2022

A 0.95 mJ/frame DNN Training Processor for Robust Object Detection with Real-World Environmental Adaptation

Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo

IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2022

A 161.6 TOPS/W Mixed-Mode Computing-in-Memory Processor for Energy-Efficient Mixed-Precision Deep Neural Networks

Wooyoung Jo, Sangjin Kim, Juhyoung Lee, Soyeon Um, Zhiyong Li, and Hoi-Jun Yoo

IEEE International Symposium on Circuits and Systems (ISCAS), 2022

2021

HNPU: An Adaptive DNN Training Processor Utilizing Stochastic Dynamic Fixed-Point and Active Bit-Precision Searching

Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo

IEEE Journal of Solid-State Circuits (JSSC), 2021

GANPU: An Energy-Efficient Multi-DNN Training Processor for GANs With Speculative Dual-Sparsity Exploitation

Sanghoon Kang, Donghyeon Han, Juhyoung Lee, Dongseok Im, Sangyeob Kim, Soyeon Kim, Junha Ryu, and Hoi-Jun Yoo

IEEE Journal of Solid-State Circuits (JSSC), 2021

Z-PIM: A Sparsity-Aware Processing-in-Memory Architecture With Fully Variable Weight Bit-Precision for Energy-Efficient Deep Neural Networks

Ji-Hoon Kim, Juhyoung Lee, Jinsu Lee, Jaehoon Heo, and Joo-Young Kim

IEEE Journal of Solid-State Circuits (JSSC), 2021

A Mobile DNN Training Processor With Automatic Bit Precision Search and Fine-Grained Sparsity Exploitation

Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo

IEEE Micro, 2021

PNPU: An Energy-Efficient Deep-Neural-Network Learning Processor With Stochastic Coarse–Fine Level Weight Pruning and Adaptive Input/Output/Weight Zero Skipping

Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinmook Lee, Wooyoung Jo, and Hoi-Jun Yoo

IEEE Solid-State Circuits Letters (SSCL), 2021

PNNPU: A 11.9 TOPS/W High-speed 3D Point Cloud-based Neural Network Processor with Block-based Point Processing for Regular DRAM Access

Sangjin Kim, Juhyoung Lee, Dongseok Im and Hoi-Jun Yoo

IEEE Symposium on VLSI Circuits (VLSI), 2021

PNNPU: A Fast and Efficient 3D Point Cloud-based Neural Network Processor with Block-based Point Processing for Regular DRAM Access

Sangjin Kim, Juhyoung Lee, Dongseok Im, and Hoi-Jun Yoo

IEEE HOT Chips: A Symposium on High Performance Chips, 2021

An Energy-Efficient Deep Reinforcement Learning FPGA Accelerator for Online Fast Adaptation with Selective Mixed-Precision Re-Training

Wooyoung Jo, Juhyoung Lee, Seunghyun Park, Hoi-Jun Yoo

IEEE Asian Solid-State Circuits Conference (A-SSCC), 2021

An Energy-Efficient Deep Neural Network Training Processor with Bit-Slice-Level Reconfigurability and Sparsity Exploitation

Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, and Hoi-Jun Yoo

IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 2021

2020

A Power-Efficient CNN Accelerator With Similar Feature Skipping for Face Recognition in Mobile Devices

Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinsu Lee and Hoi-Jun Yoo

IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I), 2020

A 1.02-μW STT-MRAM-Based DNN ECG Arrhythmia Monitoring SoC With Leakage-Based Delay MAC Unit

Kyoung-Rog Lee, Jihoon Kim, Changhyeon Kim, Donghyeon Han, Juhyoung Lee, Jinsu Lee, Hongsik Jeong, and Hoi-Jun Yoo

IEEE Solid-State Circuits Letters (SSCL), 2020

GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation

Sanghoon Kang, Donghyeon Han, Juhyoung Lee, Dongseok Im, Sangyeob Kim, Soyeon Kim, Hoi-Jun Yoo

IEEE International Solid-State Circuits Conference (ISSCC), 2020

A 146.52 TOPS/W Deep-Neural-Network Learning Processor with Stochastic Coarse-Fine Pruning and Adaptive Input/Output/Weight Skipping

Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinmook Lee and Hoi-Jun Yoo

IEEE Symposium on VLSI Circuits (VLSI), 2020

Z-PIM: An Energy-Efficient Sparsity Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision

Ji-Hoon Kim, Juhyoung Lee, Jinsu Lee, Hoi-Jun Yoo and Joo-Young Kim

IEEE Symposium on VLSI Circuits (VLSI), 2020

GANPU: A Versatile Many-Core Processor for Training GAN on Mobile Devices with Speculative Dual-Sparsity Exploitation

Sanghoon Kang, Donghyeon Han, Juhyoung Lee, Dongseok Im, Sangyeob Kim, Soyeon Kim, Junha Ryu, and Hoi-Jun Yoo

IEEE HOT Chips: A Symposium on High Performance Chips, 2020

A 54.7 fps 3D Point Cloud Semantic Segmentation Processor with Sparse Grouping Based Dilated Graph Convolutional Network for Mobile Devices

Sangjin Kim, Sangyeob Kim, Juhyoung Lee, and Hoi-Jun Yoo

IEEE International Symposium on Circuits and Systems (ISCAS), 2020

2019

An Energy-Efficient Sparse Deep-Neural-Network Learning Accelerator with Fine-grained Mixed Precision of FP8-FP16

Jinsu Lee, Juhyoung Lee, Donghyeon Han, Jinmook Lee, Gwangtae Park, and Hoi-Jun Yoo

IEEE Solid-State Circuits Letters (SSCL), 2019

LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16

Jinsu Lee, Juhyoung Lee, Donghyeon Han, and Hoi-Jun Yoo

IEEE International Solid-State Circuits Conference (ISSCC), 2019

LNPU: An Energy-Efficient Deep-Neural-Network Training Processor with Fine-Grained Mixed Precision

Jinsu Lee, Juhyoung Lee, Donghyeon Han, and Hoi-Jun Yoo

IEEE HOT Chips: A Symposium on High Performance Chips, 2019

A 15.2 TOPS/W CNN Accelerator with Similar Feature Skipping for Face Recognition in Mobile Devices

Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinsu Lee, and Hoi-Jun Yoo

IEEE International Symposium on Circuits and Systems (ISCAS), 2019

2018

DNPU: An Energy-Efficient Deep Learning Processor with Heterogeneous Multi-Core Architecture

Dongjoo Shin, Jinmook Lee, Jinsu Lee, Juhyoung Lee, and Hoi-Jun Yoo

IEEE Micro, 2018

2017

An Energy-Efficient Deep Learning Processor with Heterogeneous Multi-Core Architecture for Convolutional Neural Networks and Recurrent Neural Networks

Dongjoo Shin, Jinmook Lee, Jinsu Lee, Juhyoung Lee, and Hoi-Jun Yoo

IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 2017

Academic Services

Journal Reviewer

IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS)
IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I)
Proceedings of the IEEE
IEEE Open Journal of the Solid-State Circuits Society (OJ-SSCS)
IEEE Solid-State Circuits Letters (SSCL)
