About Me
I am a Senior Deep Learning Performance Engineer at NVIDIA, where I work on making deep learning workloads run faster on NVIDIA GPUs, with a focus on GPU/system architecture and the full deep learning software stack. I earned my Ph.D. in Electrical Engineering at KAIST under the guidance of Professor Minsoo Rhu. My primary research interests are computer systems and architecture for deep learning and emerging applications.
Work Experience
NVIDIA, Santa Clara, CA, USA [Jun. 2024 ~ Present]
Sr. Deep Learning Performance Engineer
GPU Computer Architecture / Deep Learning Training Performance
Manager: Nitin
Meta, Menlo Park, CA, USA [May 2022 ~ Aug. 2022]
Research Intern
AI Systems Hardware/Software Co-Design
Mentors: Changkyu Kim and Jaewon Lee
Education
KAIST (Korea Advanced Institute of Science and Technology), Daejeon, South Korea [Sep. 2018 ~ Feb. 2024]
Ph.D. in Electrical Engineering
Thesis: Hardware and Software Systems for Accelerating Large-Scale Deep Learning Recommendation Models
Vertically Integrated Architecture Research Group (VIA)
Adviser: Prof. Minsoo Rhu
POSTECH (Pohang University of Science and Technology), Pohang, South Korea [Mar. 2013 ~ Aug. 2017]
B.S. in Computer Science and Engineering
Graduation Research: Unsupervised Object Detection
Publications
Juntaek Lim, Youngeun Kwon, Ranggi Hwang, Kiwan Maeng, Edward Suh, and Minsoo Rhu, "LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models," The 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-29), San Diego, CA, April 2024
Acceptance Rate: 20% (193 of 921)
[Paper]
Youngeun Kwon and Minsoo Rhu, "Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards," The 49th IEEE/ACM International Symposium on Computer Architecture (ISCA-49), New York, NY, June 2022
Yunjae Lee, Youngeun Kwon, and Minsoo Rhu, "Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training," IEEE Computer Architecture Letters (CAL), Jul. 2021
[Paper]
Youngeun Kwon, Yunjae Lee, and Minsoo Rhu, "Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training," The 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27), Seoul, South Korea, Feb. 2021
Ranggi Hwang, Taehun Kim, Youngeun Kwon, and Minsoo Rhu, "Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations," The 47th IEEE/ACM International Symposium on Computer Architecture (ISCA-47), Valencia, Spain, June 2020
Acceptance Rate: 18% (77 of 421)
[Paper]
Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, and Minsoo Rhu, "NeuMMU: Architectural Support for Efficient Address Translations in NPUs," The 25th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-25), Lausanne, Switzerland, Mar. 2020
Selected as an IEEE Micro Top Picks Honorable Mention ("IEEE Micro - Top Picks From the 2020 Computer Architecture Conferences")
Acceptance Rate: 18% (86 of 476)
[Paper]
Youngeun Kwon, Yunjae Lee, and Minsoo Rhu, "TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning," The 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, Oct. 2019
Selected as an IEEE Micro Top Picks Honorable Mention ("IEEE Micro - The 2019 Top Picks in Computer Architecture")
Acceptance Rate: 22% (79 of 344)
[Paper]
Youngeun Kwon and Minsoo Rhu, "A Disaggregated Memory System for Deep Learning," IEEE Micro, Special Issue on Machine Learning Acceleration, Sep./Oct. 2019
[Paper]
Youngeun Kwon and Minsoo Rhu, "Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning," The 51st IEEE/ACM International Symposium on Microarchitecture (MICRO-51), Fukuoka, Japan, Oct. 2018
Acceptance Rate: 21% (74 of 351)
Youngeun Kwon and Minsoo Rhu, "A Case for Memory-Centric HPC System Architecture for Training Deep Neural Networks," IEEE Computer Architecture Letters (CAL), vol. 17, no. 2, pp. 134-138, July-Dec. 2018
Minsoo Rhu, Mike O'Connor, Niladrish Chatterjee, Jeff Pool, Youngeun Kwon, and Stephen W. Keckler, "Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks," The 24th IEEE International Symposium on High-Performance Computer Architecture (HPCA-24), Vienna, Austria, Feb. 2018
Recognition
Honorable Mention in IEEE Micro Top Picks 2020 [Mar. 2021]
The 27th Samsung Humantech Paper Award [Feb. 2021]
Gold Prize (1st place in the Computer Science and Engineering track)
Honorable Mention in IEEE Micro Top Picks 2019 [Mar. 2020]
KAIST Breakthroughs, Fall 2020 Vol. 14 [Fall 2020]
Global Ph.D. Fellowship, National Research Foundation of Korea (NRF) [Mar. 2019 ~ Feb. 2024]
Research Topic: Memory-centric Architecture for Deep Learning Acceleration
Acceptance Rate: 15.2% (216 of 1,423)
Best Presentation Award (1st place), Graduation Research Project, POSTECH CSE Department [July 2017]
Research Topic: Unsupervised Object Detection
National Academic Excellence Scholarship for Science and Engineering, Korea Student Aid Foundation (KOSAF) [Mar. 2013 ~ Dec. 2016]
Patents
Inventors: Minsoo Rhu, Youngeun Kwon, and Yunjae Lee
Applications: KR (2019), US (2020), CN (2020)
Registered: KR (2022)
Current Assignee: KAIST
Inventors: Minsoo Rhu and Youngeun Kwon
Applications: KR (2018), US (2019)
Registered: KR (2022)
Current Assignee: Samsung Electronics
Academic Service
Teaching Experience
KAIST, Daejeon, South Korea
Teaching Assistant
EE595: Special Topics in Electrical and Computer Engineering <Parallel Computer Architecture> [2021-Spring]
EE312: Introduction to Computer Architecture [2021-Fall, 2020-Fall, 2020-Spring]
EE209: Programming Structure for Electrical Engineering [2022-Spring, 2019-Fall]
POSTECH, Pohang, South Korea
Teaching Assistant
CSED311: Computer Architecture [2018-Spring]