Taekyung Heo

Senior Deep Learning Engineer @ NVIDIA

TaekyungHeo_CV.pdf

Hands-on computer systems engineer and researcher focused on performance optimization across hardware, system software, and AI workloads. Builds benchmarking frameworks, models complex systems with simulators, and turns research prototypes into reliable production tools. Strong background in computer architecture, system software, and large codebase comprehension. A versatile "Swiss army knife" who tracks recent trends and executes quickly.

Employment

Senior Deep Learning Engineer, NVIDIA, Nov. 2025 - Present
Senior HPC Middleware Developer, NVIDIA, Dec. 2023 - Oct. 2025
Research Engineer II, Georgia Institute of Technology, Mar. 2023 - Dec. 2023
- Supervisor: Tushar Krishna
Research Advisor IV @ Meta, Magnit, Nov. 2022 - Dec. 2023
Postdoctoral Fellow, Georgia Institute of Technology, Mar. 2022 - Feb. 2023
- Supervisor: Tushar Krishna
Visiting Fellow @ Microsoft Research Asia, FA Talent, Feb. 2018 - Aug. 2018

Education

Doctor of Philosophy (Ph.D.), Computer Science, Korea Advanced Institute of Science & Technology (KAIST), Mar. 2016 - Feb. 2022
- Dissertation: Redesigning Hardware and Software Stacks for Terabyte-Scale Memory Systems
- Advisor: Jaehyuk Huh
Master of Science (M.Sc.), Computer Science, Korea Advanced Institute of Science & Technology (KAIST), Mar. 2014 - Feb. 2016
- Thesis: Dynamic Time Slice Management Based on CPU Pooling in Virtualized Systems
- Advisor: Jaehyuk Huh
Bachelor of Science (B.Sc.), Computer Engineering, Sungkyunkwan University, Mar. 2010 - Feb. 2014
- Senior Thesis: Performance Analysis of the ext4 File System in Virtualization Environment
- Advisor: Young Ik Eom

Publications

"MLcommons Chakra: Advancing Performance Benchmarking And Co-Design Using Standardized Execution Traces", Conference on Machine Learning and Systems (MLSys), May 2026
"Supporting Trusted Virtual Machines with Hardware-based Secure Remote Memory", International Symposium on Memory Management (ISMM), June 2024
"ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale", International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2023 [slides, video]
"COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training", arXiv, November 2022
"InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-aware Inner Product Processing", International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2021 [slides]
"Adaptive Page Migration Policy with Huge Pages in Tiered Memory Systems", IEEE Transactions on Computers [code]
"Accelerating Critical OS Services in Virtualized Systems with Flexible Micro-sliced Cores", European Conference on Computer Systems (EuroSys), April 2018 [slides, video, code]
"Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations", International Symposium on Computer Architecture (ISCA), June 2017 [slides]
"Efficient Synonym Filtering and Scalable Delayed Translation for Hybrid Virtual Caching", International Symposium on Computer Architecture (ISCA), June 2016 [slides]

Workshops

"Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces", Workshop on Modeling & Simulation of Systems and Applications (ModSim), August 2023 [slides, poster]
"Exploring Memory Expansion Designs for Training Mixture-of-Experts Models", Workshop on Hot Topics in System Infrastructure (HotInfra), June 2023 [slides]
"Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces", Workshop on Benchmarking Machine Learning Workloads on Emerging Hardware (MLBench), June 2023 [slides]
"FastSwtich: Enabling Real-time DNN Switching via Weight-Sharing", Workshop on Architecture, Compiler, and System Support for Multi-Model DNN Workloads (ACSMD), June 2022

Presentations & Talks

"Chakra and ASTRA-sim: An Open-source Ecosystem for Advancing Co-design for Future AI Systems", AI & Systems Co-design Faculty Summit on behalf of Tushar Krishna, October 2023
"Chakra and ASTRA-sim: An Open-source Ecosystem for Advancing Co-design for Future AI Systems", ACE Monthly Meeting, September 2023
"Designing Multi-Tensor Core Systems in SST", SST User Group Meeting, September 2023
"Execution Trace (ET) Execution Through Simulator", Chakra Working Group Meeting, August 2023
"ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale", SRC Combined CADT/AIHW Annual Review, May 2023
"ASTRA-sim Tutorial", International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023
"ASTRA-sim Tutorial", Conference on Machine Learning and Systems (MLSys), August 2022
"ASTRA-sim Tutorial", International Symposium on Computer Architecture (ISCA), June 2022
"ASTRA-sim Tutorial", International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), February 2022
"Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations", Korea Software Congress, December 2017

Research Experiences

Chakra: Advancing Benchmarking and Codesign with Standardized Execution Traces (GitHub)
- Aims to improve pre-silicon codesign and benchmarking for distributed ML through a standardized trace format and reference tools
- Served as one of the main developers of the project, focusing on standardization and tool development
- Member of the Chakra working group in MLCommons
- Presented at MLBench 2023 and ModSim 2023
- Honored with the Dr. Sudhakar Yalamanchili Award at ModSim 2023 (link)

COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training
- Proposed COMET, a holistic methodology for optimizing cluster design and parallelization strategies in distributed training, enabling rapid design space exploration and evaluation of key cluster resource parameters
- Contributed as the third author to implement a roofline-based computation model in ASTRA-sim
- Work-in-progress

Stealth Research Project
- Implemented computation engines in the Structural Simulation Toolkit (SST)
- Validated the implementation of the computation engines against analytical models
- Found a bug in the SST memHierarchy and contributed to the project by submitting a PR
- Work-in-progress

Exploring Memory Expansion Designs for Training Mixture-of-Experts Models
- Investigated various memory expansion design options to overcome the GPU memory wall challenge, specifically in the context of training Mixture-of-Experts (MoE) models
- Showed that remote memory access time and communication time become major performance bottlenecks in MoE model training, and that aggressive offloading significantly reduces local HBM requirements
- Presented at HotInfra 2023

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
- ASTRA-sim2.0 expands the capabilities of its predecessor, ASTRA-sim1.0, through the addition of support for (1) arbitrary parallelism, (2) hierarchical network modeling, and (3) memory models that were previously unavailable
- Contributed as a co-first author by enabling arbitrary parallelism with a graph-based frontend and adding memory system models
- Published in ISPASS 2023

A Sparse Matrix Multiplication Accelerator with Locality-aware Inner Product Processing
- Identified the memory bloating problem in prior outer-product-based accelerators
- Proposed an inner-product-based sparse-matrix multiplication accelerator that exploits the locality in an inner product
- Contributed as the third author to assist with motivational experiments and writing
- Published in PACT 2021

Hardware-assisted Trusted Memory Disaggregation for Secure Far Memory
- Designed and implemented a secure disaggregated memory system that supports fine-grained memory allocation on a Xilinx FPGA board
- Implemented the full HW & SW stacks, which include the FPGA and Linux kernel driver
- Led the project as the first author

Adaptive Page Migration Policy with Huge Pages in Tiered Memory Systems
- Analyzed the memory access patterns of workloads and proposed an adaptive page migration policy using the accessed bits in page table entries
- Led the project as the first author
- Published in IEEE Transactions on Computers

Accelerating Critical OS Services in Virtualized Systems with Flexible Micro-sliced Cores
- In a virtualized environment, virtual CPUs (vCPUs) suffer from synchronization problems when a vCPU holding a lock sleeps
- Solved the synchronization problem in virtualized systems by introducing a CPU pool with a shorter time slice
- Contributed as the third author to discuss the idea, implement the controller, and conduct motivational experiments
- Published in EuroSys 2018

Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations
- Proposed a mechanism to encode page contiguity in page table entries to increase the TLB coverage
- Allowed an OS to determine the number of contiguity-encoded page table entries to adapt to various contiguity preferences
- Contributed as the second author to discuss the idea, implement the simulator, and conduct motivational experiments
- Published in ISCA 2017

Efficient Synonym Filtering and Scalable Delayed Translation for Hybrid Virtual Caching
- Solved the TLB scaling problem by proposing a virtual caching architecture backed by delayed segment address translation
- Contributed as the second author to discuss the idea, implement the simulator, and conduct motivational experiments
- Published in ISCA 2016

Patents

"Apparatus and Method for Accelerating Critical Service in Virtualized System", KR 1021157380000, Granted: May 21st, 2020
"Method and System to Improve TLB Coverage by Using Chunks of Contiguous Memory", KR 1019426630000, Granted: Jan. 21st, 2019

Professional Services

Program Committee (Light PC, MICRO 2026)
Program Committee (Light PC, HPCA 2024)
Artifact Evaluation Committee (MLSys 2022)

Awards & Scholarships

Dr. Sudhakar Yalamanchili Award, ModSim, Aug. 2023
Stars of Tomorrow (Award of Excellence), Microsoft Research Asia, Aug. 2018
Excellent Teaching Assistant Award, KAIST, Mar. 2018
KFAS Scholarship, Korea Foundation for Advanced Studies, 2017 - 2019
National Scholarship, KAIST, 2014 - 2021
Dean's List, College of Information & Communication Engineering, Apr. 2012, Oct. 2012, Apr. 2013, Oct. 2013
National Scholarship for Science and Engineering, Korea Student Aid Foundation (KOSAF), 2010 - 2013

Skills

Programming Languages: C, C++, Python, CUDA
System Software: Linux Kernel Development, Linux System Administration
Architecture Simulators: Structural Simulation Toolkit (SST), Gem5, NVMain, Pin, MARSSx86
FPGA: Vivado, Vitis HLS, Verilog, Tcl
Machine Learning Frameworks: PyTorch, TensorFlow

Extracurricular Activities

Student Representative, Department of Computer Science, KAIST, Feb. 2016 - Dec. 2016
Student, Korea Information Technology Research Institute, Jul. 2013 - Feb. 2014
President, Computer Security Research Club, Sungkyunkwan University, Jul. 2011 - Feb. 2012
Vice President, Computer Security Research Club, Sungkyunkwan University, Mar. 2011 - Jun. 2011

Teaching Experiences

Teaching Assistant for Computer Organization, KAIST, Fall 2017
Teaching Assistant for System Programming, KAIST, Spring 2017
Teaching Assistant for Introduction to Computer Application, KAIST, Fall 2015
Teaching Assistant for Digital System and Lab, KAIST, Spring 2015
Teaching Assistant for System Programming, KAIST, Fall 2014
Teaching Assistant for Introduction to Programming (Python), KAIST, Spring 2014

CV updated on 2025-Nov-1st
