Ph.D. Study, UT Austin
Aug 2021 - Present
Mixture-of-Experts (MoE) Serving Systems (NeurIPS'24): Worked on designing an efficient serving system for Mixture-of-Experts LLMs. Re-architected the system-unfriendly layer-wise MoE routers to be decoupled from the MoE backbone, enabling router pre-computation and lookahead scheduling for expert-aware batching and caching.
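A minimal sketch of the decoupling idea, assuming a lightweight standalone router (all class and function names here are hypothetical, not the paper's actual implementation): once routing decisions for every layer are known up front, the scheduler can prefetch expert weights and group tokens by expert before the backbone runs.

    import torch

    class DecoupledRouter(torch.nn.Module):
        """Predicts top-k expert choices for all layers from the input
        embeddings, independently of the MoE backbone, so routing can
        run ahead of the actual expert computation."""
        def __init__(self, d_model, n_layers, n_experts, k=2):
            super().__init__()
            self.k = k
            self.heads = torch.nn.ModuleList(
                [torch.nn.Linear(d_model, n_experts) for _ in range(n_layers)])

        def forward(self, x):
            # One routing decision per layer, all computed up front.
            return [h(x).topk(self.k, dim=-1).indices for h in self.heads]

    def lookahead_schedule(expert_ids_per_layer):
        # With routing known in advance, decide per layer which experts
        # to prefetch/cache and how to batch tokens expert-by-expert.
        return [torch.unique(ids) for ids in expert_ids_per_layer]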
Scalable Distributed Pre-training (ICML'23): Worked on designing a distributed, data-efficient pre-training framework. For gradient-based subset selection algorithms, the proposed framework significantly reduces pre-training cost, provides stable gradients in the early stage of training, and improves robustness and final accuracy.
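As a toy illustration of what gradient-based subset selection means (greedy gradient matching; illustrative only, not the paper's algorithm, and select_subset / per_example_grads are hypothetical names): pick a subset whose mean gradient tracks the full-batch mean gradient.

    import torch

    def select_subset(per_example_grads, k):
        """Greedily pick k examples whose running mean gradient best
        approximates the full-batch mean gradient."""
        full_mean = per_example_grads.mean(dim=0)
        chosen, running = [], torch.zeros_like(full_mean)
        for _ in range(k):
            best, best_err = None, float("inf")
            for i in range(per_example_grads.size(0)):
                if i in chosen:
                    continue
                # Mean gradient if example i were added to the subset.
                cand = (running * len(chosen) + per_example_grads[i]) / (len(chosen) + 1)
                err = torch.norm(cand - full_mean).item()
                if err < best_err:
                    best, best_err = i, err
            chosen.append(best)
            running = (running * (len(chosen) - 1) + per_example_grads[best]) / len(chosen)
        return chosen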
PyTorch Distributed Team, Meta
May 2023 - Aug 2023
Automated Pipeline-Parallel Training: Worked on improving a pipeline-parallel training library by hiding communication on the critical path. Designed an optimization strategy for automated parallelism.
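A minimal sketch of the communication-hiding pattern using PyTorch's asynchronous point-to-point primitives (pipeline_step and its arguments are hypothetical; real pipeline schedules such as 1F1B are considerably more involved): the receive is posted before compute so the activation transfer overlaps with the stage's forward work.

    import torch
    import torch.distributed as dist

    def pipeline_step(stage, cur_input, prev_rank, next_rank):
        # Post the receive for the next microbatch first, so the incoming
        # activation transfer overlaps with this stage's computation.
        nxt = torch.empty_like(cur_input)
        recv_req = dist.irecv(nxt, src=prev_rank)
        out = stage(cur_input)                     # compute (overlaps with recv)
        send_req = dist.isend(out, dst=next_rank)  # async send, off critical path
        recv_req.wait()
        send_req.wait()  # in practice, waited on just before the buffer is reused
        return nxt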
Artificial Intelligence Research Lab, HP Labs
May 2022 - Apr 2023
Data-Efficient Training (CVPRW'23): Worked on dataset reduction and data-efficient training. Leveraged ensemble learning to reduce the training set size while minimizing the accuracy drop.
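One plausible way to use an ensemble for dataset reduction, shown only as a sketch (ensemble_select and the variance-based scoring are my illustration, not necessarily the paper's method): keep the examples the ensemble disagrees on, since confidently agreed-upon ones add little training signal.

    import torch

    def ensemble_select(models, inputs, keep_ratio=0.5):
        """Score each example by ensemble prediction variance and keep
        the most disputed keep_ratio fraction of the dataset."""
        with torch.no_grad():
            probs = torch.stack([m(inputs).softmax(dim=-1) for m in models])
        scores = probs.var(dim=0).sum(dim=-1)   # disagreement per example
        k = int(keep_ratio * inputs.size(0))
        return scores.topk(k).indices           # indices of examples to keep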
On-device Lab, Samsung Research, Samsung Electronics
Jun 2018 - Sep 2021
Model Compression (CVPR'22): Model compression includes low-rank approximation, quantization, and pruning. Our group focused on parameter quantization as a practical compression technique to reduce a model's memory footprint. Worked on post-training quantization for language and vision models.
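For context, a minimal sketch of uniform asymmetric post-training quantization (a textbook baseline, not the CVPR'22 method; function names are mine): weights are mapped to 8-bit integers with a scale and zero-point derived from their range, with no retraining.

    import torch

    def quantize_tensor(w, num_bits=8):
        """Uniform asymmetric quantization of a float tensor to integers."""
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (w.max() - w.min()) / (qmax - qmin)
        zero_point = qmin - torch.round(w.min() / scale)
        q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
        return q.to(torch.uint8), scale, zero_point

    def dequantize_tensor(q, scale, zero_point):
        # Recover an approximation of the original float weights.
        return scale * (q.float() - zero_point)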
CNN Accelerator Design (Jun 2018 - Jun 2020): Actively participated in architecture exploration. Implemented an in-house performance-modeling simulator in C++. Designed and implemented processors for pointwise operations (e.g., activation functions, elementwise operations) in Verilog HDL. The design is to be deployed in Samsung Digital TVs.
Computer System and Network Lab, School of Computing, KAIST
Sep 2015 - May 2018
Secure Routing (ISCA'21): While recent secure processors encrypt memory request data to guarantee confidentiality, memory addresses (i.e., access traces) can still leak sensitive information. Worked on oblivious computation on a multi-node system to hide coarse-grained access patterns.
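The core principle, in its simplest form (a linear-scan baseline that ORAM schemes improve upon asymptotically; oblivious_read is an illustrative name, not the ISCA'21 system): touch every memory location so the observed address trace is independent of the secret index.

    def oblivious_read(memory, secret_index):
        # Every location is accessed exactly once, so an attacker watching
        # the address trace learns nothing about which index was wanted.
        result = 0
        for i, value in enumerate(memory):
            result += value * (i == secret_index)  # branch-free select
        return result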
Multi-dimensional Parallel Training (MICRO'18): This work proposes accelerating deep learning training in a memory-centric system by applying the Winograd transformation. Implemented the dynamic clustering topology in a cycle-accurate full-system simulator.
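For reference, the standard Winograd minimal-filtering transform F(2,3), which underlies such convolution acceleration (the transform matrices are the well-known textbook ones; this sketch is mine, not the paper's code): two outputs of a 3-tap convolution are computed with 4 multiplications instead of 6.

    import numpy as np

    # F(2,3) transform matrices: Y = A^T [(G g) * (B^T d)].
    BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], float)
    G  = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], float)
    AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], float)

    def winograd_f23(d, g):
        """d: 4 input samples, g: 3 filter taps -> 2 convolution outputs."""
        return AT @ ((G @ g) * (BT @ d))

    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([1.0, 1.0, 1.0])
    print(winograd_f23(d, g))  # [6. 9.], matching direct correlation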
lowRISC, Google Summer of Code 2017
May 2017 - Aug 2017
Implemented an ORAM interface for RISC-V systems in both software and hardware. Gained hands-on experience collaborating with open-source communities and working with multiple software simulators (including spike and DRAMSim2) and SystemVerilog.
Systems Software and Security Lab, Georgia Tech
Jan 2017 - Mar 2017
Explored the RISC-V ISA and the Rocket core architecture for hardware security research. Studied FPGA programming on an Intel SoC board to accelerate system software functions.
Backend Software Engineer, Jobplanet, Braincommerce Inc
Dec 2013 - Jun 2015
Braincommerce is a startup that runs Jobplanet, and I joined as a founding member.
As part of the founding team, I co-designed and implemented the entire initial product server.
In particular, I worked on the database design, the user log system, and a recommendation engine based on a knowledge graph.
For operations and management, I built automated administration tools, including a content search tool with combined filters and a mass mailer.
[TA] KAIST CS101 Introduction to Programming
[TA] KAIST CS206 Data Structures
[TA] KAIST CS310 Computer Architecture (for undergraduate students)
[TA] KAIST CS510 Advanced Computer Architecture (for graduate students)
[TA] UT Austin CS360V Virtualization (for online master's students)