Ph.D. Study, UT Austin
Aug 2021 - Present
Mixture-of-Experts (MoE) Serving Systems (NeurIPS'24): Worked on designing an efficient serving system for Mixture-of-Experts LLMs. Re-architected the system-unfriendly layer-wise MoE routers to be decoupled from the MoE backbone, enabling router pre-computation and lookahead scheduling for expert-aware batching and caching.
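A minimal sketch of the decoupling idea, assuming a lightweight standalone router (all class and function names here are hypothetical, not the paper's actual implementation): once routing decisions for every layer are known up front, the scheduler can prefetch expert weights and group tokens by expert before the backbone runs.

    import torch

    class DecoupledRouter(torch.nn.Module):
        """Predicts top-k expert choices for all layers from the input
        embeddings, independently of the MoE backbone, so routing can
        run ahead of the actual expert computation."""
        def __init__(self, d_model, n_layers, n_experts, k=2):
            super().__init__()
            self.k = k
            self.heads = torch.nn.ModuleList(
                [torch.nn.Linear(d_model, n_experts) for _ in range(n_layers)])

        def forward(self, x):
            # One routing decision per layer, all computed up front.
            return [h(x).topk(self.k, dim=-1).indices for h in self.heads]

    def lookahead_schedule(expert_ids_per_layer):
        # With routing known in advance, decide per layer which experts
        # to prefetch/cache and how to batch tokens expert-by-expert.
        return [torch.unique(ids) for ids in expert_ids_per_layer]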
Scalable Distributed Pre-training (ICML'23): Worked on designing a distributed, data-efficient pre-training framework. For gradient-based subset selection algorithms, the proposed framework significantly reduces pre-training cost, provides stable gradients in the early stage of training, and improves robustness and final accuracy.
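As a toy illustration of what gradient-based subset selection means (greedy gradient matching; illustrative only, not the paper's algorithm, and select_subset / per_example_grads are hypothetical names): pick a subset whose mean gradient tracks the full-batch mean gradient.

    import torch

    def select_subset(per_example_grads, k):
        """Greedily pick k examples whose running mean gradient best
        approximates the full-batch mean gradient."""
        full_mean = per_example_grads.mean(dim=0)
        chosen, running = [], torch.zeros_like(full_mean)
        for _ in range(k):
            best, best_err = None, float("inf")
            for i in range(per_example_grads.size(0)):
                if i in chosen:
                    continue
                # Mean gradient if example i were added to the subset.
                cand = (running * len(chosen) + per_example_grads[i]) / (len(chosen) + 1)
                err = torch.norm(cand - full_mean).item()
                if err < best_err:
                    best, best_err = i, err
            chosen.append(best)
            running = (running * (len(chosen) - 1) + per_example_grads[best]) / len(chosen)
        return chosen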
PyTorch Distributed Team, Meta
May 2023 - Aug 2023
Automated Pipeline-Parallel Training: Worked on improving a pipeline-parallel training library by hiding communication on the critical path. Designed an optimization strategy for automated parallelism.
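A minimal sketch of the communication-hiding pattern using PyTorch's asynchronous point-to-point primitives (pipeline_step and its arguments are hypothetical; real pipeline schedules such as 1F1B are considerably more involved): the receive is posted before compute so the activation transfer overlaps with the stage's forward work.

    import torch
    import torch.distributed as dist

    def pipeline_step(stage, cur_input, prev_rank, next_rank):
        # Post the receive for the next microbatch first, so the incoming
        # activation transfer overlaps with this stage's computation.
        nxt = torch.empty_like(cur_input)
        recv_req = dist.irecv(nxt, src=prev_rank)
        out = stage(cur_input)                     # compute (overlaps with recv)
        send_req = dist.isend(out, dst=next_rank)  # async send, off critical path
        recv_req.wait()
        send_req.wait()  # in practice, waited on just before the buffer is reused
        return nxt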
Artificial Intelligence Research Lab, HP Labs
May 2022 - Apr 2023
Data-Efficient Training (CVPRW'23): Worked on dataset reduction and data-efficient training. Leveraged ensemble learning to reduce the training set size while minimizing the accuracy drop.
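One plausible way to use an ensemble for dataset reduction, shown only as a sketch (ensemble_select and the variance-based scoring are my illustration, not necessarily the paper's method): keep the examples the ensemble disagrees on, since confidently agreed-upon ones add little training signal.

    import torch

    def ensemble_select(models, inputs, keep_ratio=0.5):
        """Score each example by ensemble prediction variance and keep
        the most disputed keep_ratio fraction of the dataset."""
        with torch.no_grad():
            probs = torch.stack([m(inputs).softmax(dim=-1) for m in models])
        scores = probs.var(dim=0).sum(dim=-1)   # disagreement per example
        k = int(keep_ratio * inputs.size(0))
        return scores.topk(k).indices           # indices of examples to keep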
On-device Lab, Samsung Research, Samsung Electronics
Jun 2018 - Sep 2021
Model Compression (CVPR'22): Model compression includes low-rank approximation, quantization, and pruning. Our group focused on parameter quantization as a practical compression technique to reduce a model's memory footprint. Worked on post-training quantization for language and vision models.
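For context, a minimal sketch of uniform asymmetric post-training quantization (a textbook baseline, not the CVPR'22 method; function names are mine): weights are mapped to 8-bit integers with a scale and zero-point derived from their range, with no retraining.

    import torch

    def quantize_tensor(w, num_bits=8):
        """Uniform asymmetric quantization of a float tensor to integers."""
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (w.max() - w.min()) / (qmax - qmin)
        zero_point = qmin - torch.round(w.min() / scale)
        q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
        return q.to(torch.uint8), scale, zero_point

    def dequantize_tensor(q, scale, zero_point):
        # Recover an approximation of the original float weights.
        return scale * (q.float() - zero_point)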
CNN Accelerator Design (Jun 2018 - Jun 2020): Actively participated in architecture exploration. Implemented an in-house performance-modeling simulator in C++. Designed and implemented processors for pointwise operations (e.g., activation functions, elementwise operations) in Verilog HDL. The design is to be deployed in Samsung Digital TVs.
Computer System and Network Lab, School of Computing, KAIST
Sep 2015 - May 2018
Secure Routing (ISCA'21): While recent secure processors encrypt memory request data to guarantee confidentiality, memory addresses (i.e., access traces) can still leak sensitive information. Worked on oblivious computation on a multi-node system to hide coarse-grained access patterns.
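The core principle, in its simplest form (a linear-scan baseline that ORAM schemes improve upon asymptotically; oblivious_read is an illustrative name, not the ISCA'21 system): touch every memory location so the observed address trace is independent of the secret index.

    def oblivious_read(memory, secret_index):
        # Every location is accessed exactly once, so an attacker watching
        # the address trace learns nothing about which index was wanted.
        result = 0
        for i, value in enumerate(memory):
            result += value * (i == secret_index)  # branch-free select
        return result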
Multi-dimensional Parallel Training (MICRO'18): This work proposes accelerating deep learning training in a memory-centric system by applying the Winograd transformation. Implemented the dynamic clustering topology in a cycle-accurate full-system simulator.
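For reference, the standard Winograd minimal-filtering transform F(2,3), which underlies such convolution acceleration (the transform matrices are the well-known textbook ones; this sketch is mine, not the paper's code): two outputs of a 3-tap convolution are computed with 4 multiplications instead of 6.

    import numpy as np

    # F(2,3) transform matrices: Y = A^T [(G g) * (B^T d)].
    BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], float)
    G  = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], float)
    AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], float)

    def winograd_f23(d, g):
        """d: 4 input samples, g: 3 filter taps -> 2 convolution outputs."""
        return AT @ ((G @ g) * (BT @ d))

    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([1.0, 1.0, 1.0])
    print(winograd_f23(d, g))  # [6. 9.], matching direct correlation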
lowRISC, Google Summer of Code 2017
May 2017 - Aug 2017
Implemented an ORAM interface for RISC-V systems in both software and hardware. Gained hands-on experience collaborating with open-source communities and working with multiple software simulators (including spike and DRAMSim2) and SystemVerilog.
Systems Software and Security Lab, Georgia Tech
Jan 2017 - Mar 2017
Explored the RISC-V ISA and the Rocket core architecture for hardware security research. Studied FPGA programming on an Intel SoC board to accelerate system software functions.
Backend Software Engineer, Jobplanet, Braincommerce Inc
Dec 2013 - Jun 2015
Braincommerce is a startup that runs Jobplanet, and I joined as a founding member.
As part of the founding team, I co-designed and implemented the entire initial product server.
In particular, I worked on the database design, the user log system, and a recommendation engine based on a knowledge graph.
For operations and management, I built automated administration tools, including a content search tool with combined filters and a mass mailer.
[TA] KAIST CS101 Introduction to Programming
[TA] KAIST CS206 Data Structures
[TA] KAIST CS310 Computer Architecture (for undergraduate students)
[TA] KAIST CS510 Advanced Computer Architecture (for graduate students)
[TA] UT Austin CS360V Virtualization (for online master's students)