On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention for Long-Context LLM Serving
Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, Aditya Akella
The Forty-Second International Conference on Machine Learning (ICML'25)
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai*, Yeonju Ro*, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang (*equal contribution, alphabetical order)
The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS'24)
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP'24)
Lowering the Pre-training Tax for Gradient-based Subset Training: A Lightweight Distributed Pre-Training Toolkit
Yeonju Ro, Zhangyang Wang, Vijay Chidambaram, Aditya Akella
The Fortieth International Conference on Machine Learning (ICML'23)
Ringleader: Efficiently Offloading Intra-Server Orchestration to NICs
Jiaxin Lin, Adney Cardoza, Tarannum Khan, Yeonju Ro, Brent Stephens, Hassan Wassel, Aditya Akella
The 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI'23)
Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error
Yongkweon Jeon*, Chungman Lee*, Eulrang Cho*, Yeonju Ro* (*equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'22)
Ghost Routing to Enable Oblivious Computation on Memory-centric Networks
Yeonju Ro, Seongwook Jin, Jaehyuk Huh, John Kim
The 48th Annual ACM/IEEE International Symposium on Computer Architecture (ISCA'21)
Multi-dimensional Parallel Training of Winograd Layer on Memory-centric Architecture
Byungchul Hong, Yeonju Ro, John Kim
The 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'18)
How I learned to stop worrying and love learned OS policies
Divyanshu Saxena, Jiayi Chen, Sujay Yadalam, Yeonju Ro, Rohit Dwivedula, Eric H. Campbell, Aditya Akella, Christopher J. Rossbach, Michael Swift
The 20th Workshop on Hot Topics in Operating Systems (HotOS XX)
Optimizing Transformer Inference with Selective Distillation: Layerwise Conversion to Linear Attention
Yeonju Ro, Zhenyu Zhang, Vijay Chidambaram, Aditya Akella
The 2nd Workshop on Hot Topics in System Infrastructure (HotInfra'24)
Dataset Efficient Training with Model Ensembling
Yeonju Ro, Cong Xu, Agnieszka Ciborowska, Suparna Bhattacharya, Frankie Li, Martin Foltin
The 6th Workshop on Efficient Deep Learning for Computer Vision, a CVPR Workshop (CVPRW'23)
Q-Rater: Non-convex optimization for post-training uniform quantization
Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh
Post-training weighted quantization of neural networks for language models
Se Jung Kwon, Dongsoo Lee, Yongkweon Jeon, Byeongwook Kim, Bae Seong Park, Yeonju Ro