Preprint
MapCoder-Lite: Squeezing Multi-Agent Coding into a Single Small LLM [paper]
Woongkyu Lee, Junhee Cho, and Jungwook Choi
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System [paper]
Hyucksung Kwon, Kyungmo Koo, Janghyeon Kim, Woongkyu Lee, Minjae Lee, Hyungdeok Lee, Yousub Jung, Jaehan Park, Yosub Song, Byeongsu Yang, Haerang Choi, Guhyun Kim, Jongsoon Won, Woojae Shin, Changhyun Kim, Gyeongcheol Shin, Yongkee Kwon, Ilkon Kim, Euicheol Lim, John Kim, and Jungwook Choi
2025
SkipReduce: (Interconnection) Network Sparsity to Accelerate Distributed Machine Learning
Hans Kasan, Dennis Abts, Jungwook Choi, and John Kim
MICRO 2025
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim, Kyuhong Shim, Jungwook Choi, and Simyung Chang
NeurIPS 2025 [paper]
Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits
Dowon Kim*, Minjae Lee*, Janghyeon Kim, Hyucksung Kwon, Hyeonggyu Jeong, Sang-Soo Park, Minyong Yoon, Si-Dong Roh, Jinin So, and Jungwook Choi
PACT 2025
Enhancing Generalization in Data-free Quantization via Mixup-class Prompting
Jiwoong Park*, Chaeun Lee*, Yongseok Choi, Sein Park, Deokki Hong, and Jungwook Choi
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park, Hyungmin Kim, Sangwoo Kim, Wonseok Jeon, Juyoung Yang, Byeongwook Jeon, Yoonseon Oh, and Jungwook Choi
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee, Jiwoong Park, Jinseok Kim, Yongjik Kim, Jungju Oh, Jinwook Oh, and Jungwook Choi
ACL 2025 (Findings) [paper] [code]
RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy
Geonho Lee*, Janghwan Lee*, Sukjin Hong*, Minsoo Kim, Euijai Ahn, Du-Seong Chang, and Jungwook Choi
2024
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
Minsoo Kim, Kyuhong Shim, Jungwook Choi, and Simyung Chang
EMNLP 2024 [paper]
BABOL: A Software-Programmable NAND Flash Controller
Kibin Park, Alberto Lerner, Sangjin Lee, Philippe Bonnet, Yong Ho Song, Philippe Cudré-Mauroux, and Jungwook Choi
MICRO 2024 [paper]
ISP2DLA: Automated Deep Learning Accelerator Design for On-Sensor Image Signal Processing
Dong-eon Won*, Yeeun Kim*, Janghwan Lee, Minjae Lee, Jonghyun Bae, Jongjoo Park, Jeongyong Song, and Jungwook Choi
ASAP 2024 (Poster) [paper]
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Janghwan Lee*, Seongmin Park*, Sukjin Hong, Minsoo Kim, Du-Seong Chang, and Jungwook Choi
ACL 2024 [paper]
RA-LoRA: Rank-Adaptive Parameter-Efficient Fine-Tuning for Accurate 2-bit Quantized Large Language Models
Minsoo Kim, Sihwa Lee, Wonyong Sung, and Jungwook Choi
ACL 2024 (Findings) [paper]
Selectively Dilated Convolution for Accuracy-Preserving Sparse Pillar-based Embedded 3D Object Detection
Seongmin Park, Minjae Lee, Junwon Choi, and Jungwook Choi
CVPRW 2024 [paper]
Pruning with Scaled Policy Constraints for Light-weight Reinforcement Learning
Seongmin Park*, Hyungmin Kim*, Hyunhak Kim, and Jungwook Choi
IEEE Access [paper]
Lightweight Error Correction for In-Storage Acceleration of Large Language Model Inference
Jinwoo Jeong, Byungmin Ahn, Dongmin Shin, and Jungwook Choi
ICEIC 2024 (Best Paper) [paper]
Searching Optimal Floating-Point Format for Sub-8-Bit Large Language Model Inference
Youngdeok Hwang*, Janghwan Lee*, Jiwoong Park, Jieun Lim, and Jungwook Choi
ICEIC 2024 [paper]
SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving
Minjae Lee, Hyungmin Kim, Seongmin Park, Minyong Yoon, Janghwan Lee, Junwon Choi, Mingu Kang, and Jungwook Choi
HPCA 2024 [paper]
2023
Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
Janghwan Lee*, Minsoo Kim*, Seungcheol Baek, Seokjoong Hwang, Wonyong Sung, and Jungwook Choi
EMNLP 2023 [paper]
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, and Jungwook Choi
SiT Dataset: Socially Interactive Pedestrian Trajectory Dataset for Social Navigation Robots
Jongwook Bae, Jungho Kim, Junyong Yun, Changwon Kang, Jeongseon Choi, Chanhyeok Kim, Junho Lee, Jungwook Choi, and Jun Won Choi
NeurIPS 2023 (Datasets and Benchmarks Track) [paper] [code]
Range-Invariant Approximation of Non-Linear Operations for Efficient BERT Fine-Tuning
Janghyeon Kim, Janghwan Lee, Jeong Ho Han, Sangheon Lee, and Jungwook Choi
DAC 2023 [paper]
Architecture-Aware Optimization of Layer Fusion for Latency-Optimal CNN Inference
Minyong Yoon and Jungwook Choi
AICAS 2023 [paper]
Finding Optimal Numerical Format for Sub-8-Bit Post-Training Quantization of Vision Transformers
Janghwan Lee, Youngdeok Hwang, and Jungwook Choi
ICASSP 2023 [paper]
Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, and Jungwook Choi
Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization
Seongmin Park, Beomseok Kwon, Jieun Lim, Kyuyoung Sim, Tae-Ho Kim, and Jungwook Choi
TinyML 2023 [paper]
2022
Achieving Low Write Latency Through New Stealth Program Operation Supporting Early Write Completion in NAND Flash Memory
Moonseok Jang, Kexin Wang, Sangjin Lee, Hyeonggyu Jeong, Inyeong Song, Yong Ho Song, and Jungwook Choi
Journal of Systems Architecture (Vol. 133) [paper]
Improving NVM Lifetime Using Task Stack Migration on Low-End MCU-Based Devices
Jeongmin Lee, Moonseok Jang, Kexin Wang, Inyeong Song, Hyeonggyu Jeong, Jinwoo Jeong, Yong Ho Song, and Jungwook Choi
IEEE Access [paper]
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, and Jungwook Choi
Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores
Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jieun Lim, and Jungwook Choi
SiPS 2022 [paper]
Regularizing Activation Distribution for Ultra Low-bit Quantization-Aware Training of MobileNets
Seongmin Park, Wonyong Sung, and Jungwook Choi
SiPS 2022 [paper]
NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference
Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, and Jungwook Choi
DAC 2022 [paper]
Optimizing Exponent Bias for Sub-8bit Floating-Point Inference of Fine-tuned Transformers
Janghwan Lee and Jungwook Choi
AICAS 2022 [paper]
Understanding the Role of Self Attention for Efficient Speech Recognition
Kyuhong Shim, Jungwook Choi, and Wonyong Sung
ICLR 2022 (Spotlight) [paper]
Minimizing Global Buffer Access in a Deep Learning Accelerator Using a Local Register File with a Rearranged Computational Sequence
Minjae Lee, Zhongfeng Zhang, Seungwon Choi, and Jungwook Choi
Sensors 2022 [paper]
2021
TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference
Seokhyeon Choi, Kyuhong Shim, Jungwook Choi, Wonyong Sung, and Byonghyo Shim
SiPS 2021 [paper]
Understanding and Reducing Weight-Load Overhead of Systolic Deep Learning Accelerators
Jinwon Joo, Minyong Yoon, Mingu Kang, JongGeon Lee, JinIn So, IlKwon Yun, Yongsuk Kwon, KyungSoo Kim, and Jungwook Choi
ISOCC 2021 [paper]
Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
Kyuhong Shim, Iksoo Choi, Wonyong Sung, and Jungwook Choi
ISOCC 2021 (Best Paper) [paper]
RaPiD: AI accelerator for ultra-low precision training and inference
Venkataramani, Srinivasan, Wang, Sen, Zhang, Agrawal, Kar, Jain, Mannari, Tran, Li, Ogawa, Ishizaki, Inoue, Schaal, Serrano, Choi, Sun, Wang, Chen, Allain, Bonano, Cao, Casatuta, Cohen, Fleischer, Guillorn, Haynie, Jung, Kang, Kim, Koswatta, Lee, Lutz, Mueller, Oh, Ranjan, Ren, Rider, Schelm, Scheuermann, Silberman, Yang, Zalani, Zhang, Zhou, Ziegler, Shah, Ohara, Lu, Curran, Shukla, Chang, Gopalakrishnan
ISCA 2021 [paper]
A 7nm 4-core AI chip with 25.6 TFLOPS hybrid FP8 training, 102.4 TOPS INT4 inference and workload-aware throttling
Agrawal, Lee, Silberman, Ziegler, Kang, Venkataramani, Cao, Fleischer, Guillorn, Cohen, Mueller, Oh, Lutz, Jung, Koswatta, Zhou, Zalani, Bonanno, Casatuta, Chen, Choi, Haynie, Herbert, Jain, Kar, Kim, Li, Ren, Rider, Schaal, Schelm, Scheuermann, Sun, Tran, Wang, Wang, Zhang, Shah, Curran, Srinivasan, Lu, Shukla, Chang, Gopalakrishnan
ISSCC 2021 [paper]
Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks
Yoonho Boo, Sungho Shin, Jungwook Choi, and Wonyong Sung
AAAI 2021 [paper]