Preprint
MapCoder-Lite: Squeezing Multi-Agent Coding into a Single Small LLM [paper]
Woongkyu Lee, Junhee Cho, and Jungwook Choi
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System [paper]
Hyucksung Kwon, Kyungmo Koo, Janghyeon Kim, Woongkyu Lee, Minjae Lee, Hyungdeok Lee, Yousub Jung, Jaehan Park, Yosub Song, Byeongsu Yang, Haerang Choi, Guhyun Kim, Jongsoon Won, Woojae Shin, Changhyun Kim, Gyeongcheol Shin, Yongkee Kwon, Ilkon Kim, Euicheol Lim, John Kim, and Jungwook Choi
2025
SkipReduce: (Interconnection) Network Sparsity to Accelerate Distributed Machine Learning
Hans Kasan, Dennis Abts, Jungwook Choi, and John Kim
MICRO 2025
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim, Kyuhong Shim, Jungwook Choi, and Simyung Chang
NeurIPS 2025 [paper]
Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits
Dowon Kim*, Minjae Lee*, Janghyeon Kim, Hyucksung Kwon, Hyeonggyu Jeong, Sang-Soo Park, Minyong Yoon, Si-Dong Roh, Jinin So, and Jungwook Choi
PACT 2025
Enhancing Generalization in Data-free Quantization via Mixup-class Prompting
Jiwoong Park*, Chaeun Lee*, Yongseok Choi, Sein Park, Deokki Hong, and Jungwook Choi
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park, Hyungmin Kim, Sangwoo Kim, Wonseok Jeon, Juyoung Yang, Byeongwook Jeon, Yoonseon Oh, and Jungwook Choi
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee, Jiwoong Park, Jinseok Kim, Yongjik Kim, Jungju Oh, Jinwook Oh, and Jungwook Choi
ACL 2025 (Findings) [paper] [code]
RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy
Geonho Lee*, Janghwan Lee*, Sukjin Hong*, Minsoo Kim, Euijai Ahn, Du-Seong Chang, and Jungwook Choi
2024
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
Minsoo Kim, Kyuhong Shim, Jungwook Choi, and Simyung Chang
EMNLP 2024 [paper]
BABOL: A Software-Programmable NAND Flash Controller
Kibin Park, Alberto Lerner, Sangjin Lee, Philippe Bonnet, Yong Ho Song, Philippe Cudré-Mauroux, and Jungwook Choi
MICRO 2024 [paper]
ISP2DLA: Automated Deep Learning Accelerator Design for On-Sensor Image Signal Processing
Dong-eon Won*, Yeeun Kim*, Janghwan Lee, Minjae Lee, Jonghyun Bae, Jongjoo Park, Jeongyong Song, and Jungwook Choi
ASAP 2024 (Poster) [paper]
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Janghwan Lee*, Seongmin Park*, Sukjin Hong, Minsoo Kim, Du-Seong Chang, and Jungwook Choi
ACL 2024 [paper]
RA-LoRA: Rank-Adaptive Parameter-Efficient Fine-Tuning for Accurate 2-bit Quantized Large Language Models
Minsoo Kim, Sihwa Lee, Wonyong Sung, and Jungwook Choi
ACL 2024 (Findings) [paper]
Selectively Dilated Convolution for Accuracy-Preserving Sparse Pillar-based Embedded 3D Object Detection
Seongmin Park, Minjae Lee, Junwon Choi, and Jungwook Choi
CVPRW 2024 [paper]
Pruning with Scaled Policy Constraints for Light-weight Reinforcement Learning
Seongmin Park*, Hyungmin Kim*, Hyunhak Kim, and Jungwook Choi
IEEE Access [paper]
Lightweight Error Correction for In-Storage Acceleration of Large Language Model Inference
Jinwoo Jeong, Byungmin Ahn, Dongmin Shin, and Jungwook Choi
ICEIC 2024 (Best Paper) [paper]
Searching Optimal Floating-Point Format for Sub-8-Bit Large Language Model Inference
Youngdeok Hwang*, Janghwan Lee*, Jiwoong Park, Jieun Lim, and Jungwook Choi
ICEIC 2024 [paper]
SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving
Minjae Lee, Hyungmin Kim, Seongmin Park, Minyong Yoon, Janghwan Lee, Junwon Choi, Mingu Kang, and Jungwook Choi
HPCA 2024 [paper]
2023
Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
Janghwan Lee*, Minsoo Kim*, Seungcheol Baek, Seokjoong Hwang, Wonyong Sung, and Jungwook Choi
EMNLP 2023 [paper]
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, and Jungwook Choi
SiT Dataset: Socially Interactive Pedestrian Trajectory Dataset for Social Navigation Robots
Jongwook Bae, Jungho Kim, Junyong Yun, Changwon Kang, Jeongseon Choi, Chanhyeok Kim, Junho Lee, Jungwook Choi, and Jun Won Choi
NeurIPS 2023 (Datasets and Benchmarks Track) [paper] [code]
Range-Invariant Approximation of Non-Linear Operations for Efficient BERT Fine-Tuning
Janghyeon Kim, Janghwan Lee, Jeong Ho Han, Sangheon Lee, and Jungwook Choi
DAC 2023 [paper]
Architecture-Aware Optimization of Layer Fusion for Latency-Optimal CNN Inference
Minyong Yoon and Jungwook Choi
AICAS 2023 [paper]
Finding Optimal Numerical Format for Sub-8-Bit Post-Training Quantization of Vision Transformers
Janghwan Lee, Youngdeok Hwang, and Jungwook Choi
ICASSP 2023 [paper]
Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, and Jungwook Choi
Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization
Seongmin Park, Beomseok Kwon, Jieun Lim, Kyuyoung Sim, Tae-Ho Kim, and Jungwook Choi
TinyML 2023 [paper]
2022
Achieving Low Write Latency Through New Stealth Program Operation Supporting Early Write Completion in NAND Flash Memory
Moonseok Jang, Kexin Wang, Sangjin Lee, Hyeonggyu Jeong, Inyeong Song, Yong Ho Song, and Jungwook Choi
Journal of Systems Architecture (Vol. 133) [paper]
Improving NVM Lifetime Using Task Stack Migration on Low-End MCU-Based Devices
Jeongmin Lee, Moonseok Jang, Kexin Wang, Inyeong Song, Hyeonggyu Jeong, Jinwoo Jeong, Yong Ho Song, and Jungwook Choi
IEEE Access [paper]
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, and Jungwook Choi
Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores
Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jieun Lim, and Jungwook Choi
SiPS 2022 [paper]
Regularizing Activation Distribution for Ultra Low-bit Quantization-Aware Training of MobileNets
Seongmin Park, Wonyong Sung, and Jungwook Choi
SiPS 2022 [paper]
NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference
Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, and Jungwook Choi
DAC 2022 [paper]
Optimizing Exponent Bias for Sub-8bit Floating-Point Inference of Fine-tuned Transformers
Janghwan Lee and Jungwook Choi
AICAS 2022 [paper]
Understanding the Role of Self Attention for Efficient Speech Recognition
Kyuhong Shim, Jungwook Choi, and Wonyong Sung
ICLR 2022 (Spotlight) [paper]
Minimizing Global Buffer Access in a Deep Learning Accelerator Using a Local Register File with a Rearranged Computational Sequence
Minjae Lee, Zhongfeng Zhang, Seungwon Choi, and Jungwook Choi
Sensors 2022 [paper]
2021
TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference
Seokhyeon Choi, Kyuhong Shim, Jungwook Choi, Wonyong Sung, and Byonghyo Shim
SiPS 2021 [paper]
Understanding and Reducing Weight-Load Overhead of Systolic Deep Learning Accelerators
Jinwon Joo, Minyong Yoon, Mingu Kang, JongGeon Lee, JinIn So, IlKwon Yun, Yongsuk Kwon, KyungSoo Kim, and Jungwook Choi
ISOCC 2021 [paper]
Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
Kyuhong Shim, Iksoo Choi, Wonyong Sung, and Jungwook Choi
ISOCC 2021 (Best Paper) [paper]
RaPiD: AI accelerator for ultra-low precision training and inference
Venkataramani, Srinivasan, Wang, Sen, Zhang, Agrawal, Kar, Jain, Mannari, Tran, Li, Ogawa, Ishizaki, Inoue, Schaal, Serrano, Choi, Sun, Wang, Chen, Allain, Bonano, Cao, Casatuta, Cohen, Fleischer, Guillorn, Haynie, Jung, Kang, Kim, Koswatta, Lee, Lutz, Mueller, Oh, Ranjan, Ren, Rider, Schelm, Scheuermann, Silberman, Yang, Zalani, Zhang, Zhou, Ziegler, Shah, Ohara, Lu, Curran, Shukla, Chang, Gopalakrishnan
ISCA 2021 [paper]
A 7nm 4-core AI chip with 25.6 TFLOPS hybrid FP8 training, 102.4 TOPS INT4 inference and workload-aware throttling
Agrawal, Lee, Silberman, Ziegler, Kang, Venkataramani, Cao, Fleischer, Guillorn, Cohen, Mueller, Oh, Lutz, Jung, Koswatta, Zhou, Zalani, Bonanno, Casatuta, Chen, Choi, Haynie, Herbert, Jain, Kar, Kim, Li, Ren, Rider, Schaal, Schelm, Scheuermann, Sun, Tran, Wang, Wang, Zhang, Shah, Curran, Srinivasan, Lu, Shukla, Chang, Gopalakrishnan
ISSCC 2021 [paper]
Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks
Yoonho Boo, Sungho Shin, Jungwook Choi, and Wonyong Sung
AAAI 2021 [paper]