AI Hardware and Algorithm Lab
Hanyang University, Seoul, Korea
Our Research Adventures
Welcome to the AI Hardware and Algorithm Lab at Hanyang University. We develop algorithms and hardware architectures for high-speed, energy-efficient artificial intelligence, specializing in algorithm-hardware co-design and the optimization of deep learning software. By working at the intersection of algorithms and hardware, we aim to push the boundaries of AI efficiency and advance the future of intelligent systems.
[Jul 16, 2025] Our paper titled "Enhancing Generalization in Data-free Quantization via Mixup-class Prompting" has been accepted at ICCV Workshop 2025 (3rd Workshop on Binary and Extreme Quantization for Computer Vision).
[Jun 26, 2025] Our paper titled "Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control" has been accepted at the International Conference on Computer Vision (ICCV 2025). Check out our project page for more details.
[May 16, 2025] Our paper titled "AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference" has been accepted at Findings of the Association for Computational Linguistics (ACL 2025).
[Dec 10, 2024] Our paper titled "RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy" has been accepted at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025).
[Sep 20, 2024] Our paper titled "InfiniPot: Infinite Context Processing on Memory-Constrained LLMs" has been accepted at the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024).
[Jul 18, 2024] Our paper titled "BABOL: A Software-Programmable NAND Flash Controller" has been accepted at the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO 2024).
[May 16, 2024] Two of our papers have been accepted at the Annual Meeting of the Association for Computational Linguistics (ACL 2024).
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment (Main)
RA-LoRA: Rank-Adaptive Parameter-Efficient Fine-Tuning for Accurate 2-bit Quantized Large Language Models (Findings)
[Apr 13, 2024] Our team has won 3rd place in the AICAS Grand Challenge 2024 competition "Software and Hardware Co-optimization for General Large Language Model Inference on CPU" (link).
[Apr 9, 2024] Our paper titled "Selectively Dilated Convolution for Accuracy-Preserving Sparse Pillar-based Embedded 3D Object Detection" has been accepted at CVPR Workshop 2024 (The 7th Workshop on Efficient Deep Learning for Computer Vision).
[Feb 7, 2024] Our paper titled "Pruning with Scaled Policy Constraints for Light-weight Reinforcement Learning" has been accepted for publication in IEEE Access.
[Oct 27, 2023] Our paper titled "SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving" has been accepted at the IEEE International Symposium on High-Performance Computer Architecture (HPCA 2024).
[Oct 8, 2023] Our paper titled "Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization" has been accepted at the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
[Sep 22, 2023] Our paper titled "Token-Scaled Logit Distillation for Ternary Weight Generative Language Models" has been accepted at the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023).
[Feb 24, 2023] Our paper titled "Range-Invariant Approximation of Non-Linear Operations for Efficient BERT Fine-Tuning" has been accepted at the 60th ACM/IEEE Design Automation Conference (DAC 2023).
[Feb 16, 2023] Our paper titled "Finding Optimal Numerical Format for Sub-8-Bit Post-Training Quantization of Vision Transformers" has been accepted at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023).
[Jan 22, 2023] Our paper titled "Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers" has been accepted at the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023).
[Oct 9, 2022] Our paper titled "Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders" has been accepted at the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
[Feb 20, 2022] Our paper titled "NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference" has been accepted at the 59th ACM/IEEE Design Automation Conference (DAC 2022).
[Dec 18, 2020] Our team has won 1st place in the Model Compression Track at the AI Grand Challenge 2020 (MSIT, South Korea) (News).