Domain-Specific Accelerators

A Dataflow Architecture Design (AI Processor)

A scalable deep-learning accelerator supporting the training process is implemented for on-device personalization of deep convolutional neural networks (CNNs). It consists of three processor cores, each operating with a distinct energy-efficient dataflow for a different type of computation in CNN training. In particular, a separate dataflow architecture is implemented for the weight-gradient computation to enhance PE utilization while maximally reusing the input data.

Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, Yeongjae Choi, Hyeonuk Kim, and Lee-Sup Kim, "An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In Situ Personalization on Smart Devices," IEEE Journal of Solid-State Circuits, Oct. 2020.
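As a rough illustration of the input-reuse idea in the weight-gradient dataflow above, the toy 1-D model below fetches each input activation exactly once and reuses it across all filter-tap gradients it contributes to; the 1-D setting and the function name are illustrative assumptions, not the paper's actual loop nest or PE mapping.

```python
import numpy as np

def weight_gradient_1d(x, dy, K):
    """dW[k] = sum_n dy[n] * x[n + k] for a 1-D convolution layer.

    The loop order fetches each input x[m] exactly once and reuses it
    across every filter tap k it contributes to, rather than re-reading
    the input for each of the K weight gradients.
    """
    N = len(dy)                  # number of output positions
    dW = np.zeros(K)
    for m in range(len(x)):      # stream each input activation once
        for k in range(K):       # reuse x[m] across all K taps
            n = m - k
            if 0 <= n < N:
                dW[k] += dy[n] * x[m]
    return dW
```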

Mixed-Precision Processing Units

From two perspectives, 1) a separate precision decision for each input operand and 2) maintaining high performance at inference, we configure the data as low-bit fixed-point activations/weights and floating-point error gradients of up to half precision. A novel MAC architecture is designed to compute both low- and high-precision modes for the different input combinations. By substituting brick-level separate accumulations for the high-cost floating-point addition, we realize both an area-efficient architecture and high throughput for low-precision computation.

Seungkyu Choi, Jaekang Shin, and Lee-Sup Kim, "A Deep Neural Network Training Architecture with Inference-aware Heterogeneous Data-type," IEEE Transactions on Computers, May 2022.
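A minimal software sketch of the brick-level accumulation idea above is given below; the per-exponent brick granularity and the 11-bit mantissa are assumptions made for illustration, not the paper's exact microarchitecture.

```python
import math
from collections import defaultdict

def brick_mac(acts, grads):
    """Accumulate sum(a * g) without a floating-point add per MAC.

    Each fp16-like gradient g is split as g = m * 2**e; the fixed-point
    activation multiplies an integer mantissa, and partial products
    sharing an exponent 'brick' are summed in plain integer accumulators.
    A single floating-point combine runs once at the very end.
    """
    bricks = defaultdict(int)              # exponent brick -> integer accumulator
    for a, g in zip(acts, grads):
        m, e = math.frexp(g)               # g = m * 2**e with 0.5 <= |m| < 1
        mant = int(round(m * (1 << 11)))   # fp16-style signed integer mantissa
        bricks[e - 11] += a * mant         # integer multiply-accumulate only
    return sum(acc * 2.0 ** e for e, acc in bricks.items())

# e.g., brick_mac([3, -2], [0.5, 1.25]) ~= 3*0.5 - 2*1.25 = -1.0
```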

Sparsity-Aware PE Architecture

- Multi-window dataflow architecture for selective weight updates

- Redundancy Censoring Unit (RCU)-based PE architecture (both ideas are sketched after the references below)

- Jaekang Shin, Seungkyu Choi, Yeongjae Choi, and Lee-Sup Kim, "A Pragmatic Approach to On-device Incremental Learning System with Selective Weight Updates," ACM/IEEE Design Automation Conference (DAC), 2020.

- Kangkyu Park, Seungkyu Choi, Yeongjae Choi, and Lee-Sup Kim, "Rare Computing: Removing Redundant Multiplications from Sparse and Repetitive Data in Deep Neural Networks," IEEE Transactions on Computers, Apr. 2022.
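The sketch below models both ideas in software; the product cache, the function names, and the threshold tau are illustrative assumptions rather than the papers' exact mechanisms.

```python
import numpy as np

def rcu_dot(weights, acts):
    """Redundancy-censoring dot product: zero products are skipped, and a
    repeated (weight, activation) pair reuses its cached product instead of
    issuing another multiplication."""
    cache, acc, mults = {}, 0.0, 0
    for w, a in zip(weights, acts):
        if w == 0 or a == 0:          # sparsity: censor zero products
            continue
        if (w, a) not in cache:       # repetition: censor duplicate products
            cache[(w, a)] = w * a
            mults += 1
        acc += cache[(w, a)]
    return acc, mults                 # mults = multiplications actually issued

def selective_update(weights, grads, lr=0.01, tau=1e-3):
    """Selective weight update: only gradients with |g| > tau trigger a
    write-back, so most weights stay untouched in memory."""
    return np.where(np.abs(grads) > tau, weights - lr * grads, weights)
```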

High-Performance DSP Engine Implementation

- A configurable baseband processing architecture that efficiently handles parallel sample streams, targeting ultra-wide-bandwidth inter-satellite optical communications with data rates exceeding 100 Gbps

- A parallelogram-style systolic accelerator specifically designed for parallel processing of correlation kernels (see the sketch after the reference below)

- Seungkyu Choi, Huanshihong Deng, Kuan-Yu Chen, Yufan Yue, David Blaauw, and Hun-Seok Kim, "ParaBase: A Configurable Parallel Baseband Processor for Ultra-High-Speed Inter-Satellite Optical Communications," ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2024.
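A behavioral model of the parallel correlation computation is sketched below; the lane count P and the function name are assumptions for illustration, and the parallelogram skewing of partial sums across PEs is abstracted away.

```python
import numpy as np

def parallel_correlation(samples, template, P=8):
    """Sliding correlation of a sample stream against a template,
    producing P output lags per 'cycle' (one per parallel lane).

    Each lane's dot product stands in for a column of systolic PEs.
    """
    L = len(template)
    n_out = len(samples) - L + 1
    out = np.zeros(n_out)
    for start in range(0, n_out, P):      # one cycle handles P lags at once
        for lane in range(min(P, n_out - start)):
            lag = start + lane
            out[lag] = np.dot(samples[lag:lag + L], template)
    return out

# Agrees with np.correlate(samples, template, mode="valid") for real inputs
```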