VLSI Architectures for Deep Learning

Introduction

  • Deep learning has achieved great success in various domains
    • Playing Go, personal assistants, and search engines
  • Deep neural networks (DNNs) are the cores of deep learning
    • Repeated hierarchy with tens to hundreds of layers
    • Modular hierarchy of convolutional (CONV), pooling (POOL), and fully connected (FC) layers
  • Large computational complexity and huge memory footprint (see the cost sketch below)
    • Roughly 10 GOPs per frame
    • Over 50M parameters to store
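
To make these complexity figures concrete, the short Python sketch below estimates the multiply-accumulate (MAC) count and parameter count of a single convolutional layer; the layer shape is an illustrative assumption, not taken from a specific network.

    # Back-of-the-envelope cost of one CONV layer; the shape below is
    # illustrative, roughly a VGG-style 3x3 layer on a 56x56 feature map.
    def conv_layer_cost(h_out, w_out, c_in, c_out, k):
        """Return (MAC count, parameter count) of a k x k CONV layer."""
        macs = h_out * w_out * c_out * c_in * k * k   # one MAC per output pixel per filter tap
        params = c_out * c_in * k * k + c_out         # weights plus biases
        return macs, params

    macs, params = conv_layer_cost(h_out=56, w_out=56, c_in=256, c_out=256, k=3)
    print(f"MACs: {macs / 1e9:.2f} G, parameters: {params / 1e6:.2f} M")

Summed over all layers of a modern network, such per-layer costs add up to the roughly 10 GOPs per frame and tens of millions of parameters quoted above.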

CMOS-based Deep Learning Accelerator

Network Compression

  • Blocked hash compression
  • Extra block constraint to preserve spatial locality during hash-based weight sharing (sketched below)
  • Compresses the network by 16x ~ 32x
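
A minimal sketch of the idea, assuming a HashedNets-style shared parameter pool: every virtual weight is looked up through a hash function, and an extra block constraint maps spatially adjacent weights to consecutive buckets so locality survives compression. The block size, pool size, and matrix shape are illustrative assumptions.

    import numpy as np

    BLOCK = 4        # hypothetical block size along the input dimension
    BUCKETS = 512    # shared parameter pool (the compressed storage)

    rng = np.random.default_rng(0)
    shared = rng.standard_normal(BUCKETS).astype(np.float32)  # trainable shared weights

    def hashed_weight(out_idx, in_idx):
        """Look up the virtual weight W[out_idx, in_idx] in the shared pool.

        All weights inside the same 1 x BLOCK block hash to consecutive
        buckets, so spatially adjacent weights stay adjacent in memory.
        """
        block_id = (out_idx, in_idx // BLOCK)          # which block this weight is in
        base = hash(block_id) % (BUCKETS - BLOCK + 1)  # bucket shared by the whole block
        return shared[base + in_idx % BLOCK]           # offset inside the block

    # A 128 x 64 virtual layer (8192 weights) backed by 512 shared values: 16x compression.
    W = np.array([[hashed_weight(o, i) for i in range(64)] for o in range(128)])
    print(W.shape, "virtual weights from", BUCKETS, "stored parameters")

In hardware, the hash would be a fixed low-cost index function rather than Python's built-in hash; the point is only that one stored value serves many virtual weights while the block constraint keeps neighboring weights together.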

Sparsity-based Hardware Accelerator

  • Dedicated sparsity predictor to bypass unnecessary operations (see the sketch below)
  • Network-on-Chip (NoC) based hardware architecture
  • Throughput improvement: 10% ~ 70%
  • Power reduction: 50%
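
The sketch below illustrates one plausible form of such a predictor, assuming a ReLU-activated fully connected layer: a low-precision shadow computation guesses which outputs the ReLU will zero out, and full-precision MACs are issued only for the rest. The bit widths and layer size are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256)).astype(np.float32)           # dense FC weights
    x = np.maximum(rng.standard_normal(256), 0).astype(np.float32)   # ReLU-ed input activations

    def quantize(a, bits=4):
        """Crude uniform quantizer used only by the low-cost predictor."""
        scale = np.abs(a).max() / (2 ** (bits - 1) - 1) or 1.0
        return np.round(a / scale) * scale

    # Predictor pass: a 4-bit approximation of W and x estimates each
    # pre-activation's sign; outputs predicted non-positive are bypassed,
    # since the following ReLU would zero them anyway.
    y_pred = quantize(W) @ quantize(x)
    compute_mask = y_pred > 0                      # rows worth computing exactly

    y = np.zeros(W.shape[0], dtype=np.float32)
    y[compute_mask] = W[compute_mask] @ x          # full-precision MACs only where needed
    y = np.maximum(y, 0)                           # ReLU

    print(f"bypassed {100 * (1 - compute_mask.mean()):.1f}% of output neurons")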

Memristor-based Deep Learning Accelerator

In-situ Analog Computation Based on RRAM

  • Matrix-vector multiplication is performed in situ on resistive random-access memory (RRAM) to address the memory-wall issue (behavioral sketch below)
  • Energy and timing overheads are dominated by the analog computing units and the analog-to-digital interface
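
A behavioral Python model of a single crossbar tile, under assumed device parameters (conductance range, read voltage, ADC resolution): weights are stored as conductances, the input vector is applied as wordline voltages, and each bitline sums currents per Kirchhoff's current law before being digitized.

    import numpy as np

    rng = np.random.default_rng(0)

    G_MIN, G_MAX = 1e-6, 1e-4                  # assumed conductance range (siemens)
    ADC_BITS = 6                               # assumed ADC resolution

    W = rng.uniform(-1, 1, size=(64, 64))      # signed weights of one layer tile
    G = G_MIN + (W + 1) / 2 * (G_MAX - G_MIN)  # map weights to conductances (offset
                                               # scheme; the offset is removed digitally)
    v = rng.uniform(0.0, 0.2, size=64)         # wordline read voltages encode the input

    i_bitline = G.T @ v                        # bitline currents: analog current summation

    # The ADC quantizes each bitline current before digital post-processing;
    # its resolution drives the interface energy mentioned above.
    levels = 2 ** ADC_BITS - 1
    codes = np.round((i_bitline - i_bitline.min()) / np.ptp(i_bitline) * levels)
    print("first bitline codes:", codes[:8].astype(int))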

High-throughput and Energy-efficient Accelerator Design

  • Dedicated encoding of synaptic weights and activations to improve the energy efficiency of the analog computation
  • Distribution analysis of crossbar bitline outputs enables reduced ADC bit resolution
  • Weights and activations are dynamically quantized according to the significance of fine-grained partial products (sketched below)
  • Throughput, energy efficiency, and area efficiency improvements: 2x ~ 4x
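
A sketch of the significance-aware quantization idea, assuming bit-serial activation feeding: the crossbar produces one partial sum per activation bit plane, and higher-significance planes are digitized with more ADC bits than lower ones. The bit allocation and operand widths are illustrative assumptions, not the original design's exact settings.

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.integers(0, 16, size=(32, 32))     # 4-bit unsigned weights on the crossbar
    x = rng.integers(0, 16, size=32)           # 4-bit unsigned activations

    # Illustrative ADC bit allocation: more resolution for the more
    # significant activation bit planes, less for the least significant.
    adc_bits_per_plane = {3: 8, 2: 7, 1: 6, 0: 5}

    def adc(values, bits):
        """Quantize a vector of analog bitline sums to the given resolution."""
        levels = 2 ** bits - 1
        vmax = values.max() or 1
        return np.round(values / vmax * levels) / levels * vmax

    y = np.zeros(W.shape[0])
    for plane, bits in adc_bits_per_plane.items():
        x_bit = (x >> plane) & 1                # one activation bit plane
        partial = W @ x_bit                     # crossbar MVM for this plane (modeled digitally)
        y += adc(partial, bits) * (1 << plane)  # weight by bit significance and accumulate

    print("max relative error vs. exact:", np.abs(y - W @ x).max() / (W @ x).max())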