VLSI LAB, HKUST
VLSI Architectures for Deep Learning
Introduction
Deep learning has achieved great success in various domains
Playing Go – Personal assistants – Search engines
Deep neural networks (DNNs) are the core of deep learning
Repeated hierarchy with tens to hundreds of layers
Modular hierarchy of CONV, POOL, and FC layers (see the sketch below)
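A minimal sketch of this modular CONV/POOL/FC hierarchy, using PyTorch purely for illustration; the layer sizes below are arbitrary, not a network from the lab's work:

```python
# Minimal CONV -> POOL -> FC hierarchy (illustrative layer sizes).
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # CONV
            nn.ReLU(),
            nn.MaxPool2d(2),                              # POOL
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # CONV
            nn.ReLU(),
            nn.MaxPool2d(2),                              # POOL
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # FC

    def forward(self, x):                    # x: (N, 3, 32, 32)
        return self.classifier(self.features(x).flatten(1))

net = TinyConvNet()
logits = net(torch.randn(1, 3, 32, 32))      # -> shape (1, 10)
```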
Large computational complexity and huge memory footprints
About 10G ops per frame
Over 50M parameters to store
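For a sense of scale, a rough back-of-the-envelope estimate for a single convolutional layer; the dimensions below are illustrative, roughly matching an early VGG-16 layer:

```python
# Rough MAC and parameter counts for one conv layer.
def conv_macs(out_h, out_w, c_in, c_out, k):
    """Multiply-accumulates for a k x k conv over an out_h x out_w output map."""
    return out_h * out_w * c_out * (k * k * c_in)

def conv_params(c_in, c_out, k):
    return c_out * (k * k * c_in + 1)  # +1 per filter for the bias

# A VGG-16-style layer: 224x224 output map, 64 -> 64 channels, 3x3 kernel
macs = conv_macs(224, 224, 64, 64, 3)   # ~1.85G MACs for this single layer
params = conv_params(64, 64, 3)         # ~37K parameters
print(f"{macs / 1e9:.2f} GMACs, {params / 1e3:.1f}K params")
```

Summed over all layers, a network like VGG-16 costs on the order of 15 GMACs per frame and stores over 100M parameters, which motivates the compression and sparsity work below.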
CMOS-based Deep Learning Accelerator
Network Compression
Blocked hash compression
An extra block constraint preserves spatial locality in hash-based weight sharing (sketched after this list)
Compresses the network by 16x ~ 32x
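A minimal software sketch of the blocked-hash idea; the hash function, block size, and matrix dimensions here are hypothetical stand-ins, not the lab's actual scheme. Instead of hashing every weight independently (HashedNets-style), whole tiles map to shared buckets, so neighboring weights stay contiguous in the shared store:

```python
# Blocked hash weight sharing (hypothetical scheme, illustrative sizes):
# whole block x block tiles, rather than individual weights, are hashed to
# shared-parameter buckets, so neighboring weights stay together and the
# spatial locality of memory accesses is preserved.
import numpy as np

def blocked_hash_weights(rows, cols, shared, block=4, seed=0):
    """Reconstruct a virtual rows x cols weight matrix from a small shared
    parameter vector, using one hash lookup per block instead of per weight."""
    W = np.empty((rows, cols), dtype=shared.dtype)
    n_buckets = len(shared) // (block * block)
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            h = hash((bi, bj, seed)) % n_buckets          # one hash per tile
            bucket = shared[h * block * block:(h + 1) * block * block]
            W[bi:bi + block, bj:bj + block] = bucket.reshape(block, block)
    return W

shared = np.random.randn(1024)              # shared parameter store
W = blocked_hash_weights(64, 64, shared)    # 4096 virtual weights: 4x compression
```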
Sparsity-based Hardware Accelerator
A dedicated sparsity predictor bypasses unnecessary operations (a software model follows this list)
Network-on-Chip (NoC) based hardware architecture
Throughput improvement: 10% ~ 70%
Power reduction: 50%
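A minimal software model of one plausible form of sparsity prediction; the 4-bit predictor below is an assumed stand-in for the dedicated hardware unit. A cheap low-precision dot product predicts the sign of each pre-activation, and the full-precision MACs run only for outputs predicted to survive ReLU:

```python
# Output-sparsity prediction (software model with an assumed 4-bit predictor):
# low-precision copies of the weights and activations give a cheap sign
# prediction per output; full-precision MACs execute only for outputs
# predicted to be positive after ReLU.
import numpy as np

def predicted_relu_layer(x, W, pred_bits=4):
    scale = 2 ** (pred_bits - 1)
    Wq = np.round(W * scale / np.abs(W).max())   # low-precision copy of W
    xq = np.round(x * scale / np.abs(x).max())   # low-precision copy of x
    keep = (Wq @ xq) > 0                         # cheap sign prediction
    y = np.zeros(W.shape[0])
    y[keep] = W[keep] @ x                        # full MACs only where needed
    return np.maximum(y, 0)                      # ReLU

x = np.random.randn(256)
W = np.random.randn(128, 256)
y = predicted_relu_layer(x, W)
bypassed = 1.0 - np.mean((W @ x) > 0)   # fraction of full-precision rows skipped
```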
Memristor-based Deep Learning Accelerator
In-situ Analog Computation Based on RRAM
Matrix-vector multiplication is performed in-situ on resistive random-access memory (RRAM) to address the memory-wall issue (a behavioral model is sketched below)
Energy and timing overhead come mainly from the analog computing unit and the analog-to-digital interface
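An idealized behavioral model of crossbar MVM, ignoring device non-idealities such as wire resistance and conductance variation; the array size and voltage ranges are illustrative. Weights are programmed as cell conductances, inputs are applied as word-line voltages, and each bit-line current is an analog dot product by Ohm's law and Kirchhoff's current law:

```python
# Idealized RRAM crossbar MVM: weights stored as cell conductances G,
# input vector applied as word-line voltages V, and each bit-line current
# I_j = sum_i G[i, j] * V[i] is an analog dot product, so one read cycle
# performs a full matrix-vector multiply.
import numpy as np

def crossbar_mvm(G, v):
    """G: (rows, cols) conductances in siemens; v: (rows,) voltages in volts.
    Returns per-bitline currents in amperes (Ohm's law + Kirchhoff's law)."""
    return G.T @ v

G = np.random.uniform(1e-6, 1e-4, size=(128, 128))  # programmed conductances
v = np.random.uniform(0.0, 0.3, size=128)           # read voltages
bitline_currents = crossbar_mvm(G, v)               # 128 dot products at once
```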
High-throughput and Energy-efficient Accelerator Design
Dedicated encoding of synaptic weights and activations improves the energy efficiency of the analog computation
Distribution analysis of crossbar bitline outputs reduces the required ADC bit-resolution (see the sketch below)
Weights and activations are dynamically quantized according to the significance of fine-grained partial products
Throughput, energy-efficiency, and area-efficiency improvement: 2x ~ 4x
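A minimal sketch of the distribution-analysis idea behind the reduced ADC resolution; the coverage quantile and bit-widths are hypothetical. Sizing the ADC full-scale range from the observed bitline distribution, rather than the worst case, lets fewer bits resolve the partial sums:

```python
# Distribution-aware ADC quantization (hypothetical coverage and bit-widths):
# the ADC full-scale range is set from the observed bitline output
# distribution instead of the worst case, so a lower-resolution ADC
# spends its levels where the partial sums actually fall.
import numpy as np

def adc_quantize(bitline, bits, coverage=0.999):
    """Quantize analog bitline outputs with a `bits`-bit ADC whose range
    covers the given quantile of the output distribution."""
    full_scale = np.quantile(np.abs(bitline), coverage)  # distribution analysis
    levels = 2 ** bits
    step = 2 * full_scale / levels
    code = np.clip(np.round(bitline / step), -levels // 2, levels // 2 - 1)
    return code * step

partials = 0.1 * np.random.randn(10_000)   # stand-in for bitline partial sums
y8 = adc_quantize(partials, bits=8)        # baseline resolution
y5 = adc_quantize(partials, bits=5)        # reduced resolution, same range
```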
Selected Publications on Deep Learning