Convolutional neural networks (CNNs) have achieved great success in computer vision and machine intelligence tasks, but they are also known to be both resource- and energy-demanding. Hardware-efficient CNN variants, such as point-wise and depth-wise convolutions, were introduced to reduce the computation cost. In addition, model quantization has been explored to reduce the arithmetic bitwidth down to INT8 and even INT4. Multiplier-less models have also been investigated, such as binarized neural networks, DeepShift, and ShiftAddNet [8], which exploit lightweight operations (Boolean logic, bitshift-and-accumulate, and sign flipping) at the cost of accuracy degradation or excessive hardware overhead.
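To illustrate the multiplier-less idea behind DeepShift-style models, the sketch below (our own minimal example, not any cited implementation) approximates an integer multiplication by rounding the weight to the nearest power of two and applying a bitshift plus a sign flip; the approximation error is one source of the accuracy degradation mentioned above.

```python
import math

def shift_mult(x: int, w: int) -> int:
    """Approximate x * w using only a bitshift and a sign flip.

    The weight w is rounded to the nearest power of two, so the
    multiplication becomes a left shift (hypothetical illustration of
    the DeepShift-style principle, not the paper's exact scheme).
    """
    assert w != 0, "zero weight cannot be represented as a power of two"
    s = round(math.log2(abs(w)))      # shift amount = nearest log2 of |w|
    sign = 1 if w > 0 else -1         # sign flipping handles negative weights
    return sign * (x << s)

# Exact when the weight is already a power of two:
#   shift_mult(5, 8) -> 40
# Approximate otherwise (5 * 6 = 30 is approximated as 5 << 3 = 40):
#   shift_mult(5, 6) -> 40
```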
Alternative ML/DL models have been proposed recently. One such example is AdderNet, which achieves accuracy comparable to its CNN counterpart. AdderNet employs an ℓ1-norm-based similarity measure for feature extraction, so that multiply-accumulate (MAC) operations can be replaced with efficient sum-of-absolute-difference (SAD) operations. Nevertheless, model quantization for AdderNet remains an outstanding challenge, and existing AdderNet-based accelerator designs often lead to higher arithmetic bitwidth, low hardware efficiency, and deteriorated throughput.
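The contrast between the two similarity measures can be sketched as follows. This is a minimal NumPy illustration under our own naming (not code from AdderNet or the cited designs): a CNN kernel response is a sum of products, while an AdderNet-style response is the negated ℓ1 distance between input patch and filter, computed with only subtractions, absolute values, and accumulation.

```python
import numpy as np

def mac_response(x: np.ndarray, w: np.ndarray) -> float:
    # Standard convolution kernel response: multiply-accumulate (MAC)
    return float(np.sum(x * w))

def sad_response(x: np.ndarray, w: np.ndarray) -> float:
    # AdderNet-style response: negated sum of absolute differences,
    # i.e., an l1-norm similarity -- larger (closer to 0) means a
    # better match, with no multiplications involved.
    return -float(np.sum(np.abs(x - w)))

# A patch identical to the filter yields the maximal SAD response of 0;
# any deviation makes the response more negative.
```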
In this work, we will present our recent research on AdderNet-based deep learning (DL) accelerator designs. We will introduce the WSQ-AdderNet [11] and AdderNet 2.0 accelerator designs, discuss novel approaches for model quantization, and detail implementation strategies for optimal resource utilization and energy efficiency.