Low-latency and energy-efficient object detection and tracking on FPGA
Advised by Prof. Callie Hao at Georgia Tech
Our main idea is to add a lightweight branch that generates a rectangular mask separating the object from the background, so that the accelerator can skip computation on background regions. As a case study, we use SkyNet as the backbone and attach the new branch after the third bundle. By sharing convolution layers with the SkyNet backbone, the new branch reuses the preliminary features already extracted from the image; the confidence mask is then produced by a two-layer fully convolutional network, which keeps the branch's parameter count small.
[Figure: SkyNet backbone with the new mask-generation branch]
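A minimal PyTorch sketch of such a branch is below. The class name `MaskBranch`, the channel counts, and the patch-grid size are illustrative assumptions, not SkyNet's exact configuration; only the structure (a shared-feature input followed by a two-layer fully convolutional head producing per-patch confidence scores) follows the description above.

```python
import torch
import torch.nn as nn

class MaskBranch(nn.Module):
    """Lightweight two-layer fully convolutional branch that turns shared
    backbone features into a coarse patch-level confidence mask.
    Channel counts and grid size here are assumed for illustration."""
    def __init__(self, in_ch=96, mid_ch=32, grid=8):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(mid_ch, 1, kernel_size=1)  # 1-channel confidence map
        self.pool = nn.AdaptiveAvgPool2d(grid)            # one score per patch

    def forward(self, feats):
        # `feats` are the features shared with the backbone (e.g. after bundle 3)
        x = self.relu(self.conv1(feats))
        x = self.conv2(x)
        return torch.sigmoid(self.pool(x))  # grid x grid scores in [0, 1]
```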
Binarization of the confidence mask: After obtaining the confidence mask, we apply a threshold to each patch's score. If the score exceeds the threshold, we keep the patch and reset its score to 1; otherwise, we treat the patch as background and reset its score to 0.
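As a sketch, binarization is a single thresholding operation; the default threshold of 0.5 below is an assumed placeholder, not a value from this project.

```python
import torch

def binarize_mask(conf_mask: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Reset each patch score to 1 if it exceeds the threshold, else 0.
    The threshold is a tunable hyperparameter (0.5 is assumed here)."""
    return (conf_mask > threshold).to(conf_mask.dtype)
```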
All-pass mechanism: To make mask generation more robust and avoid mis-predictions on hard-to-detect cases, if no patch in a confidence mask exceeds the threshold, we assume the new branch has failed to locate the object. In that case, the scores of all patches in the mask are reset to 1, so the whole frame is processed. We believe this fallback helps preserve IoU.
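A minimal sketch of this fallback, applied to the binarized mask from the previous step:

```python
import torch

def apply_all_pass(binary_mask: torch.Tensor) -> torch.Tensor:
    """If no patch survived the threshold, fall back to processing the whole
    frame by setting every patch score to 1 (the all-pass mechanism)."""
    if binary_mask.sum() == 0:
        return torch.ones_like(binary_mask)
    return binary_mask
```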
Mask shape regularization: We propose shape regularization to convert irregular mask shapes into regular ones, yielding a rectangular mask defined by four values: hmin, hmax, wmin, and wmax. These boundary values can be mapped directly to the corresponding feature-map regions by scaling, making it easy for the FPGA to compute only the part of the image where the object may exist.
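One straightforward way to realize this, sketched below, is to take the tightest enclosing rectangle of the active patches. The function name and return convention are assumptions for illustration; the mask is assumed to have at least one active patch, which the all-pass mechanism guarantees.

```python
import torch

def regularize_mask(binary_mask: torch.Tensor):
    """Convert an irregular binary patch mask (H_patches x W_patches) into its
    tightest enclosing rectangle, returning the filled rectangular mask and
    the four boundaries (hmin, hmax, wmin, wmax) for the accelerator.
    Assumes at least one active patch (ensured by the all-pass mechanism)."""
    mask = binary_mask.bool()
    rows = mask.any(dim=1).nonzero().flatten()  # patch rows containing the object
    cols = mask.any(dim=0).nonzero().flatten()  # patch columns containing the object
    hmin, hmax = int(rows.min()), int(rows.max())
    wmin, wmax = int(cols.min()), int(cols.max())
    rect = torch.zeros_like(binary_mask)
    rect[hmin:hmax + 1, wmin:wmax + 1] = 1
    # Boundaries map to feature-map coordinates by scaling,
    # e.g. hmin_feat = hmin * feat_h // grid (illustrative).
    return rect, (hmin, hmax, wmin, wmax)
```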
Channel shuffle: To reduce the computation overhead and parameter count of the new branch, we use group convolution. To reduce data movement between DDR and on-chip memory, we apply channel shuffle, which lets us load the required data into on-chip memory in a specific order.
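The sketch below pairs a group convolution with the standard ShuffleNet-style channel shuffle; the channel counts, group number, and tensor shape are illustrative assumptions. Group convolution cuts parameters and MACs by roughly the group count, and the shuffle keeps information flowing across groups between layers.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups (ShuffleNet-style) so the next
    group convolution mixes information from every group."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Illustrative usage: a 3x3 group convolution followed by a shuffle.
conv = nn.Conv2d(96, 96, kernel_size=3, padding=1, groups=4)
x = torch.randn(1, 96, 40, 80)   # assumed feature-map shape
y = channel_shuffle(conv(x), groups=4)
```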