Hedged Deep Tracking / Hedging Deep Features for Visual Tracking

Yuankai Qi, Shengping Zhang, Lei Qin, Hongxun Yao, Qingming Huang, Jongwoo Lim, Ming-Hsuan Yang.

Abstract

In recent years, several methods have been developed to utilize hierarchical features learned from a deep convolutional neural network (CNN) for visual tracking. However, as features from a certain CNN layer characterize an object of interest from only one aspect or one level, the performance of such trackers trained with features from one layer (usually the second to last layer) can be further improved. In this paper, we propose a novel CNN-based tracking framework, which takes full advantage of features from different CNN layers and uses an adaptive Hedge method to hedge several CNN trackers into a single stronger one. Extensive experiments on a benchmark dataset of 100 challenging image sequences demonstrate the effectiveness of the proposed algorithm compared to several state-of-the-art trackers.

Motivation

Features from different convolutional layers are effective in different scenarios.

The above figure shows tracking results obtained with CNN features from different convolutional layers on a representative frame from each of four sequences with diverse challenges. The best results are obtained using layers 12, 16, 10, and 10 on these four sequences, respectively.

Methods

Motivated by the above observations, we propose an adaptive Hedge algorithm to combine tracking results obtained by several weak experts/trackers, where each expert is based on correlation filters using CNN features extracted from only one convolutional layer. The following figure presents an overview of our method.

The proposed algorithm, HDT, consists of three components: (1) extracting CNN features from different convolutional layers using the pre-trained VGG-Net; (2) constructing weak trackers using correlation filters, where each one is trained with CNN features from one layer; (3) hedging the weak trackers into a stronger one using an improved Hedge algorithm.
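The hedging step in component (3) can be illustrated with a minimal sketch of the standard (not the improved) Hedge update. Everything below is an assumption made for illustration: the function names, the learning rate eta, the three example positions, and the choice of measuring each expert's loss as its distance to the fused position are hypothetical and are not taken from the HDT code.

```python
import numpy as np

def hedge_step(weights, losses, eta=1.0):
    """One standard Hedge update: scale each expert by exp(-eta * loss), then renormalize."""
    w = np.asarray(weights, dtype=float) * np.exp(-eta * np.asarray(losses, dtype=float))
    return w / w.sum()

def fuse_positions(weights, positions):
    """Weighted average of the per-expert (x, y) target positions."""
    return np.asarray(weights, dtype=float) @ np.asarray(positions, dtype=float)

# Three hypothetical weak trackers (one per CNN layer), starting with equal weights.
weights = np.full(3, 1.0 / 3.0)
positions = np.array([[100.0, 50.0],
                      [104.0, 52.0],
                      [130.0, 80.0]])  # the third expert has drifted

# Fused estimate for the current frame.
target = fuse_positions(weights, positions)

# Illustrative loss (an assumption): distance of each expert from the fused estimate,
# normalized to [0, 1].
losses = np.linalg.norm(positions - target, axis=1)
losses = losses / losses.max()

# Experts that disagree with the fused estimate lose weight for the next frame.
weights = hedge_step(weights, losses)
```

After one update the drifting third expert holds the smallest weight, so it contributes less to the fused position in the next frame.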

The preliminary method was published at CVPR 2016 (the tracker is named HDT), and an improved version was published in TPAMI 2018 (the tracker is named HDT*). The main extensions of the improved version include:

We design a loss function using both spatial distance and appearance similarity. The latter is measured by a Siamese network.
We propose a cumulative regret model, which adaptively determines the proportion between the historical and instantaneous regrets by considering both the strength and trend of performance change over time.
We prove that the regret of the improved Hedge algorithm has an upper bound.
We add a scale search step to handle size variations.
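
As a rough illustration of the cumulative regret idea listed above, the sketch below blends the historical regret with the instantaneous regret through a proportion alpha. This is a simplified sketch under stated assumptions: the convex-combination form, the function name, and the example numbers are hypothetical, and alpha is a fixed input here, whereas the improved version determines the proportion adaptively from the strength and trend of performance change.

```python
import numpy as np

def update_cumulative_regret(prev_regret, weights, losses, alpha):
    """Blend historical and instantaneous regret for every expert.

    alpha in [0, 1] is the share given to the instantaneous regret
    (a fixed value here; an adaptive rule is NOT reproduced).
    """
    losses = np.asarray(losses, dtype=float)
    # Expected loss of the weighted combination of experts.
    expected_loss = float(np.dot(weights, losses))
    # Instantaneous regret: how much better each expert did than the combination.
    instantaneous = expected_loss - losses
    return (1.0 - alpha) * np.asarray(prev_regret, dtype=float) + alpha * instantaneous

# Hypothetical example with three experts and equal weights.
prev_regret = np.zeros(3)
weights = np.full(3, 1.0 / 3.0)
losses = np.array([0.1, 0.5, 0.9])

regret = update_cumulative_regret(prev_regret, weights, losses, alpha=0.5)
```

A larger alpha makes the regret follow recent frames more closely, while a smaller alpha preserves more of the history; this trade-off is what an adaptive proportion is meant to balance.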

Downloads

[HDT code] here

[results on OTB50] here (including results of OPE, SRE, and TRE)

[results on OTB100] here (including results of OPE, SRE, and TRE)

[Supplement] here