
Hierarchical Convolutional Features for Visual Tracking



Visual object tracking is challenging as target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter, and occlusion. In this paper, we exploit features extracted from deep convolutional neural networks trained on object recognition datasets to improve tracking accuracy and robustness. The outputs of the last convolutional layers encode the semantic information of targets, and such representations are robust to significant appearance variations. However, their spatial resolution is too coarse to precisely localize targets. In contrast, earlier convolutional layers provide more precise localization but are less invariant to appearance changes. We interpret the hierarchies of convolutional layers as a nonlinear counterpart of an image pyramid representation and exploit these multiple levels of abstraction for visual tracking. Specifically, we adaptively learn correlation filters on each convolutional layer to encode the target appearance. We hierarchically infer the maximum response of each layer to locate targets. Extensive experimental results on a large-scale benchmark dataset show that the proposed algorithm performs favorably against state-of-the-art methods.
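
As a rough illustration of the idea in the abstract, the NumPy sketch below learns one linear correlation filter per feature layer in the Fourier domain and then combines the per-layer response maps to estimate the target displacement. It is a simplified sketch under stated assumptions, not the reference code: the feature maps are assumed to be already extracted and resized to a common spatial size, random arrays stand in for real convolutional features, and the function names (`gaussian_label`, `train_filter`, `respond`, `locate`), the regularizer `lam`, and the layer weights are hypothetical choices made for illustration. The weighted sum of responses used here is one simple way to favor deeper layers; the coarse-to-fine inference in the paper may differ in detail.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired filter output: a Gaussian whose peak sits at index (0, 0)."""
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    g = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))
    return np.fft.ifftshift(g)  # move the peak from the center to (0, 0)

def train_filter(feat, label, lam=1e-4):
    """Ridge-regression correlation filter for one layer.

    feat:  (H, W, C) feature map of the target patch (assumed input).
    label: (H, W) desired response.
    Returns the filter in the Fourier domain, shape (H, W, C).
    """
    F = np.fft.fft2(feat, axes=(0, 1))
    Y = np.fft.fft2(label)
    energy = np.sum(F * np.conj(F), axis=2, keepdims=True).real
    return Y[..., None] * np.conj(F) / (energy + lam)

def respond(filt, feat):
    """Correlation response map of a learned filter on a new feature map."""
    Z = np.fft.fft2(feat, axes=(0, 1))
    return np.real(np.fft.ifft2(np.sum(filt * Z, axis=2)))

def locate(filters, feats, layer_weights):
    """Combine per-layer responses and return the signed shift (dy, dx)."""
    total = sum(g * respond(f, z)
                for f, z, g in zip(filters, feats, layer_weights))
    dy, dx = np.unravel_index(np.argmax(total), total.shape)
    h, w = total.shape
    # Responses are circular, so large indices are negative displacements.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Usage: three stand-in "layers" with different channel counts.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((32, 32, c)) for c in (4, 8, 16)]
label = gaussian_label(32, 32)
filters = [train_filter(f, label) for f in feats]

# Simulate target motion with a circular shift of every feature map.
moved = [np.roll(f, shift=(3, 5), axis=(0, 1)) for f in feats]
print(locate(filters, moved, layer_weights=[0.25, 0.5, 1.0]))
```

Solving for the filter in the Fourier domain turns the ridge regression into an element-wise division, which is why a separate filter per layer stays cheap. In a real tracker the filters would also be updated over time as the target appearance changes, which this sketch omits.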


Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang, "Hierarchical Convolutional Features for Visual Tracking," International Conference on Computer Vision (ICCV), 2015.


@inproceedings{Ma-ICCV-2015,
    author    = {Ma, Chao and Huang, Jia-Bin and Yang, Xiaokang and Yang, Ming-Hsuan},
    title     = {Hierarchical Convolutional Features for Visual Tracking},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
    year      = {2015},
    volume    = {},
    number    = {},
    pages     = {}
}

- High-res [PDF] (5.6 MB)
- Low-res [PDF] (680 KB)

Supplementary Material
- Full results [PDF] (10.3 MB)
- Visualization [ZIP] (573.1 MB)

Precomputed tracks
- Ours (1.9 MB)
- All trackers (733.8 MB)

PDF [Link] (2.3 MB)

Reference code
- [GitHub page]

Download all the visualization videos here

Qualitative Comparisons

One-pass evaluation (OPE), Spatial robustness evaluation (SRE), and Temporal robustness evaluation (TRE) on [Wu et al. CVPR 2013]

One-pass evaluation (OPE), Spatial robustness evaluation (SRE), and Temporal robustness evaluation (TRE) on [Wu et al. PAMI 2015]

