Robust Visual Tracking via Hierarchical Convolutional Features
Chao Ma Jia-Bin Huang Xiaokang Yang Ming-Hsuan Yang
Shanghai Jiao Tong University Virginia Tech UC Merced
1. Abstract
Visual tracking is challenging because target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter, and occlusion. In this paper, we propose to exploit the rich hierarchical features of deep convolutional neural networks to improve the accuracy and robustness of visual tracking. Deep neural networks trained on object recognition datasets consist of multiple convolutional layers, which encode target appearance at different levels of abstraction. For example, the outputs of the last convolutional layers encode the semantic information of targets, and such representations are invariant to significant appearance variations. However, their spatial resolution is too coarse to localize the target precisely. In contrast, features from earlier convolutional layers provide more precise localization but are less invariant to appearance changes. We interpret the hierarchical features of convolutional layers as a nonlinear counterpart of an image pyramid representation and explicitly exploit these multiple levels of abstraction to represent target objects. Specifically, we learn adaptive correlation filters on the outputs of each convolutional layer to encode the target appearance, and we infer the maximum response of each layer to locate targets in a coarse-to-fine manner. To further handle scale estimation and target re-detection after tracking failures caused by heavy occlusion or the target moving out of view, we conservatively learn another correlation filter that maintains a long-term memory of target appearance and serves as a discriminative classifier. We apply the classifier to two types of object proposals: (1) proposals generated with a small step size tightly around the estimated location, for scale estimation; and (2) proposals generated with a large step size across the whole image, for target re-detection. Extensive experimental results on large-scale benchmark datasets show that the proposed algorithm performs favorably against state-of-the-art tracking methods.
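The per-layer correlation filters and the coarse-to-fine search described in the abstract can be illustrated with a small NumPy sketch. This is not the released code: the function names, the regularization weight `lam`, the search `radius`, the fusion weight `gamma`, and the square search window are all assumptions chosen for illustration, and random arrays stand in for real convolutional feature maps (assumed here to be resized to a common spatial size).

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """2-D Gaussian regression target peaked at the patch center."""
    ys = np.arange(h)[:, None] - h // 2
    xs = np.arange(w)[None, :] - w // 2
    return np.exp(-(ys ** 2 + xs ** 2) / (2.0 * sigma ** 2))

def train_filter(feat, label, lam=1e-4):
    """Closed-form multi-channel ridge regression in the Fourier domain.

    feat: (H, W, D) feature map. Returns a complex (H, W, D) filter.
    """
    X = np.fft.fft2(feat, axes=(0, 1))
    Y = np.fft.fft2(label)
    energy = np.sum(X * np.conj(X), axis=2).real + lam  # summed over channels
    return Y[:, :, None] * np.conj(X) / energy[:, :, None]

def response(filt, feat):
    """Correlation response map of a learned filter on a new feature map."""
    Z = np.fft.fft2(feat, axes=(0, 1))
    return np.real(np.fft.ifft2(np.sum(filt * Z, axis=2)))

def coarse_to_fine(responses, radius=3, gamma=0.5):
    """Locate the peak layer by layer.

    responses: same-shape maps ordered from the coarsest (deepest) layer to
    the finest. Each finer map is fused with the gamma-weighted coarser one
    and searched only in a square window around the coarser estimate
    (an assumed, simplified search region).
    """
    prior = responses[0]
    pos = np.unravel_index(np.argmax(prior), prior.shape)
    for resp in responses[1:]:
        fused = resp + gamma * prior
        h, w = fused.shape
        ys = np.arange(h)[:, None]
        xs = np.arange(w)[None, :]
        mask = (np.abs(ys - pos[0]) <= radius) & (np.abs(xs - pos[1]) <= radius)
        masked = np.where(mask, fused, -np.inf)
        pos = np.unravel_index(np.argmax(masked), fused.shape)
        prior = fused
    return tuple(int(p) for p in pos)

# Synthetic stand-ins for three layers (deep to shallow); not real CNN features.
rng = np.random.default_rng(0)
H, W = 32, 32
label = gaussian_label(H, W)
layers = [rng.standard_normal((H, W, d)) for d in (8, 6, 4)]
filters = [train_filter(f, label) for f in layers]

# Simulate target motion as a circular shift of every feature map.
shift = (3, -2)
shifted = [np.roll(f, shift, axis=(0, 1)) for f in layers]
responses = [response(w, z) for w, z in zip(filters, shifted)]

peak = coarse_to_fine(responses)
estimated_shift = (peak[0] - H // 2, peak[1] - W // 2)
print(estimated_shift)
```

Because the simulated motion is an exact circular shift, each response map is close to the Gaussian label translated by the same offset, so the fused peak should recover the imposed shift. With real features the layers would disagree, which is where restricting the fine-layer search around the coarse estimate matters.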
2. Downloads
Robust Visual Tracking via Hierarchical Convolutional Features
Chao Ma, Jia-Bin Huang, Xiaokang Yang, Ming-Hsuan Yang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018
[Paper] [Code] [Video Results]
Hierarchical Convolutional Features for Visual Tracking
Chao Ma, Jia-Bin Huang, Xiaokang Yang, Ming-Hsuan Yang
International Conference on Computer Vision (ICCV), 2015
[Paper] [Supplement] [Poster] [Slide] [Code]
3. Experiments
Overall Performance on the OTB-2013 dataset
Overall Performance on the OTB-2015 dataset
4. References
- Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(3), 583–596 (2015)
- Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(7), 1409–1422 (2012)
- Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (2012)
- Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations (2015)
- Wu, Y., Lim, J., Yang, M.H.: Online object tracking: A benchmark. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2013)
- Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9), 1834–1848 (2015)
- Zhang, J., Ma, S., Sclaroff, S.: MEEM: Robust tracking via multiple experts using entropy minimization. In: Proceedings of the European Conference on Computer Vision (2014)
- Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Hierarchical convolutional features for visual tracking. In: Proceedings of IEEE International Conference on Computer Vision (2015)