Contextual Tracker with Structure Encoding

(CTSE)

Abstract

Motivated by the problem of object tracking in video sequences, this paper presents a new Contextual Object Tracker with Structural Encoding (CTSE). The novelty of our approach lies in incorporating contextual and structural information that is specific to the target object into a model-free tracker. First, features from a complementary region whose motion is correlated with that of the target object are included. Second, a local structure that represents a spatial constraint between features within the target object is included. SIFT keypoints are used as features to encode both kinds of information. Tracking is done in three steps. First, keypoints are detected and described to encode the object structure. Second, they are matched in every frame. Finally, each matched keypoint votes locally for the target object location in a voting matrix, using the encoded object structure. The voting method gives higher priority to keypoints that have been matched more often and that are closer to the target's center. The proposed tracker is competitive with state-of-the-art trackers while being significantly faster. It ranks as the first or second most accurate tracker in experiments on standard datasets.
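
The voting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ModelKeypoint` and `vote_for_center`, the Gaussian proximity weight, and the `sigma` parameter are all assumptions made for illustration. It assumes that each model keypoint stores its offset to the target center (the encoded structure) and that SIFT matching has already produced the matched positions in the current frame.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelKeypoint:
    """Hypothetical model entry: offset from the keypoint to the target center."""
    dx: float
    dy: float
    match_count: int = 0  # how often this keypoint has been matched so far


def vote_for_center(model, matches, width, height, sigma=20.0):
    """Accumulate weighted votes for the target center in a voting matrix.

    matches: list of (model_index, x, y) for keypoints matched in this frame.
    Returns the (x, y) cell with the highest score, or None if nothing voted.
    The weight (match count times a Gaussian of the distance to the center)
    is an assumed form, chosen only to favor frequently matched keypoints
    that lie close to the target center.
    """
    votes = [[0.0] * width for _ in range(height)]
    for idx, x, y in matches:
        kp = model[idx]
        kp.match_count += 1
        # Predicted center = matched position + stored offset.
        cx = int(round(x + kp.dx))
        cy = int(round(y + kp.dy))
        if not (0 <= cx < width and 0 <= cy < height):
            continue  # vote falls outside the frame
        proximity = math.exp(-(kp.dx ** 2 + kp.dy ** 2) / (2.0 * sigma ** 2))
        votes[cy][cx] += kp.match_count * proximity

    best_score, best_cell = 0.0, None
    for row in range(height):
        for col in range(width):
            if votes[row][col] > best_score:
                best_score, best_cell = votes[row][col], (col, row)
    return best_cell
```

In this sketch, two keypoints whose offsets point to the same cell outvote a single keypoint pointing elsewhere, which is the local agreement that the voting matrix is meant to capture.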

Related Publication:

Contextual object tracker with structure encoding, IEEE, ICIP 2015 [PDF][Code][Poster]

Tanushri Chakravorty, Guillaume-Alexandre Bilodeau, Eric Granger