Deform-SOT Benchmark

We collect 50 challenging video sequences with full manual annotations to evaluate online deformable object tracking methods, termed the Deform-SOT dataset. Of the 50 videos, 20 sequences have been used in previous works (e.g., avatar, football, lemming, and yunakim), while the remaining 30 videos, such as airbattle, trampoline, and uneven-bars, are collected by us from the Internet. We focus on tracking totally deformable targets in unconstrained environments. Sequence lengths range from about 100 to 1,300 frames, and all videos are fully annotated. In the following figure, we show annotation examples on the first frame of each of the 50 videos.

The collected sequences are diverse with respect to object categories and classes, camera viewpoints, sequence lengths, and difficulty. Unlike the 11 attributes defined for general object tracking in the OTB benchmark, our deformable dataset divides the challenges into six classes, as described below:

• Large deformation: the non-rigid target undergoes local structural changes or significant deformation in shape.

• Severe occlusion: the target is partially or fully occluded by other objects or the background.

• Abnormal movement: the target moves abnormally, including fast motion, in-plane and out-of-plane rotation, and other complex motions.

• Illumination variation: the illumination in the target region changes moderately to significantly.

• Scale change: the scale of the target changes drastically.

• Background clutter: the background near the target has a similar appearance to the target.

We evaluate the proposed algorithm against state-of-the-art methods on our dataset. Specifically, we include the following state-of-the-art methods: incremental visual tracker (IVT), l1 tracker (L1T), tracking-learning-detection method (TLD), multiple instance learning tracker (MIL), structured output tracker (Struck), multi-task sparse learning based tracker (MTT), compressive tracker (CT), color name based tracker (CN), spatio-temporal structural context based tracker (STT), spatio-temporal context tracker (STC), fragment tracker (Frag), super-pixel tracker (SPT), sparsity-based collaborative model based tracker (SCM), locally orderless tracker (LOT), adaptive structural local sparse appearance model based tracker (ASLA), latent structural learning based tracker (LSL), local and global visual information based tracker (LGT), dynamic graph based tracker (DGT), and structure-aware hyper-graph based tracker (SAT). For a fair comparison, all trackers are initialized with the same bounding box in the first frame of each video. The experimental results of the trackers are reproduced from the available source codes with the recommended parameters.

The success and precision plots of the overall performance comparison on the 50 collected tracking sequences are shown in the following figures.
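For reference, success and precision curves of this kind are typically computed from the per-frame bounding-box overlap (IoU) and center location error: the success plot shows the fraction of frames whose IoU exceeds a varying threshold in [0, 1], and the precision plot shows the fraction of frames whose center error is within a varying pixel threshold. Below is a minimal sketch of this standard computation, assuming boxes are stored as [x, y, w, h] arrays (the function names and box format here are illustrative, not part of the released evaluation code):

```python
import numpy as np

def iou(a, b):
    """Per-frame IoU between two box arrays of shape (N, 4) in [x, y, w, h]."""
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def center_error(a, b):
    """Per-frame Euclidean distance between box centers."""
    ca = a[:, :2] + a[:, 2:] / 2.0
    cb = b[:, :2] + b[:, 2:] / 2.0
    return np.linalg.norm(ca - cb, axis=1)

def success_curve(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames with IoU above each overlap threshold."""
    o = iou(pred, gt)
    return np.array([np.mean(o > t) for t in thresholds])

def precision_curve(pred, gt, thresholds=np.arange(0, 51)):
    """Fraction of frames with center error within each pixel threshold."""
    e = center_error(pred, gt)
    return np.array([np.mean(e <= t) for t in thresholds])
```

Trackers are then commonly ranked by the area under the success curve (the mean of `success_curve`) and by the precision score at the 20-pixel threshold.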

Downloads

• Deform-SOT Benchmark [Dataset] [Codes]

• Source codes of the trackers [Codes].

Citations

If you use the dataset, our tracking results or the source codes, please cite our paper:

• Dawei Du, Honggang Qi, Wenbo Li, Longyin Wen, Qingming Huang, Siwei Lyu, "Online Deformable Object Tracking Based on Structure-Aware Hyper-graph", IEEE Transactions on Image Processing (TIP), 2016. [PDF]