RGB-T Object Tracking: Benchmark and Baseline


RGB-Thermal (RGB-T) object tracking has received increasing attention because thermal information strongly complements visible data. However, related research is limited by the lack of a comprehensive evaluation platform. In this paper, we contribute a video benchmark dataset for RGB-T tracking. It has three major advantages over existing ones: 1) Its size is sufficiently large for large-scale performance evaluation (total frame number: 233.8K, maximum frames per sequence: 8K). 2) The alignment between RGB-T sequence pairs is highly accurate, so no pre- or post-processing is needed. 3) The occlusion levels are annotated for analyzing the occlusion-sensitive performance of different tracking algorithms. Moreover, we propose a novel graph-based approach to learn a robust object representation for RGB-T tracking. In particular, the tracked object is represented as a graph with image patches as nodes. This graph is dynamically learned in a single unified optimization framework from two aspects. First, the graph affinity is optimized based on the weighted sparse representation, in which a modality weight is introduced to leverage RGB and thermal information adaptively. Second, the weight of each graph node (i.e., image patch) is propagated from the initial weights along the graph affinity. The optimized patch weights are then imposed on the extracted RGB and thermal features, and the target object is finally located using the structured SVM algorithm. Extensive experiments on both public and newly created datasets demonstrate the effectiveness of the proposed tracker against several state-of-the-art tracking methods.


The full benchmark contains 234 RGB-T video sequence pairs.

  • We have annotated the sequences with 12 attributes, which represent the challenging aspects of visual tracking.
  • Both the RGB and the thermal source data have corresponding ground-truth annotations.
  • Each row in the ground-truth files represents the bounding box of the target in that frame: (x, y, box-width, box-height).

Attr  Description
NO    No Occlusion - the target is not occluded.
PO    Partial Occlusion - the target object is partially occluded.
HO    Heavy Occlusion - the target object is heavily occluded (over 80%).
LI    Low Illumination - the illumination in the target region is low.
LR    Low Resolution - the resolution in the target region is low.
TC    Thermal Crossover - the target has a temperature similar to that of other objects or the surrounding background.
DEF   Deformation - non-rigid object deformation.
FM    Fast Motion - the motion of the ground truth between two adjacent frames is larger than 20 pixels.
SV    Scale Variation - the ratio of the first bounding box and the current bounding box is out of the range [0.5,1].
MB    Motion Blur - the motion of the target object results in blurred image information.
CM    Camera Moving - the target object is captured by a moving camera.
BC    Background Clutter - the background surrounding the target object is cluttered.
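
The FM and SV attributes are defined by numeric thresholds, so they can be checked directly from the ground-truth boxes. The sketch below is one possible reading of those definitions, not the authors' annotation procedure. It assumes that "motion" means displacement of the box center and that the "ratio" is the area of the first box divided by the area of the current box; both choices, and all function names, are assumptions. The thresholds (20 pixels and the range [0.5, 1]) are taken from the table as written.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, box-width, box-height)


def center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def fast_motion_frames(boxes: Sequence[Box], threshold: float = 20.0) -> List[int]:
    """Indices of frames whose center moved more than `threshold` pixels
    relative to the previous frame (ASSUMED interpretation of FM)."""
    flagged = []
    for i in range(1, len(boxes)):
        (cx0, cy0), (cx1, cy1) = center(boxes[i - 1]), center(boxes[i])
        if math.hypot(cx1 - cx0, cy1 - cy0) > threshold:
            flagged.append(i)
    return flagged


def scale_variation_frames(
    boxes: Sequence[Box], low: float = 0.5, high: float = 1.0
) -> List[int]:
    """Indices of frames where area(first box) / area(current box) falls
    outside [low, high] (ASSUMED interpretation of SV)."""
    if not boxes:
        return []
    first_area = boxes[0][2] * boxes[0][3]
    flagged = []
    for i, (_, _, w, h) in enumerate(boxes):
        area = w * h
        if area <= 0:
            continue  # skip degenerate boxes rather than divide by zero
        ratio = first_area / area
        if ratio < low or ratio > high:
            flagged.append(i)
    return flagged


# Made-up boxes for illustration only.
demo = [
    (0, 0, 10, 10),
    (5, 0, 10, 10),    # small move
    (40, 0, 10, 10),   # 35-pixel jump
    (40, 0, 20, 20),   # box grows to 4x the area
    (40, 0, 5, 5),     # box shrinks to 1/4 of the area
]
fm = fast_motion_frames(demo)
sv = scale_variation_frames(demo)
print("FM frames:", fm)
print("SV frames:", sv)
```

Both thresholds are exposed as parameters so that a different reading of the attribute definitions can be tried without changing the logic.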

Several sample video pairs

The RGBT234 dataset can be downloaded through the link: Download Dataset. For people who cannot access Google cloud disk, we also share the dataset on Baidu cloud disk. The download link is https://pan.baidu.com/s/1naq87OmHz2c_GrtOdFCpgQ.

Experimental Results:

The baseline results on the RGBT234 dataset can be downloaded at the following links.

  • RGBT234-results-VOT: Results of the baseline trackers for VOT evaluation.
  • The corresponding link in Baidu Netdisk is https://pan.baidu.com/s/1p2QLcqYe1oCbphg7mm83LQ.
  • RGBT234-results-MPRMSR: Results of the baseline trackers for MPR and MSR evaluation.
  • The corresponding link in Baidu Netdisk is https://pan.baidu.com/s/1cb5eLuJ6QIDt__M7LZ2-_g.

The evaluation code can be downloaded at the following links.

  • VOT-Evaluation: The evaluation code for the three metrics from VOT. First, configure VOT according to the tutorial on the official website. Then, place the results for VOT evaluation in the folder vot-workspace/results. In addition, create tracker.txt to store the identifiers of all trackers. Finally, run the script run_analysis.m to obtain the VOT evaluation results. Because we do not annotate the challenging factors in every frame as VOT does, the evaluation of challenging factors is not taken into account.
  • The corresponding link in Baidu Netdisk is https://pan.baidu.com/s/1hUMl_ebrfDiSD5118Y9W6Q.
  • MPR_MSR_Evaluation: The evaluation code for the MPR and MSR metrics. First, copy the results of all baseline trackers into the folder BBresults. Then, run the script main_GenerateMat_TPR.m to compute the distance results, which are stored in the folder ERRresults. Finally, run the script main_attrDrawCurve_RGBT234.m, which loads the results of the previous step from ERRresults and produces the precision plot and success plot, stored in the folder figsResults.
  • The corresponding link in Baidu Netdisk is https://pan.baidu.com/s/1N1rAAX3EjOhHFBz_585B9Q.
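
To give a feel for what a precision and success evaluation computes, here is a minimal Python sketch. It is not the released MATLAB code and does not reproduce its file layout. It assumes, as our reading of the "maximum" in MPR and MSR, that each predicted box is scored against both the RGB and the thermal ground truth and the better of the two scores is kept. The function names, the 20-pixel distance threshold, and the 0.5 overlap threshold are illustrative assumptions; the actual plots sweep a range of thresholds.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, box-width, box-height)


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    return math.hypot(
        (a[0] + a[2] / 2.0) - (b[0] + b[2] / 2.0),
        (a[1] + a[3] / 2.0) - (b[1] + b[3] / 2.0),
    )


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def _check(pred, gt_rgb, gt_t) -> None:
    if not pred or not (len(pred) == len(gt_rgb) == len(gt_t)):
        raise ValueError("inputs must be non-empty and of equal length")


def max_precision_rate(pred: Sequence[Box], gt_rgb: Sequence[Box],
                       gt_t: Sequence[Box], threshold: float = 20.0) -> float:
    """Fraction of frames whose smaller center distance (over the two
    modalities, an ASSUMPTION) is within `threshold` pixels."""
    _check(pred, gt_rgb, gt_t)
    hits = sum(
        1 for p, r, t in zip(pred, gt_rgb, gt_t)
        if min(center_distance(p, r), center_distance(p, t)) <= threshold
    )
    return hits / len(pred)


def max_success_rate(pred: Sequence[Box], gt_rgb: Sequence[Box],
                     gt_t: Sequence[Box], threshold: float = 0.5) -> float:
    """Fraction of frames whose larger overlap (over the two modalities,
    an ASSUMPTION) exceeds `threshold`."""
    _check(pred, gt_rgb, gt_t)
    hits = sum(
        1 for p, r, t in zip(pred, gt_rgb, gt_t)
        if max(iou(p, r), iou(p, t)) > threshold
    )
    return hits / len(pred)


# Made-up boxes for illustration only.
pred = [(0, 0, 10, 10), (100, 100, 10, 10)]
gt_rgb = [(0, 0, 10, 10), (0, 0, 10, 10)]
gt_t = [(2, 0, 10, 10), (95, 100, 10, 10)]
precision = max_precision_rate(pred, gt_rgb, gt_t)
success = max_success_rate(pred, gt_rgb, gt_t)
print(precision, success)
```

In the second frame of this toy example the prediction is far from the RGB box but close to the thermal box, so it counts as a precision hit while still failing the overlap test.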

The Accuracy (A), Robustness (R), and Expected Average Overlap (EAO) of the evaluated trackers.