Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning based Beam Search

Xiao Wang#1,2, Zhe Chen#3, Bo Jiang#1, Jin Tang#1, Bin Luo#1, Dacheng Tao#3

1. Anhui University, Hefei, China 2. Pengcheng Laboratory, Shenzhen, China 3. The University of Sydney, Australia

[Paper] [Code] [Demo]

Background and Motivation

Despite the great progress in visual tracking, most recent trackers still adopt a relatively primitive and unitary inference strategy: in each frame, they simply select the location with the maximum response score as the final tracking result. However, the location with the maximum score may not be the optimal choice, especially in challenging scenarios such as heavy occlusion and fast motion, where errors accumulate over time and response scores become unreliable. In this paper, we instead propose a novel beam search-based multi-agent reinforcement learning strategy (termed BeamTracking) that improves visual tracking by maintaining and analyzing multiple tracking results simultaneously. We treat different tracking results as different tracking agents and formulate tracking as a sample selection problem over multiple parallel decision-making processes, each of which picks one sample as its tracking result in each frame. We then introduce a multi-agent reinforcement learning (MARL) framework, consisting mainly of a novel GRU-based context-aware observation network and multiple policy networks, to carry out this selection. By applying our tracking strategy over the whole video sequence and maintaining multiple estimated trajectories for the target, we achieve more robust tracking by selecting the trajectory with the maximum score. Extensive experiments on eight popular tracking benchmarks validate the effectiveness of the proposed algorithm.
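To make the idea concrete, below is a minimal, framework-agnostic sketch of a beam-search tracking loop. The helpers `propose` and `score` are hypothetical stand-ins for a tracker's candidate generator and response scoring, not the actual BeamTracking implementation; the beam width and data layout are illustrative.

```python
import heapq

def beam_search_track(frames, init_box, propose, score, beam_width=3):
    """Beam-search tracking loop (sketch, hypothetical helpers).

    propose(frame, box) -> candidate boxes around the previous box.
    score(frame, box)   -> response score of a candidate in this frame.
    Each beam entry is (cumulative score, trajectory of boxes).
    """
    beams = [(0.0, [init_box])]
    for frame in frames:
        candidates = []
        for cum_score, traj in beams:
            for box in propose(frame, traj[-1]):
                candidates.append((cum_score + score(frame, box), traj + [box]))
        # Keep the top-K trajectories instead of greedily committing to one,
        # so a temporarily low-scoring but correct hypothesis can survive.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    best_score, best_traj = max(beams, key=lambda b: b[0])
    return best_traj
```

Compared with greedy search (`beam_width = 1`), keeping several hypotheses lets the tracker recover in later frames when the maximum-score candidate of a single frame is wrong.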

Fig. 1. Comparison between greedy search and MARL-based beam search for visual tracking.

Demo Video

The Proposed Approach
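As described in the abstract, the MARL framework pairs a GRU-based context-aware observation network with multiple policy networks, one decision-making process per agent. The PyTorch sketch below illustrates this structure only; the layer types, feature dimensions, and the candidate-scoring head are assumptions for illustration, not the released code.

```python
import torch
import torch.nn as nn

class ObservationNet(nn.Module):
    """Sketch of a GRU-based context-aware observation network.

    Encodes the current frame's candidate features together with the
    temporal context carried in the GRU hidden state (sizes are placeholders).
    """
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, feat, hidden):
        obs = torch.relu(self.encoder(feat))
        return self.gru(obs, hidden)

class PolicyNet(nn.Module):
    """Sketch of a per-agent policy head: produces a distribution over
    candidate samples, from which the agent picks its tracking result."""
    def __init__(self, hidden_dim=256, num_candidates=10):
        super().__init__()
        self.head = nn.Linear(hidden_dim, num_candidates)

    def forward(self, hidden):
        return torch.softmax(self.head(hidden), dim=-1)

# Per-frame step for one agent (illustrative usage):
obs_net, policy_net = ObservationNet(), PolicyNet()
hidden = torch.zeros(1, 256)            # agent's recurrent state
feat = torch.randn(1, 512)              # placeholder candidate features
hidden = obs_net(feat, hidden)          # update context-aware observation
action_probs = policy_net(hidden)       # select one sample for this frame
```

Each agent maintains its own hidden state, so every parallel decision-making process accumulates its own temporal context while selecting one sample per frame.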

Experiments

We evaluate our tracker on multiple tracking benchmark datasets and also analyze the tracking results on video sequences grouped by challenge attribute. More details can be found in our paper.

Visualization

Citation