RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking


Fangwei Zhong*, Xiao Bi*, Yudi Zhang, Wei Zhang, Yizhou Wang

AAAI 2023 (Oral)

* indicates equal contribution

Introduction

Motivation: Active Object Tracking (AOT) aims to maintain a specific relation between the tracker and the object(s) by autonomously controlling the tracker's motion system based on its observations. It is widely used in applications such as mobile robots and autonomous driving. However, building a generalizable active tracker that works robustly across diverse scenarios remains challenging, particularly in unstructured environments with cluttered obstacles and diverse layouts. We argue that the key to achieving this is a state representation that models both the geometric structure of the surroundings and the dynamics of the target.

Method: To this end, we propose a framework called RSPT, which forms a structure-aware motion representation by Reconstructing the Surroundings and Predicting the target Trajectory. We further enhance the generalization of the policy network by training it with an asymmetric dueling mechanism.

Experiments: Empirical results in virtual environments show that the tracker with RSPT significantly outperforms existing methods in unseen environments, especially those with cluttered obstacles and diverse layouts. We further deploy RSPT in a real-world scenario, where it shows good generalization in sim-to-real transfer.

RSPT Framework

An overview of the RSPT framework for active object tracking. RSPT forms a structure-aware motion representation by Reconstructing the Surroundings of the tracker and Predicting the Trajectory of the target. The tracker first localizes the target with a video tracker while simultaneously constructing a local grid map from the depth image and camera pose; it then predicts the future trajectory of the target in the map.
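To make the map-construction step more concrete, here is a minimal, simplified sketch of turning depth and camera pose into a local grid map. It is not the authors' implementation: it assumes a pinhole camera (focal length `fx`, principal point `cx`), uses a single depth-image row as a 2-D range scan, and ray-casts each reading into a world-aligned grid centred on the tracker. The cell encoding (`UNKNOWN`, `FREE`, `OBSTACLE`) and all function names are hypothetical.

```python
import math

# Hypothetical cell encoding for the local grid map (not taken from the paper).
UNKNOWN, FREE, OBSTACLE = -1, 0, 1


def scan_from_depth_row(depth_row, fx, cx):
    """Turn one row of a depth image into (bearing, range) pairs.

    Assumes a pinhole camera with focal length fx and principal point cx
    (both in pixels) and depth measured along the optical axis (metres).
    Bearing is in radians, positive to the left of the optical axis.
    """
    scan = []
    for u, z in enumerate(depth_row):
        if z <= 0:  # skip invalid or missing depth readings
            continue
        lateral = (u - cx) * z / fx  # metres to the right of the optical axis
        scan.append((-math.atan2(lateral, z), math.hypot(z, lateral)))
    return scan


def build_local_grid(scan, pose, size, cell, max_range):
    """Ray-cast a scan into a size x size grid centred on the tracker.

    pose = (x, y, yaw) of the camera in the world frame. The grid is aligned
    with the world axes: grid[i][j] covers y-index i and x-index j, and each
    cell has edge length `cell` metres.
    """
    grid = [[UNKNOWN] * size for _ in range(size)]
    x0, y0, yaw = pose
    half = size // 2

    def mark(wx, wy, value):
        i = half + math.floor((wy - y0) / cell)
        j = half + math.floor((wx - x0) / cell)
        # Ignore points outside the local window; never erase an obstacle.
        if 0 <= i < size and 0 <= j < size and grid[i][j] != OBSTACLE:
            grid[i][j] = value

    step = cell / 2  # sample each ray at half-cell resolution
    for bearing, rng in scan:
        theta = yaw + bearing
        reach = min(rng, max_range)
        for k in range(int(reach / step)):
            d = k * step
            mark(x0 + d * math.cos(theta), y0 + d * math.sin(theta), FREE)
        if rng < max_range:  # the ray ended on a surface: mark it as an obstacle
            mark(x0 + rng * math.cos(theta), y0 + rng * math.sin(theta), OBSTACLE)
    return grid
```

For example, `build_local_grid(scan_from_depth_row([2.0, 2.0, 2.0], fx=1.0, cx=1.0), (0.0, 0.0, 0.0), 11, 0.5, 5.0)` marks the cells between the tracker and a surface 2 m ahead as free and the surface cell as an obstacle, leaving unobserved cells unknown. A real system would use the full depth image and the tracker's estimated pose; this sketch only illustrates the geometry.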

Exemplar sequences

In the reconstructed map, the red dot is the target, the blue dot is the tracker, white cells are obstacles, black cells are free space, and gray cells are unexplored areas.
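The colour legend above amounts to a simple mapping from cell states to pixels. The sketch below renders such a map under stated assumptions: cells use a hypothetical -1/0/1 encoding, the exact RGB shades (especially the gray level) are guesses, and the tracker and target are drawn as single-cell dots. It is illustrative only and does not reproduce the project's plotting code.

```python
# Hypothetical cell encoding (a -1/0/1 occupancy grid); not from the paper.
UNKNOWN, FREE, OBSTACLE = -1, 0, 1

# Colour legend of the exemplar maps, as RGB triples. The exact shades
# (in particular the gray level) are assumptions.
CELL_COLOURS = {
    OBSTACLE: (255, 255, 255),  # white: obstacle
    FREE: (0, 0, 0),            # black: free space
    UNKNOWN: (128, 128, 128),   # gray: unexplored
}
TRACKER_COLOUR = (0, 0, 255)  # blue dot: tracker
TARGET_COLOUR = (255, 0, 0)   # red dot: target


def render_map(grid, tracker_cell, target_cell):
    """Return an RGB image (list of rows of (r, g, b) tuples) for a grid map.

    tracker_cell and target_cell are (row, col) indices; positions outside
    the grid are skipped. The target is drawn last so it stays visible if
    both fall into the same cell.
    """
    image = [[CELL_COLOURS[v] for v in row] for row in grid]
    for (i, j), colour in ((tracker_cell, TRACKER_COLOUR), (target_cell, TARGET_COLOUR)):
        if 0 <= i < len(image) and 0 <= j < len(image[i]):
            image[i][j] = colour
    return image
```

Drawing the target after the tracker is a design choice for the rare case where both occupy one cell; the target is usually the more important thing to see.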

In the predicted trajectory, the black line represents the historical trajectory and the green area represents the distribution of the future trajectory. The brighter the color, the higher the probability.
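RSPT's own trajectory predictor is not reproduced here. As a hypothetical stand-in that shows what a "future trajectory distribution" on a grid can look like, the sketch below extrapolates the target's history at constant velocity and places an isotropic Gaussian at each forecast step, with a standard deviation that grows with the horizon. The parameter names (`sigma0`, `growth`) and their defaults are assumptions, and the output is scaled so the highest-scoring cell is 1.0, matching "brighter means more likely" rather than being a normalized probability.

```python
import math


def constant_velocity_forecast(history, horizon):
    """Extrapolate the last observed step of `history` for `horizon` steps.

    history is a list of at least two (x, y) positions in metres.
    """
    (x1, y1), (x2, y2) = history[-2], history[-1]
    vx, vy = x2 - x1, y2 - y1
    return [(x2 + k * vx, y2 + k * vy) for k in range(1, horizon + 1)]


def trajectory_heatmap(history, horizon, size, cell, sigma0=0.2, growth=0.1):
    """Sum one isotropic Gaussian per forecast step on a size x size grid.

    Cell (i, j) has its centre at ((j + 0.5) * cell, (i + 0.5) * cell).
    The standard deviation grows linearly with the step index, so the
    distribution spreads out further into the future. The result is scaled
    so that the highest-scoring cell has value 1.0 (the brightest colour).
    """
    heat = [[0.0] * size for _ in range(size)]
    for k, (mx, my) in enumerate(constant_velocity_forecast(history, horizon), start=1):
        sigma = sigma0 + growth * k
        for i in range(size):
            for j in range(size):
                d2 = ((j + 0.5) * cell - mx) ** 2 + ((i + 0.5) * cell - my) ** 2
                heat[i][j] += math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in heat)
    return [[v / peak for v in row] for row in heat] if peak > 0 else heat
```

A constant-velocity model ignores obstacles entirely, which is exactly the weakness a map-aware predictor is meant to address; it is shown here only to illustrate the heatmap representation.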

Real-world Deployment

Real_all.mp4

BibTeX

@inproceedings{zhong2023rspt,
  title={RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking},
  author={Zhong, Fangwei and Bi, Xiao and Zhang, Yudi and Zhang, Wei and Wang, Yizhou},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2023}
}