Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL

The Collected Multi-Level Data in Complex Room

Noise Level 1

Noise Level 2

Noise Level 3

Noise Level 4

Tracking in Complex Room

Tracking in High-fidelity Environment

Tracking Unseen Targets

Real Robot

Follow a Woman in a Dark Parking Lot

Follow a Cat in a Dark Parking Lot

Follow a Man in an Indoor Room with Obstacles (1)

Follow a Man in an Indoor Room with Obstacles (2)

More Training Examples