V-STRONG: Visual Traversability Learning for Off-road Navigation via Self-Supervision

ICRA 2024

Sanghun Jung, Joonho Lee, Xiangyun Meng, Byron Boots, and Alexander Lambert

University of Washington

[Paper]

Abstract

Reliable estimation of terrain traversability is critical for the successful deployment of autonomous systems in wild, outdoor environments. Given the lack of large-scale annotated datasets for off-road navigation, strictly supervised learning approaches remain limited in their generalization ability. To this end, we introduce a novel, image-based self-supervised learning method for traversability prediction, leveraging a state-of-the-art vision foundation model for improved out-of-distribution performance. Our method employs contrastive representation learning using both human driving data and instance-based segmentation masks during training. We show that this simple yet effective technique drastically outperforms recent methods in predicting traversability for both on- and off-trail driving scenarios. We compare our method with recent baselines on both a common benchmark and our own collected datasets, covering a diverse range of outdoor environments and varied terrain types. We also demonstrate the compatibility of the resulting costmap predictions with a model-predictive controller. Finally, we evaluate our approach on zero- and few-shot tasks, demonstrating unprecedented performance for generalization to new environments.

Method

We first use stereo-depth information to filter out occluded trajectory points, then project the remaining trajectory into image space. Positive and negative points are then sampled based on the projected trajectory and the SAM-predicted masks. A pre-trained image encoder followed by a traversability decoder produces traversability features. From these, we extract positive and negative features and apply the trajectory-based and mask-based contrastive losses to train the decoder. Additionally, we update a traversability prototype vector using a running average over positive features. At test time, the similarity between each feature and this prototype is translated directly into a traversability cost. Note that the dashed gray arrow denotes a stop-gradient, and the encoder is not updated during training.
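The prototype update and the test-time cost can be illustrated with a minimal sketch. The page does not specify the exact form of the running average, the similarity measure, or the similarity-to-cost mapping, so the exponential moving average with a `momentum` parameter, the cosine similarity, and the linear mapping to [0, 1] below are all assumptions, as are the function names `update_prototype` and `traversability_cost`.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def update_prototype(prototype, positive_features, momentum=0.99):
    """Running-average update of the traversability prototype.

    Assumption: an exponential moving average over the mean of the
    unit-normalized positive features; the actual rule may differ.
    """
    if not positive_features:
        return prototype  # nothing to update with
    feats = [normalize(f) for f in positive_features]
    dim = len(feats[0])
    mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    if prototype is None:
        return normalize(mean)  # first batch initializes the prototype
    mixed = [momentum * p + (1.0 - momentum) * m
             for p, m in zip(prototype, mean)]
    return normalize(mixed)


def traversability_cost(feature, prototype):
    """Map cosine similarity to a cost in [0, 1].

    Assumption: cost = (1 - similarity) / 2, so a feature aligned with
    the prototype gets cost 0 and an opposite feature gets cost 1.
    """
    f, p = normalize(feature), normalize(prototype)
    sim = sum(a * b for a, b in zip(f, p))
    sim = max(-1.0, min(1.0, sim))  # guard against rounding error
    return (1.0 - sim) / 2.0


# Toy usage with 2-D features (real features would be decoder outputs).
proto = update_prototype(None, [[1.0, 0.0], [2.0, 0.0]])
cost_aligned = traversability_cost([3.0, 0.0], proto)
cost_opposite = traversability_cost([-1.0, 0.0], proto)
```

Applying `traversability_cost` to every pixel feature would yield a dense cost image; how that image is turned into the costmap used by the controller is not covered by this sketch.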

Qualitative Results

Video

ICRA_video_final_highres.mp4