V-STRONG: Visual Traversability Learning for Off-road Navigation via Self-Supervision

ICRA 2024

Sanghun Jung, Joonho Lee, Xiangyun Meng, Byron Boots, and Alexander Lambert

University of Washington



Reliable estimation of terrain traversability is critical for the successful deployment of autonomous systems in wild, outdoor environments. Given the lack of large-scale annotated datasets for off-road navigation, strictly supervised learning approaches remain limited in their generalization ability. To this end, we introduce a novel, image-based self-supervised learning method for traversability prediction, leveraging a state-of-the-art vision foundation model for improved out-of-distribution performance. Our method employs contrastive representation learning using both human driving data and instance-based segmentation masks during training. We show that this simple yet effective technique drastically outperforms recent methods in predicting traversability for both on- and off-trail driving scenarios. We compare our method with recent baselines on both a common benchmark and our own collected datasets, covering a diverse range of outdoor environments and varied terrain types. We also demonstrate the compatibility of the resulting costmap predictions with a model-predictive controller. Finally, we evaluate our approach on zero- and few-shot tasks, demonstrating unprecedented performance for generalization to new environments.


We first use stereo-depth information to filter out occluded trajectory points and then project the trajectory into image space. Positive and negative points are then sampled based on the trajectory and the SAM-predicted masks. We apply a pre-trained image encoder followed by a traversability decoder that outputs traversability features. From these, we extract positive and negative features and apply the trajectory- and mask-based contrastive losses to train the decoder. Additionally, we update our traversability prototype vector using a running average over the positive features. At test time, the similarity between each feature and this prototype vector is computed and directly translated into a traversability cost. Note that the dashed gray arrow denotes a stopped gradient, and the encoder is not updated during training.
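The contrastive loss, prototype update, and similarity-to-cost step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the generic InfoNCE-style loss form, the temperature and momentum values, and the linear mapping from cosine similarity to a cost in [0, 1] are all hypothetical choices that the text above does not specify.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def contrastive_loss(positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (assumed form, not taken from the paper).

    positives: (P, D) features sampled at positive points, P >= 2.
    negatives: (N, D) features sampled at negative points.
    Each pair of distinct positives is pulled together while every
    positive is pushed away from all negatives.
    """
    p = l2_normalize(positives)
    n = l2_normalize(negatives)
    pos_exp = np.exp(p @ p.T / temperature)                           # (P, P)
    neg_sum = np.exp(p @ n.T / temperature).sum(axis=1, keepdims=True)  # (P, 1)
    pair_loss = -np.log(pos_exp / (pos_exp + neg_sum))                # (P, P)
    off_diagonal = ~np.eye(len(p), dtype=bool)  # skip self-pairs
    return pair_loss[off_diagonal].mean()


def update_prototype(prototype, positive_features, momentum=0.99):
    """Running average of the prototype over positive features.

    The momentum value is an assumption; the result is re-normalized
    so it can be used directly for cosine similarity.
    """
    batch_mean = l2_normalize(positive_features).mean(axis=0)
    return l2_normalize(momentum * prototype + (1.0 - momentum) * batch_mean)


def traversability_cost(features, prototype):
    """Map cosine similarity to a cost in [0, 1] (assumed mapping).

    features: (..., D) decoder features, e.g. (H, W, D) for an image.
    Similarity 1 gives cost 0 (traversable); similarity -1 gives cost 1.
    """
    similarity = l2_normalize(features) @ l2_normalize(prototype)
    return 0.5 * (1.0 - similarity)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.normal(size=(8, 16))         # stand-in positive features
    neg = rng.normal(size=(32, 16))        # stand-in negative features
    proto = l2_normalize(rng.normal(size=16))

    loss = contrastive_loss(pos, neg)      # would drive decoder training
    proto = update_prototype(proto, pos)   # no gradient flows through this
    costmap = traversability_cost(rng.normal(size=(4, 6, 16)), proto)
    print(loss, costmap.shape)
```

In a real training loop the features would come from the decoder on top of the frozen encoder, and the loss would be computed in an autodiff framework so that only the decoder receives gradients. The prototype update is kept outside the gradient path, which is one way to read the stopped gradient mentioned above.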

Qualitative Results

