Wild Visual Navigation

Jonas Frey*, Matias Mattamala*, Piotr Libera, Nived Chebrolu, Cesar Cadena, Georg Martius, Maurice Fallon, Marco Hutter

(*equal contribution, jonfrey@ethz.ch, matias@robots.ox.ac.uk)

Robotics: Science and Systems 2023 - Autonomous Robots 2024

Wild Visual Navigation (WVN) learns to predict traversability from images via online self-supervised learning. Starting from a randomly initialized traversability estimation network without prior assumptions about the environment (a), a human operator drives the robot around areas that are traversable for the given platform (b). After a few minutes of operation, WVN learns to distinguish between traversable and untraversable areas (c), enabling the robot to navigate autonomously and safely within the environment (d).

Fast Traversability Estimation for Wild Visual Navigation [arXiv]

@INPROCEEDINGS{frey23fast,
  AUTHOR    = {Jonas Frey AND Matias Mattamala AND Nived Chebrolu AND Cesar Cadena AND Maurice Fallon AND Marco Hutter},
  TITLE     = {{Fast Traversability Estimation for Wild Visual Navigation}},
  BOOKTITLE = {Proceedings of Robotics: Science and Systems},
  YEAR      = {2023},
  ADDRESS   = {Daegu, Republic of Korea},
  MONTH     = {July},
  DOI       = {10.15607/RSS.2023.XIX.054}
}

Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision [arXiv]

@INPROCEEDINGS{mattamala24wild,
  AUTHOR    = {Jonas Frey AND Matias Mattamala AND Piotr Libera AND Nived Chebrolu AND Cesar Cadena AND Georg Martius AND Marco Hutter AND Maurice Fallon},
  TITLE     = {{Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision}},
  BOOKTITLE = {under review for Autonomous Robots},
  YEAR      = {2024}
}

Videos

Quick Overview Presentation at RSS 2023

Supplementary Video 1

Supplementary Video 2

System Overview

System overview: WVN requires only monocular RGB images, odometry, and proprioceptive data as input; these are processed to extract the features and supervision signals used for online learning and inference of traversability.
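To make the online learning idea concrete, below is a minimal sketch of one such update step, assuming a small MLP head and a hypothetical proprioception-derived supervision score. The names `TraversabilityMLP` and `proprioceptive_traversability_score`, the 384-dimensional features, and the loss choice are illustrative assumptions, not the actual WVN implementation.

```python
# Minimal sketch (not the actual WVN code): a small MLP is trained online on
# visual feature embeddings, supervised by a traversability score derived from
# proprioception while the robot is driven over terrain.
import torch
import torch.nn as nn

class TraversabilityMLP(nn.Module):
    """Tiny head mapping a visual feature embedding to a traversability score in [0, 1]."""
    def __init__(self, feature_dim: int = 384, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)

def proprioceptive_traversability_score(velocity_error: torch.Tensor) -> torch.Tensor:
    """Hypothetical supervision signal: low velocity-tracking error -> high traversability."""
    return torch.exp(-velocity_error.clamp(min=0.0))

model = TraversabilityMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One online update step: features of image regions the robot actually walked over,
# paired with the supervision score generated from proprioception at that time.
footprint_features = torch.randn(32, 384)   # placeholder embeddings
velocity_error = torch.rand(32)             # placeholder proprioceptive signal
target = proprioceptive_traversability_score(velocity_error)

prediction = model(footprint_features)
loss = nn.functional.binary_cross_entropy(prediction, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```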

Feature extraction & inference: The camera scheduler module selects one camera from the available pool and provides its RGB image to the feature extractor module, which extracts dense visual features F using pre-trained models. Next, the sub-sample module produces a reduced set of embeddings {f_n} using a subsampling strategy based on a weak segmentation system. Lastly, the inference module predicts traversability for the image using these embeddings.
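The dataflow described above can be sketched as follows; the backbone, the weak segmentation, and all function names (`camera_scheduler`, `extract_dense_features`, `subsample_by_segments`) are stand-ins used for illustration rather than the actual WVN modules.

```python
# Illustrative sketch of the feature-extraction / inference dataflow.
import itertools
import torch

def camera_scheduler(camera_pool):
    """Round-robin selection of one camera from the available pool."""
    return itertools.cycle(camera_pool)

def extract_dense_features(image: torch.Tensor, feature_dim: int = 384) -> torch.Tensor:
    """Stand-in for the pre-trained backbone: returns an (H/8, W/8, D) dense feature map F."""
    h, w = image.shape[-2] // 8, image.shape[-1] // 8
    return torch.randn(h, w, feature_dim)

def subsample_by_segments(dense: torch.Tensor, segments: torch.Tensor) -> torch.Tensor:
    """Average dense features within each segment of a weak segmentation -> {f_n}."""
    return torch.stack([dense[segments == i].mean(dim=0) for i in segments.unique()])

# Usage sketch
head = torch.nn.Sequential(torch.nn.Linear(384, 1), torch.nn.Sigmoid())
cameras = camera_scheduler(["front", "left", "right"])
active_cam = next(cameras)                           # e.g. "front"
image = torch.rand(3, 480, 640)                      # RGB image from active_cam
dense = extract_dense_features(image)                # F: dense visual features
segments = torch.randint(0, 50, dense.shape[:2])     # weak segmentation (placeholder)
embeddings = subsample_by_segments(dense, segments)  # reduced embedding set {f_n}
scores = head(embeddings)                            # traversability per segment
```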

Supervision and mission graphs: (a) Information stored in each graph over the mission. While the Supervision Graph only stores temporary information about the robot’s footprint in a sliding window, the Mission Graph saves the data required for online learning over the full mission. The color of the footprint patches indicates the generated traversability score. (b) The interaction between graphs updates the traversability in the mission nodes by reprojecting the robot’s footprint and traversability scores.
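A rough data-structure sketch of the two graphs is given below, assuming nodes store poses, visual features, and traversability scores; the class and method names are illustrative and not the WVN API.

```python
# Rough sketch of the two graph structures (illustrative fields, not the WVN API).
from collections import deque
from dataclasses import dataclass

@dataclass
class SupervisionNode:
    pose: tuple            # footprint pose in the odometry frame
    score: float           # traversability score generated from proprioception

@dataclass
class MissionNode:
    pose: tuple            # camera pose when the image was captured
    features: object       # (sub-sampled) visual features of the image
    labels: object = None  # per-feature traversability labels, filled by reprojection

class SupervisionGraph:
    """Sliding window of recent footprint nodes (temporary supervision signal)."""
    def __init__(self, window_size: int = 200):
        self.nodes = deque(maxlen=window_size)

    def add(self, node: SupervisionNode) -> None:
        self.nodes.append(node)

class MissionGraph:
    """Keeps every image node needed for online learning over the full mission."""
    def __init__(self):
        self.nodes = []

    def add(self, node: MissionNode) -> None:
        self.nodes.append(node)

    def update_labels(self, supervision: SupervisionGraph, reproject) -> None:
        # Reproject the recent footprint and its scores into each image node,
        # updating the traversability labels used for training.
        for node in self.nodes:
            node.labels = reproject(node, list(supervision.nodes))
```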

Preview Experiments

Deployment with a multi-camera setup. Left: Real scene and visual traversability prediction. Center: Visual traversability projected onto the local terrain map. Right: Geometric traversability computed from the elevation map. The robot was teleoperated throughout the experiment.

Kilometer-scale navigation: We deployed our system to learn to segment the footpath of a park after training for a few steps. We executed 3 runs starting from different points in the park: run 1 (0.55 km), run 2 (0.5 km), and run 3 (1.4 km). Minor interventions were applied to guide the robot at intersections; major interventions (⋆) were required in some areas where the robot misclassified muddy patches as the path.

Feature sub-sampling: We tested the different subsampling methods on the recorded path-following sequence from Sec. 5.2.4. We observed that STEGO provides significant improvements for the path-following task in both traversability prediction fidelity and training stability.

Adaptation on real hardware: We tested the online adaptation capabilities of our system by teleoperating the robot to complete 3 loops in a park (top, route shown in ■). The columns show different parts of the loop (a,b,c); each row displays the improvement of the traversability estimate over time and training steps.

Aerial views of the 3 environments used for offline testing of our system, illustrating the paths used for data collection and scene examples. The purple trajectories are used for training and the remaining ones for validation.

Inference approaches: We qualitatively compared segment-wise and pixel-wise inference using pre-trained DINO and STEGO features. We observed advantages in executing the inference in a pixel-wise manner, which provided a fine-grained prediction regardless of the pre-trained features.
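The two inference modes can be contrasted with a short sketch: segment-wise inference averages features per segment and paints one score back onto each segment, while pixel-wise inference applies the traversability head to every pixel embedding. The head, feature dimensions, and segmentation used here are placeholders, not the actual WVN components.

```python
# Sketch contrasting segment-wise and pixel-wise inference (illustrative only).
import torch

@torch.no_grad()
def segment_wise_inference(dense, segments, head):
    """Predict one score per segment and paint it back onto the image grid."""
    out = torch.zeros(dense.shape[:2])
    for i in segments.unique():
        mask = segments == i
        out[mask] = head(dense[mask].mean(dim=0, keepdim=True)).squeeze()
    return out  # piecewise-constant traversability map

@torch.no_grad()
def pixel_wise_inference(dense, head):
    """Apply the traversability head to every pixel embedding independently."""
    h, w, d = dense.shape
    return head(dense.reshape(-1, d)).reshape(h, w)  # fine-grained map

head = torch.nn.Sequential(torch.nn.Linear(384, 1), torch.nn.Sigmoid())
dense = torch.randn(60, 80, 384)            # placeholder dense features
segments = torch.randint(0, 50, (60, 80))   # placeholder weak segmentation
coarse = segment_wise_inference(dense, segments, head)
fine = pixel_wise_inference(dense, head)
```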

Point-to-point autonomous navigation: (a) After teleoperating the robot for 2 min (path shown in ■), we successfully achieved autonomous navigation in a woodland environment (path shown in ■). (b) Some of the SDFs generated from the predicted traversability during autonomous operation. (c) Global 2.5D reconstruction of the testing area and predicted traversability, generated in post-processing to illustrate the capabilities of our approach.

Visual vs. geometric traversability: Illustration of the traversability map (bottom row) and corresponding SDF (top row) for three different traversability estimation methods applied to the same terrain patch. Our visual traversability estimate provides clear advantages for local planning compared to geometric methods, which are heavily affected by traversable high grass or branches (bottom row). This is evident when comparing the SDFs, where geometry-based methods are more sensitive to the spikes produced by high-grass areas (top row).
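As a rough illustration of how a planning-ready SDF can be obtained from a predicted traversability map, the sketch below thresholds the map and combines two Euclidean distance transforms; the threshold and grid resolution are illustrative assumptions, not the values used in our experiments.

```python
# Minimal sketch: signed distance field (SDF) from a 2D traversability map.
# Cells above the threshold are treated as traversable; the SDF is positive
# inside traversable space and negative inside untraversable regions.
import numpy as np
from scipy.ndimage import distance_transform_edt

def traversability_to_sdf(traversability: np.ndarray,
                          threshold: float = 0.5,
                          resolution: float = 0.1) -> np.ndarray:
    traversable = traversability >= threshold
    dist_inside = distance_transform_edt(traversable) * resolution    # distance to nearest obstacle
    dist_outside = distance_transform_edt(~traversable) * resolution  # distance to nearest free cell
    return dist_inside - dist_outside

# Usage: a toy 2D map with an untraversable patch in the middle.
trav_map = np.ones((100, 100))
trav_map[40:60, 40:60] = 0.0
sdf = traversability_to_sdf(trav_map, threshold=0.5, resolution=0.1)
```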