ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints

UC Berkeley


Best Systems Paper Finalist

Long Talk at Robotics: Science and Systems (RSS) 2022

New York City, USA

Robotic navigation has been approached as a problem of 3D reconstruction and planning, as well as an end-to-end learning problem. However, long-range navigation requires both planning and reasoning about local traversability, as well as being able to utilize general knowledge about global geography, in the form of a roadmap, GPS, or other side information providing important cues. In this work, we propose an approach that integrates learning and planning, and can utilize side information such as schematic roadmaps, satellite maps and GPS coordinates as a planning heuristic, without relying on them being accurate. Our method, ViKiNG, incorporates a local traversability model, which looks at the robot's current camera observation and a potential subgoal to infer how easily that subgoal can be reached, as well as a heuristic model, which looks at overhead maps for hints and attempts to evaluate the appropriateness of these subgoals in order to reach the goal. These models are used by a heuristic planner to identify the best waypoint in order to reach the final destination. Our method performs no explicit geometric reconstruction, utilizing only a topological representation of the environment. Despite having never seen trajectories longer than 80 meters in its training dataset, ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away in previously unseen environments, and exhibit complex behaviors such as probing potential paths and backtracking when they are found to be non-viable. ViKiNG is also robust to unreliable maps and GPS, since the low-level controller ultimately makes decisions based on egocentric image observations, using maps only as planning heuristics.

Method

1. Start with a diverse, offline dataset of trajectories

This can contain navigation trajectories from a wide range of environments, collected using scripted policies, human teleoperation, or something as simple as a random walk.

2. Train a latent goal model to reason about traversability and propose feasible subgoals

Our low-level controller maps the current image observation o_t and a waypoint observation o_w to: (1) the temporal distance to reach w; (2) the best action that the robot must take; (3) a prediction of the (approximate) offset in GPS readings between current and w. (1) and (3) will be used by a higher-level planner, and (2) will be used as a learned policy, if needed. We would also like this model to be able to propose potential subgoals w (in a latent space) that are reachable from the current observation o_t.
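To make the three outputs concrete, here is a minimal interface sketch. It is not the learned network: `toy_model` is a hypothetical stand-in that treats each observation as a 2D position instead of an image, and `WaypointPrediction`, `reachable_waypoints`, and `max_steps` are names assumed purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for an image observation: a 2D position (assumption).
Observation = Tuple[float, float]


@dataclass
class WaypointPrediction:
    temporal_distance: float         # (1) estimated time steps to reach w
    action: Tuple[float, float]      # (2) action toward w, here a unit heading
    gps_offset: Tuple[float, float]  # (3) approximate GPS offset to w


def toy_model(o_t: Observation, o_w: Observation) -> WaypointPrediction:
    """Hypothetical stand-in for the learned model, not the real network."""
    dx, dy = o_w[0] - o_t[0], o_w[1] - o_t[1]
    dist = math.hypot(dx, dy)
    heading = (dx / dist, dy / dist) if dist > 0 else (0.0, 0.0)
    return WaypointPrediction(dist, heading, (dx, dy))


def reachable_waypoints(
    model: Callable[[Observation, Observation], WaypointPrediction],
    o_t: Observation,
    candidates: Sequence[Observation],
    max_steps: float,
) -> List[Tuple[Observation, WaypointPrediction]]:
    """Keep candidates whose predicted temporal distance is within max_steps."""
    preds = [(w, model(o_t, w)) for w in candidates]
    return [(w, p) for w, p in preds if p.temporal_distance <= max_steps]
```

In this sketch a planner would read `temporal_distance` and `gps_offset` from each surviving prediction, while `action` is what a policy would execute to drive toward the chosen waypoint.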

3. Learn a goal-directed heuristic from the geographic hints

We train a heuristic h_over to score the favorability of candidate waypoints proposed by the latent goal model. Our heuristic is based on the estimator p_over for the probability that the chosen waypoint lies on a valid path to the goal G. We approximate this heuristic using a contrastive learning objective.
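As an illustration of what a contrastive objective can look like, the sketch below implements an InfoNCE-style loss for a single training example. Whether this matches the exact loss used to train h_over is an assumption; `info_nce_loss`, `pos_score`, and `neg_scores` are hypothetical names, and the scores stand in for the outputs of a learned scoring function on one waypoint that lies on a valid path (positive) and several that do not (negatives).

```python
import math
from typing import Sequence


def info_nce_loss(pos_score: float, neg_scores: Sequence[float]) -> float:
    """InfoNCE-style loss for one example.

    Computes -log( exp(s_pos) / (exp(s_pos) + sum_j exp(s_neg_j)) )
    using the log-sum-exp trick for numerical stability.
    """
    scores = [pos_score, *neg_scores]
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - pos_score
```

The loss shrinks toward zero as the positive waypoint is scored well above the negatives, which is the behavior a favorability heuristic over candidate waypoints needs.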

4. Use the learned heuristic to perform informed search in novel environments

Putting it all together, ViKiNG performs physical search in a previously unseen environment by incrementally building a topological graph of its environment. We use a latent goal model to propose candidate waypoints by sampling a learned prior, and combine that with a learned heuristic function that uses the overhead map as geographic context to pick the best next waypoint using ViKiNG-A*: an A*-like algorithm for performing physical search.
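The selection step can be sketched as a simplified A*-like rule: score each frontier waypoint by the cost of driving there from the robot's current node through the graph built so far, plus a heuristic estimate of the remaining cost to the goal. This is only an illustration under stated assumptions; the exact cost terms of ViKiNG-A* may differ, and `shortest_costs`, `pick_next_waypoint`, and the dictionary-based graph are hypothetical names and structures.

```python
import heapq
from typing import Dict, Hashable, Iterable, Optional

# Topological graph: node -> {neighbor: traversal cost} (assumed structure).
Graph = Dict[Hashable, Dict[Hashable, float]]


def shortest_costs(graph: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Dijkstra over the topological graph built so far."""
    dist = {source: 0.0}
    queue = [(0.0, 0, source)]
    counter = 1  # tie-breaker so heapq never compares node objects
    while queue:
        d, _, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(queue, (nd, counter, nbr))
                counter += 1
    return dist


def pick_next_waypoint(
    graph: Graph,
    current: Hashable,
    frontier: Iterable[Hashable],
    heuristic: Dict[Hashable, float],
) -> Optional[Hashable]:
    """Return the reachable frontier node minimizing travel cost + heuristic."""
    reach = shortest_costs(graph, current)
    scored = {w: reach[w] + heuristic[w] for w in frontier if w in reach}
    if not scored:
        return None
    return min(scored, key=scored.get)
```

Because the travel cost is measured from the robot's current node, a waypoint with a better heuristic can still lose to a nearby one when reaching it would require a long backtrack, which is the trade-off that distinguishes physical search from search on a known graph.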

Kilometer-Scale Navigation with ViKiNG

Experiment 1

Total Distance: 1.22km
Hints: Satellite


Experiment 2

Total Distance: 782m
Hints: Roadmap


Experiment 3

5 Checkpoints
Total Distance: 2.65km
Hints: Satellite


The Role of Geographic Hints

What do the hints teach ViKiNG: satellite images vs. schematic roadmaps?

For a fixed start-goal location pair, we compare the learned behaviors of ViKiNG with (i) roadmap hints, and (ii) satellite image hints. Roadmap hints encourage the robot to follow marked roads, while with satellite image hints the robot discovers a more direct path by cutting across a meadow.


What happens if we disable the hints?

For a fixed start-goal location pair, we visualize the behavior of ViKiNG in the absence of available geographic hints (in the form of satellite images). As a baseline, we also run GCG: a competitive visual navigation baseline.

With the overhead hints disabled (and only a GPS-based heuristic available), ViKiNG-NoSat can still reach the destination, but takes significantly longer to do so, greedily exploring the environment and driving very close to the obstacle (a building). That said, this experiment also illustrates the ability of the underlying ViKiNG-A* to handle less useful heuristics: while the path is significantly longer, the method is still able to eventually reach the destination, and in some sense the mistakes the method makes are to be expected of any system that has no prior map information. GCG, which also has access to GPS but not the overhead map, fails to exhibit such behavior and is unable to find a path around the building.


BibTeX

@inproceedings{shah2022viking,
  author    = {Dhruv Shah and Sergey Levine},
  title     = {{ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints}},
  booktitle = {Proceedings of Robotics: Science and Systems},
  year      = {2022},
  url       = {http://www.roboticsproceedings.org/rss18/p019.html}
}