Semi-parametric Topological Memory for Navigation

Nikolay Savinov Alexey Dosovitskiy Vladlen Koltun

ICLR 2018

Our method

General overview and results

We introduce a new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals. The proposed semi-parametric topological memory (SPTM) consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a (parametric) deep network capable of retrieving nodes from the graph based on observations. The graph stores no metric information, only connectivity of locations corresponding to the nodes. We use SPTM as a planning module in a navigation system. Given only 5 minutes of footage of a previously unseen maze, an SPTM-based navigation agent can build a topological map of the environment and use it to confidently navigate towards goals. The average success rate of the SPTM agent in goal-directed navigation across test environments is higher than the best-performing baseline by a factor of three.

Memory module structure
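The memory described above has two parts: a non-parametric graph whose nodes store raw observations and whose edges encode only connectivity (no metric information), and a parametric retrieval network that scores how similar two observations are. The following is a minimal, hypothetical Python sketch of that split; the class name, the shortcut threshold, and the retrieval-network interface are illustrative choices for this page, not the released implementation.

```python
# Hypothetical sketch of semi-parametric topological memory:
# a plain graph of stored observations (non-parametric part) plus a
# similarity-scoring retrieval network (parametric part, stubbed here).
from collections import deque


class TopologicalMemory:
    def __init__(self, retrieval_net, shortcut_threshold=0.95):
        self.retrieval_net = retrieval_net   # returns a similarity score in [0, 1]
        self.observations = []               # one node per stored frame
        self.edges = {}                      # adjacency lists: connectivity only
        self.shortcut_threshold = shortcut_threshold

    def build_from_video(self, frames):
        """Add one node per frame, connect temporal neighbours, then add
        'visual shortcut' edges between frames the retrieval net finds similar."""
        for i, frame in enumerate(frames):
            self.observations.append(frame)
            self.edges.setdefault(i, set())
            if i > 0:                        # temporal edge to the previous frame
                self._connect(i - 1, i)
        for i in range(len(frames)):         # shortcut edges (O(n^2) for clarity)
            for j in range(i + 5, len(frames)):
                if self.retrieval_net(frames[i], frames[j]) > self.shortcut_threshold:
                    self._connect(i, j)

    def _connect(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def localize(self, observation):
        """Return the index of the stored node that best matches the query."""
        return max(range(len(self.observations)),
                   key=lambda i: self.retrieval_net(observation, self.observations[i]))

    def plan_waypoint(self, current_obs, goal_obs, lookahead=5):
        """Localize agent and goal, run BFS over the graph, and return an
        intermediate node a few edges ahead to serve as a waypoint."""
        start, goal = self.localize(current_obs), self.localize(goal_obs)
        parent = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                break
            for nxt in self.edges[node]:
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        path = [goal]
        while parent.get(path[-1]) is not None:
            path.append(parent[path[-1]])
        path.reverse()                       # path from start to goal
        return path[min(lookahead, len(path) - 1)]
```

Because the graph stores no metric information, plain graph search is all the planner needs: it only has to hand a nearby waypoint to a low-level locomotion policy (sketched in the next section).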

Maze walkthrough example

This is an example of the walkthrough video, shown to both our agent and the baselines before the start of navigation. Our agent constructs the memory graph from this video and later uses two deep networks to navigate the graph and reach the goal.
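To make the division of labour between the two networks concrete, here is a hypothetical navigation loop in the same spirit as the sketch above: the retrieval network (inside TopologicalMemory) localizes the agent and the goal in the graph, and a locomotion network maps a (current observation, waypoint observation) pair to an action. The environment interface returning (observation, done) is an assumption made for this sketch, not part of the paper.

```python
# Hypothetical closed-loop navigation using the memory sketched above:
# plan a waypoint on the graph, then let a locomotion network decide
# how to move from the current view towards that waypoint's view.
def navigate(env, memory, locomotion_net, goal_obs, max_steps=1000):
    obs = env.reset()
    for _ in range(max_steps):
        waypoint_idx = memory.plan_waypoint(obs, goal_obs)
        waypoint_obs = memory.observations[waypoint_idx]
        action = locomotion_net(obs, waypoint_obs)   # "how do I get from here to there?"
        obs, done = env.step(action)                 # assumed (observation, done) interface
        if done:                                     # goal reached
            return True
    return False
```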

A3C baseline

This baseline is trained with a popular reinforcement learning (RL) method, asynchronous advantage actor-critic (A3C). The agent is aware of the goal (image on the right), equipped with LSTM memory, and given a maze walkthrough video (not shown). Yet it fails to generalize to the previously unseen maze and reaches the goal only by accident, after aimlessly traversing the environment for a while.
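For reference, such a baseline policy is roughly a goal-conditioned recurrent actor-critic. The sketch below (PyTorch, with placeholder layer sizes of our own choosing, not the authors' exact architecture) shows the general shape of such a network: observation and goal images are encoded, fused, passed through an LSTM, and mapped to a policy and a value estimate.

```python
# Rough, illustrative goal-conditioned A3C-style network (not the paper's code).
import torch
import torch.nn as nn


class GoalConditionedA3CNet(nn.Module):
    def __init__(self, num_actions, feat_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(                     # shared conv encoder for RGB inputs
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTMCell(2 * feat_dim, feat_dim)   # episodic memory
        self.policy_head = nn.Linear(feat_dim, num_actions)
        self.value_head = nn.Linear(feat_dim, 1)

    def forward(self, obs, goal, hidden):
        # Encode current view and goal image, fuse, and update the recurrent state.
        x = torch.cat([self.encoder(obs), self.encoder(goal)], dim=1)
        h, c = self.lstm(x, hidden)
        return self.policy_head(h), self.value_head(h), (h, c)
```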

Teach-and-repeat baseline: selected successes

This baseline simply repeats the actions from an expert walkthrough of the maze. Note that it uses more information than our method: our method only has access to the images in the walkthrough video, not the actions! Here we show two of its few successful attempts. The overall success rate of teach-and-repeat on this maze is only 36%. The agent is unable to take shortcuts and often gets stuck when it deviates from the walkthrough track.
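A minimal sketch of the teach-and-repeat idea, under the same assumed environment interface as above (function names are illustrative): the expert's actions are recorded during the "teach" phase and replayed verbatim during the "repeat" phase, with no way to correct for drift off the recorded trajectory.

```python
# Illustrative teach-and-repeat baseline: record expert actions, then replay them blindly.
def teach(env, expert_policy, num_steps):
    obs = env.reset()
    recorded_actions = []
    for _ in range(num_steps):
        action = expert_policy(obs)
        recorded_actions.append(action)
        obs, _ = env.step(action)
    return recorded_actions


def repeat(env, recorded_actions):
    env.reset()
    for action in recorded_actions:     # blind replay: no shortcuts, no error correction
        _, done = env.step(action)
        if done:
            return True                 # happened to reach the goal
    return False
```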