StreetLearn Code

We have released the code of the StreetLearn environment at: https://github.com/deepmind/streetlearn. The code is written in C++ and Python; it provides access to the Street View-based data and an interface for interacting with the environment. The open-source code structure and the dataset are described in this paper.

A separate code release will be made for the TensorFlow-based agent and training using Reinforcement Learning, relying on a modified version of Importance Weighted Actor-Learner Architectures (paper, code).

Top: examples of three RGB image observations in Manhattan, obtained by projecting Google Street View panoramas to a specified yaw and pitch angle, with a given field of view. Bottom: corresponding locations of the panoramas in their neighborhood. Each panorama is represented by a dot, and the current field of view is shown as a green cone.

The environment code contains:

  • Our C++ StreetLearn engine for loading, caching and serving Google Street View panoramas as well as for handling navigation (moving from one panorama to another) depending on the city street graph and the current position and orientation of the agent. Each panorama is projected from its equirectangular representation to a first-person view for which one can specify the yaw, pitch and field of view angles.
  • The message protocol buffers used to store panoramas and street graph.
  • A Python-based interface for calling the StreetLearn environment with custom action spaces.
    • Within the Python StreetLearn interface, several games are defined in individual files whose names end with _game.py
  • A simple human agent, implemented in Python using pygame, that instantiates the StreetLearn environment on the requested map and enables a user to play the courier or the instruction-following games.
  • Oracle agents, similar to the human agent, which automatically navigate towards a specified goal and report oracle performance on the courier or instruction-following games.
  • TensorFlow implementation of agents.

The StreetLearn environment follows the specifications from OpenAI Gym.

After instantiating a specific game and the environment, the environment is initialised by calling the function reset(). Note that if the flag auto_reset is set to True at construction, reset() will be called automatically every time an episode ends. The agent plays within the environment by iteratively producing an action, sending it to the environment by stepping through it, and processing the observations and rewards returned by the environment; a minimal interaction loop is sketched after the list below. The call to the function step(action) returns:

  • observation (tuple of observations requested at construction),
  • reward (a float with the current reward of the agent),
  • done (boolean indicating whether the episode has ended),
  • and info (a dictionary of environment state variables, which is useful for debugging the agent behaviour or for accessing privileged environment information for visualisation and analysis).
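
The loop can be sketched as follows. This is a minimal example assuming the courier game; the module paths, class names (default_config, CourierGame, StreetLearn) and configuration keys other than auto_reset and the observation names are assumptions based on the repository layout described above, so please check the released code for the exact API.

```python
# Minimal sketch of the reset()/step() interaction loop, assuming the courier
# game. Module paths, class names and config keys are assumptions based on the
# repository layout; check the released code for the exact API.
from streetlearn.python.environment import courier_game
from streetlearn.python.environment import default_config
from streetlearn.python.environment import streetlearn


def select_action(observation):
  # Placeholder policy: the exact action encoding expected by step() depends
  # on the action space configured at construction (see the action list below).
  del observation  # A real agent would condition on the observation.
  return 'move_forward'  # Hypothetical action value, for illustration only.


config = default_config.ApplyDefaults({
    'observations': ['view_image', 'yaw', 'latlng'],  # requested at construction
    'width': 84, 'height': 84,                        # first-person view resolution
    'auto_reset': True,                               # reset() called automatically at episode end
})
game = courier_game.CourierGame(config)
env = streetlearn.StreetLearn('/path/to/dataset', config, game)
env.reset()

observation = None
for _ in range(1000):
  action = select_action(observation)
  observation, reward, done, info = env.step(action)
  if done and not config['auto_reset']:
    env.reset()
```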

Actions available to an agent:

  • Rotate left or right in the panorama, by a specified angle (change the yaw of the agent).
  • Rotate up or down in the panorama, by a specified angle (change the pitch of the agent).
  • Move forward from the current panorama A to an adjacent panorama B if the current bearing of the agent from A to B is within a tolerance angle of 30 degrees (see the sketch after this list).
  • Zoom in and out in the panorama.
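
The move-forward rule is a simple angular tolerance test; the following snippet is an illustrative re-implementation of that rule in plain Python, not the engine's actual C++ code.

```python
def can_move_forward(agent_yaw_deg, bearing_to_neighbor_deg, tolerance_deg=30.0):
  """Illustrative check of the move-forward rule (not the engine's C++ code).

  The agent may move from its current panorama A to a neighbouring panorama B
  when the bearing from A to B differs from the agent's heading by at most
  tolerance_deg degrees.
  """
  # Wrap the angular difference into [-180, 180) before comparing.
  delta = (bearing_to_neighbor_deg - agent_yaw_deg + 180.0) % 360.0 - 180.0
  return abs(delta) <= tolerance_deg


# Facing 10 degrees, a neighbour at bearing 35 degrees is inside the 30-degree
# cone, while one at bearing 50 degrees is not.
assert can_move_forward(10.0, 35.0)
assert not can_move_forward(10.0, 50.0)
```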

For training RL agents, action spaces are discretized using integers. For instance, in our paper, we used 5 actions: (move forward, turn left by 22.5 deg, turn left by 67.5 deg, turn right by 22.5 deg, turn right by 67.5 deg).
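
Such a discretization amounts to a fixed mapping from integer policy outputs to environment actions. The sketch below uses an assumed (move_forward, yaw_rotation) tuple encoding for illustration, not the released action specification.

```python
# Sketch of the 5-action discretization used in the paper, mapped to
# (move_forward, yaw_rotation_in_degrees) pairs. The tuple encoding is an
# assumption for illustration; adapt it to the released action spec.
DISCRETE_ACTIONS = {
    0: (1, 0.0),     # move forward
    1: (0, -67.5),   # turn left by 67.5 deg
    2: (0, -22.5),   # turn left by 22.5 deg
    3: (0, 22.5),    # turn right by 22.5 deg
    4: (0, 67.5),    # turn right by 67.5 deg
}


def to_environment_action(discrete_action):
  """Maps an integer policy output to a (move_forward, yaw_delta_deg) pair."""
  return DISCRETE_ACTIONS[discrete_action]


assert to_environment_action(3) == (0, 22.5)  # turn right by 22.5 deg
```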

The following observations can currently be requested from the environment:

  • view_image: RGB image of the first-person view returned by the environment and seen by the agent,
  • graph_image: RGB image of the top-down street graph, usually not seen by the agent,
  • yaw: Scalar value of the yaw angle of the agent, in degrees (zero corresponds to North),
  • pitch: Scalar value of the pitch angle of the agent, in degrees (zero corresponds to horizontal),
  • metadata: Message protocol buffer of type Pano with the metadata of the current panorama,
  • target_metadata: Message protocol buffer of type Pano with the metadata of the target/goal panorama,
  • latlng: Tuple of lat/lng scalar values for the current position of the agent,
  • target_latlng: Tuple of lat/lng scalar values for the target/goal position,
  • latlng_label: Integer discretized value of the current agent position using 1024 bins (32 bins for latitude and 32 bins for longitude; see the sketch after this list),
  • target_latlng_label: Integer discretized value of the target position using 1024 bins (32 bins for latitude and 32 bins for longitude),
  • yaw_label: Integer discretized value of the agent yaw using 16 bins,
  • neighbors: Vector encoding the egocentric traversability of the agent's immediate neighbours, with 16 bins for the directions around the agent and bin 0 corresponding to the traversability straight ahead of the agent,
  • thumbnails: Set of n+1 RGB images corresponding to the first-person views that the agent should see at the waypoints and at the goal location when playing an instruction-following game with n instructions,
  • instructions: Set of n instructions given to the agent for the waypoints and the goal location when playing an instruction-following game with n instructions,
  • ground_truth_direction: Scalar value of the relative ground truth direction to be taken by the agent in order to follow a shortest path to the next goal or waypoint. This observation should be requested only for agents trained using imitation learning.
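
As an illustration of the lat/lng discretization used by latlng_label and target_latlng_label, the helper below bins latitude and longitude independently into 32 bins each over an assumed bounding box; the actual bin boundaries used by the environment may differ.

```python
def latlng_label(lat, lng, lat_range, lng_range, bins_per_axis=32):
  """Illustrative 1024-way discretization of a (lat, lng) position.

  Latitude and longitude are binned independently into bins_per_axis bins
  over the given bounding box (32 x 32 = 1024 bins). The bounding box and
  the bin ordering are assumptions, not the environment's actual values.
  """
  lat_min, lat_max = lat_range
  lng_min, lng_max = lng_range
  lat_bin = int((lat - lat_min) / (lat_max - lat_min) * bins_per_axis)
  lng_bin = int((lng - lng_min) / (lng_max - lng_min) * bins_per_axis)
  # Clamp positions on the upper boundary into the last bin.
  lat_bin = min(max(lat_bin, 0), bins_per_axis - 1)
  lng_bin = min(max(lng_bin, 0), bins_per_axis - 1)
  return lat_bin * bins_per_axis + lng_bin


# Example with a rough bounding box around mid-Manhattan.
label = latlng_label(40.7484, -73.9857,
                     lat_range=(40.70, 40.80), lng_range=(-74.02, -73.93))
assert 0 <= label < 1024
```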

The following games are available in the StreetLearn environment:

  • coin_game: the rewards consist of invisible coins scattered throughout the map, each yielding a reward of 1. Once picked up, these coins do not reappear until the end of the episode.
  • courier_game: the agent is given a goal destination, specified as a lat/long pair. Once the goal is reached (within a 100m tolerance), a new goal is sampled, until the end of the episode. The reward at a goal is proportional to the number of panoramas on the shortest path from the agent's position, at the time the goal is assigned, to the goal position. Additional reward shaping consists of early rewards when the agent gets within a range of 200m of the goal (see the sketch after this list). Additional coins can also be scattered throughout the environment. The proportion of coins, the goal radius and the early-reward radius are parameterizable.
  • curriculum_courier_game: same as the courier game, but with a curriculum on the difficulty of the task (maximum straight-line distance from the agent's position to the goal when it is assigned).
  • goal_instruction_game and its variations incremental_instruction_game and step_by_step_instruction_game use navigation instructions to direct agents to a goal. Agents are provided with a list of instructions as well as thumbnails that guide them from their starting position to the goal location. In step_by_step, agents are provided one instruction and two thumbnails at a time; in the other game variants, the whole list is available throughout the game. Reward is granted upon reaching the goal location (all variants), as well as upon hitting individual waypoints (incremental and step_by_step only). During training, various curriculum strategies are available to the agents, and reward shaping can be employed to provide fractional rewards when the agent gets within a range of 50m of a waypoint or the goal.
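
To make the courier reward structure concrete, the following sketch restates the shaping described above in code. It is illustrative only: the scale of the early reward and the exact bookkeeping are assumptions, not the released implementation.

```python
def courier_reward(distance_to_goal_m, shortest_path_num_panos, early_reward_given,
                   goal_radius_m=100.0, early_radius_m=200.0, early_fraction=0.5):
  """Illustrative sketch of the courier reward shaping (not the released code).

  The goal reward is proportional to the number of panoramas on the shortest
  path measured when the goal was assigned; a fractional early reward is given
  once, when the agent first comes within early_radius_m of the goal. The
  early_fraction scale is an assumption.
  """
  reward = 0.0
  if not early_reward_given and distance_to_goal_m <= early_radius_m:
    reward += early_fraction * shortest_path_num_panos
    early_reward_given = True
  goal_reached = distance_to_goal_m <= goal_radius_m
  if goal_reached:
    reward += shortest_path_num_panos  # proportional to shortest-path length
  return reward, goal_reached, early_reward_given
```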

In addition to the environment, we have added two scripts, human_agent and oracle_agent, that show an agent that is, respectively, controlled by the user or moving along the shortest path to the goal. The UI of these scripts displays the view_image at the top and the graph_image at the bottom. On the view_image, a navigation bar displays small circles in the directions of the neighbouring panoramas (green if travel is possible in that direction, red if not, and orange if there are multiple possible directions). The graph_image shows the nodes of the graph in white, nodes with coins in yellow, already traversed nodes in red, the shortest path to the goal in violet, and the agent's location and field of view in blue.