We are currently finalizing the code of the StreetLearn environment. The code is written in Python and provides access to the Street View-based data and interaction with the environment. It will be available at the beginning of January 2019 at: https://github.com/deepmind/streetlearn.
A separate code release will be made for the TensorFlow-based agent and training using Reinforcement Learning, relying on a modified version of Importance Weighted Actor-Learner Architectures (paper, code).
Top: examples of three RGB image observations in Manhattan, obtained by projecting Google Street View panoramas to a specified yaw and pitch angle, with a given field of view. Bottom: corresponding locations of panoramas in their neighborhood. Each panorama is represented by a dot, and the current field of view is shown as a green cone.
The environment code contains:
- Our C++ StreetLearn engine for loading, caching and serving Google Street View panoramas by projecting them from an equirectangular representation to a first-person projected view at a given yaw, pitch and field of view, and for handling navigation (moving from one panorama to another) depending on the city street graph and the current orientation.
- The message protocol buffers used to store the panoramas and the street graph.
- A Python-based interface for calling the StreetLearn environment with custom action spaces.
- A simple human agent, implemented in Python using pygame, that instantiates the StreetLearn environment on the requested map and enables a user to play the courier game.
- An oracle agent, similar to the human agent, which automatically navigates towards the goal and reports oracle performance on the courier game.
The StreetLearn environment follows the specifications from OpenAI Gym. The call to function step(action) returns:
- observation (tuple of observations requested at construction),
- reward (a float with the current reward of the agent),
- done (boolean indicating whether the episode has ended),
- and info (a dictionary of environment state variables).
After creating the environment, it is initialised by calling function reset(). If the flag auto_reset is set to True at construction, reset() will be called automatically every time that an episode ends.
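The reset()/step(action) contract described above can be illustrated with a minimal sketch. The StubEnv class below is a hypothetical stand-in written only for this example; it is not the StreetLearn environment, and its constructor arguments, observations and rewards are assumptions. Only the shape of the interaction loop (a 4-tuple returned by step, and an optional automatic reset at the end of an episode) follows the text.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the Gym-style contract only."""

    def __init__(self, episode_length=5, auto_reset=False):
        self._episode_length = episode_length
        self._auto_reset = auto_reset
        self._t = 0

    def reset(self):
        """Starts a new episode and returns the first observation."""
        self._t = 0
        return self._observe()

    def _observe(self):
        # A tuple of observations; here just the step counter.
        return (self._t,)

    def step(self, action):
        """Returns (observation, reward, done, info)."""
        self._t += 1
        reward = 1.0 if action == 0 else 0.0  # arbitrary toy reward
        done = self._t >= self._episode_length
        info = {"step_count": self._t}
        observation = self._observe()
        if done and self._auto_reset:
            # Mirrors the auto_reset flag: start the next episode right away.
            self.reset()
        return observation, reward, done, info


def run_episode(env, policy):
    """Runs one episode and returns (total reward, number of steps)."""
    env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        observation, reward, done, info = env.step(policy())
        total_reward += reward
        steps += 1
    return total_reward, steps


total_reward, steps = run_episode(StubEnv(episode_length=5), lambda: 0)
```

With auto_reset left at False, the caller is responsible for calling reset() before each new episode, as run_episode does here.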
Actions available to an agent:
- Rotate left or right in the panorama, by a specified angle (change the yaw of the agent).
- Rotate up or down in the panorama, by a specified angle (change the pitch of the agent).
- Move from current panorama A forward to another panorama B if the current bearing of the agent from A to B is within a tolerance angle of 30 degrees.
- Zoom in and out in the panorama.
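The forward-move rule above can be sketched as a bearing check. This is an illustration under stated assumptions, not the engine's implementation: the function names are hypothetical, the bearing is computed with the standard great-circle initial-bearing formula, and the 30 degree tolerance is interpreted as a maximum absolute difference between the agent's yaw and the bearing from A to B.

```python
import math


def bearing_deg(lat_a, lng_a, lat_b, lng_b):
    """Initial great-circle bearing from A to B, in degrees (0 = North)."""
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    d_lng = math.radians(lng_b - lng_a)
    y = math.sin(d_lng) * math.cos(phi_b)
    x = (math.cos(phi_a) * math.sin(phi_b)
         - math.sin(phi_a) * math.cos(phi_b) * math.cos(d_lng))
    return math.degrees(math.atan2(y, x)) % 360.0


def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in [0, 180]."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def can_move_forward(yaw_deg, bearing_to_b_deg, tolerance_deg=30.0):
    """True if B lies within the tolerance of the agent's heading.

    Assumption: the tolerance applies on each side of the heading.
    """
    return angle_diff_deg(yaw_deg, bearing_to_b_deg) <= tolerance_deg
```

The wrap-around in angle_diff_deg matters near North: a yaw of 350 degrees and a bearing of 10 degrees differ by 20 degrees, not 340.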
For training RL agents, action spaces are discretized using integers. For instance, in our paper, we used 5 actions: (move forward, turn left by 22.5 deg, turn left by 67.5 deg, turn right by 22.5 deg, turn right by 67.5 deg).
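As a rough sketch of such a discretization, the five actions can be stored in a lookup table indexed by integer. The ordering of the indices, the sign convention (left turns decrease the yaw) and the wrapping of the yaw to [0, 360) are assumptions made for this example, and ACTIONS and apply_action are hypothetical names.

```python
# (name, yaw change in degrees); index order follows the list in the text.
ACTIONS = (
    ("move_forward", 0.0),
    ("turn_left", -22.5),
    ("turn_left", -67.5),
    ("turn_right", 22.5),
    ("turn_right", 67.5),
)


def apply_action(yaw_deg, action_index):
    """Returns (new yaw in [0, 360), whether a forward move is requested)."""
    name, delta_deg = ACTIONS[action_index]
    new_yaw = (yaw_deg + delta_deg) % 360.0
    return new_yaw, name == "move_forward"
```

An agent then only outputs an integer in range(len(ACTIONS)), and the table translates it into a rotation or a forward move.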
The following observations can currently be requested from the environment:
- view_image: RGB image for the first-person view image returned from the environment and seen by the agent,
- graph_image: RGB image for the top-down street graph image, usually not seen by the agent,
- yaw: Scalar value of the yaw angle of the agent, in degrees (zero corresponds to North),
- pitch: Scalar value of the pitch angle of the agent, in degrees (zero corresponds to horizontal),
- metadata: Message protocol buffer of type Pano with the metadata of the current panorama,
- target_metadata: Message protocol buffer of type Pano with the metadata of the target/goal panorama,
- latlng: Tuple of lat/lng scalar values for the current position of the agent,
- target_latlng: Tuple of lat/lng scalar values for the target/goal position,
- yaw_label: Integer discretized value of the agent yaw using 16 bins,
- neighbors: Vector of immediate neighbor egocentric traversability grid around the agent, with 16 bins for the directions around the agent and bin 0 corresponding to the traversability straight ahead of the agent.
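To make the 16-bin observations more concrete, here is a small sketch of how an angle could be mapped to a bin and how an egocentric traversability vector could be filled. It assumes bins of 22.5 degrees with bin 0 centered on zero degrees; the actual bin boundaries used by the environment are not stated here, and the function names are hypothetical.

```python
NUM_BINS = 16
BIN_WIDTH_DEG = 360.0 / NUM_BINS  # 22.5 degrees per bin


def angle_to_bin(angle_deg):
    """Maps an angle in degrees to a bin index in [0, NUM_BINS).

    Assumption: bin 0 is centered on 0 degrees, so it covers
    [-11.25, 11.25).
    """
    shifted = (angle_deg + BIN_WIDTH_DEG / 2.0) % 360.0
    return int(shifted // BIN_WIDTH_DEG)


def egocentric_neighbors(yaw_deg, neighbor_bearings_deg):
    """Builds a 0/1 vector of traversable directions relative to the agent.

    Bin 0 corresponds to a neighbor straight ahead of the agent.
    """
    grid = [0] * NUM_BINS
    for bearing_deg in neighbor_bearings_deg:
        grid[angle_to_bin(bearing_deg - yaw_deg)] = 1
    return grid
```

For an agent facing East (yaw of 90 degrees) with neighbors due East and due West, this fills bin 0 (straight ahead) and bin 8 (directly behind).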
The following games are available in the StreetLearn environment:
- coin_game: the rewards consist of invisible coins scattered throughout the map, yielding a reward of 1 for each. Once picked up, these rewards do not reappear until the end of the episode.
- courier_game: the agent is given a goal destination, specified as lat/long pairs. Once the goal is reached (with 100m tolerance), a new goal is sampled, until the end of the episode. Rewards at a goal are proportional to the number of panoramas on the shortest path from the agent's position when it gets the new goal assignment to that goal position. Additional reward shaping consists of early rewards when the agent gets within a range of 200m of the goal. Additional coins can also be scattered throughout the environment. The proportion of coins, the goal radius and the early reward radius are parametrizable.
- curriculum_courier_game: same as the courier game, but with a curriculum on the difficulty of the task (maximum straight-line distance from the agent's position to the goal when it is assigned).
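The courier game quantities described above can be sketched in a few lines. This is an illustration under assumptions, not the game's implementation: distances use the haversine formula on a spherical Earth, the shortest path is found by breadth-first search over an unweighted graph, the panorama count includes both endpoints, and all function names and the graph format (a dict of adjacency lists) are hypothetical.

```python
import math
from collections import deque

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lng = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lng / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(agent_latlng, goal_latlng, radius_m):
    """True if the agent is within radius_m of the goal."""
    return haversine_m(*agent_latlng, *goal_latlng) <= radius_m


def panos_on_shortest_path(graph, start, goal):
    """Number of panoramas on a shortest path, counting both endpoints.

    Returns None if the goal cannot be reached from start.
    """
    visited = {start}
    queue = deque([(start, 1)])
    while queue:
        node, count = queue.popleft()
        if node == goal:
            return count
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, count + 1))
    return None


def goal_reward(num_panos, scale=1.0):
    """Reward proportional to the path length; scale is an assumed constant."""
    return scale * num_panos
```

In this sketch, within_radius would be called with 100 for the goal tolerance and with 200 for the early reward range, matching the default values quoted above.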
In addition to the environment, we have added two scripts, human_agent and oracle_agent, which show an agent that is, respectively, controlled by the user or moving along the shortest path to the goal. The UI of these scripts displays the view_image on top and the graph_image on the bottom. On the view_image, a navigation bar displays small circles in the directions of the panoramas (green if travel is possible in that direction, red if not, and orange if there are multiple choices of direction). The graph_image shows the nodes of the graph in white, nodes with coins in yellow, already traversed nodes in red, the shortest path to the goal in violet, and the agent's location and field of view in blue.