StreetLearn Dataset


A limited set of Google Street View images has been approved for use with the StreetLearn project and academic research. We are releasing these Google Street View panoramas (approximately 143,000) and the corresponding street connectivity graphs, covering two cities:

  • Manhattan (south of 81st Street): 56k images, within a lat/lng bounding box defined by (40.695, -74.028) and (40.788, -73.940). Note that Brooklyn, Queens, and Roosevelt Island, as well as the bridges and tunnels out of Manhattan, are excluded; we include only panoramas inside a polygon that follows the waterfront of Manhattan and 79th/81st Street, covering an area of 31.6 km².
  • Manhattan data used in the Touchdown and Retouchdown studies (see this page): 29k images, within an area contained in the above Manhattan dataset. Note that these panoramas were acquired at a different time, and only ~700 panoramas overlap between the two sets.
  • Pittsburgh: 58k images, within a lat/lng bounding box defined by (40.425, -80.035) and (40.460, -79.930), covering an area of 8.9 km by 3.9 km, or 34.3 km².

As in Google Street View, panoramas are spaced about every 10m, with denser spacing at intersections.

The dataset - available upon request with a license agreement - is further described in this paper.

Top: area of Pittsburgh covered in the dataset.

Left: area of Manhattan covered in the StreetLearn dataset.

The equirectangular panorama RGB images are stored in high-quality compressed JPEG format and come in two sizes:

  • 1664 x 832
  • 416 x 208

Because of the large storage size of the uncompressed panorama images, storing more than a few thousand such panoramas in memory may be impossible, and panoramas may need to be reloaded. The StreetLearn engine implements a caching mechanism that pre-fetches panoramas in the vicinity of the agent's current position. However, in tasks where the agent needs to traverse long distances through the environment, the caching mechanism does not prevent frequent disk access.
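As a rough illustration, the sketch below shows the kind of least-recently-used cache with neighbor prefetching such an engine can use. The `load_pano` and `neighbors` callables, the capacity, and the prefetch depth are all illustrative assumptions, not the StreetLearn API.

```python
from collections import OrderedDict


class PanoCache:
    """LRU cache of decoded panoramas with neighbor prefetching.

    A minimal sketch: `load_pano(pano_id)` and `neighbors(pano_id)` are
    hypothetical accessors for the dataset, not the StreetLearn API.
    """

    def __init__(self, load_pano, neighbors, capacity=1000, prefetch_depth=2):
        self._load = load_pano
        self._neighbors = neighbors
        self._capacity = capacity
        self._depth = prefetch_depth
        self._cache = OrderedDict()  # pano_id -> decoded image

    def get(self, pano_id):
        if pano_id not in self._cache:
            self._cache[pano_id] = self._load(pano_id)
        self._cache.move_to_end(pano_id)  # mark as most recently used
        self._evict()
        self._prefetch(pano_id)
        return self._cache[pano_id]

    def _evict(self):
        while len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # drop least recently used

    def _prefetch(self, pano_id):
        # Walk the connectivity graph a few hops out from the agent and
        # pull any missing panoramas into spare cache capacity.
        frontier = {pano_id}
        for _ in range(self._depth):
            frontier = {n for p in frontier for n in self._neighbors(p)}
            for n in frontier:
                if n not in self._cache and len(self._cache) < self._capacity:
                    self._cache[n] = self._load(n)
```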

For this reason, and for training agents that process small 84x84 image observations (such as the ones used in our paper on "Learning to Navigate in Cities Without a Map"), we introduced a down-sampled version of the panoramas that makes it possible to hold several tens of thousands of panoramas in memory.
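A back-of-envelope estimate makes the difference concrete: at 3 bytes per pixel, a full-resolution panorama costs about 4 MiB uncompressed, while a down-sampled one costs about 0.25 MiB.

```python
# Back-of-envelope memory footprint of uncompressed RGB panoramas
# (3 bytes per pixel, ignoring per-object overhead).
for width, height in [(1664, 832), (416, 208)]:
    bytes_per_pano = width * height * 3
    print(f"{width}x{height}: {bytes_per_pano / 2**20:.2f} MiB per pano; "
          f"10,000 panos ≈ {10_000 * bytes_per_pano / 2**30:.1f} GiB")
```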

Example of equirectangular panorama image taken in Manhattan, stored at a resolution of 1664 x 832.
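The small observations mentioned above are perspective views cut out of such equirectangular images. The function below is a minimal nearest-neighbour sketch of the standard gnomonic sampling behind such a crop; it is not the StreetLearn renderer, and its parameter names and conventions are illustrative.

```python
import numpy as np


def equirect_to_perspective(pano, yaw_deg=0.0, pitch_deg=0.0,
                            fov_deg=60.0, out_hw=(84, 84)):
    """Sample a pinhole-camera view from an equirectangular panorama.

    `pano` is an (H, W, 3) uint8 array. Nearest-neighbour sampling only;
    a sketch of the technique, not the StreetLearn implementation.
    """
    ph, pw = pano.shape[:2]
    oh, ow = out_hw
    f = 0.5 * ow / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels

    # Ray directions through each output pixel, camera looking down +z.
    xs = np.arange(ow) - (ow - 1) / 2
    ys = np.arange(oh) - (oh - 1) / 2
    x, y = np.meshgrid(xs, ys)
    z = np.full_like(x, f)
    d = np.stack([x, y, z], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)

    # Rotate the rays by camera pitch (about x), then yaw (about y).
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(t), 0, np.sin(t)],
                   [0, 1, 0],
                   [-np.sin(t), 0, np.cos(t)]])
    d = d @ (ry @ rx).T

    # Spherical coordinates -> equirectangular pixel coordinates.
    lon = np.arctan2(d[..., 0], d[..., 2])       # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))   # latitude in [-pi/2, pi/2]
    u = ((lon / np.pi + 1) / 2 * (pw - 1)).astype(int)
    v = ((lat / (np.pi / 2) + 1) / 2 * (ph - 1)).astype(int)
    return pano[v, u]
```

For example, `equirect_to_perspective(pano, yaw_deg=90, out_hw=(84, 84))` would extract an 84x84 view facing 90 degrees to the right of the panorama's reference heading.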

The code of the StreetLearn environment is built to interface with this dataset. The dataset itself is stored as LevelDB files using the Protocol Buffer format.

Each panorama is stored as a protocol buffer containing the equirectangular image as a high-quality compressed JPEG string, decorated with the following attributes (a sketch of reading these records follows the list):

  • a unique string ID,
  • lat/long coordinates,
  • altitude,
  • pitch, roll and yaw angles of the panoramic camera,
  • date of acquisition,
  • and a list of directly connected neighbors.
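Records of this kind can be inspected outside of the environment with standard tools. The snippet below is a minimal sketch assuming that the LevelDB values are serialized panorama messages; `streetlearn_pb2` stands for Python code generated with `protoc` from the .proto schema that ships with the dataset, and the field names (`compressed_image`, `coords`, `neighbor`, etc.) are illustrative assumptions rather than the authoritative schema.

```python
import plyvel  # third-party LevelDB bindings: pip install plyvel

# Hypothetical module generated with `protoc` from the dataset's .proto
# schema; the Pano field names used below are assumptions.
from streetlearn_pb2 import Pano

db = plyvel.DB('/path/to/manhattan_leveldb', create_if_missing=False)
for key, value in db:
    pano = Pano()
    pano.ParseFromString(value)          # one protocol buffer per panorama
    jpeg_bytes = pano.compressed_image   # equirectangular JPEG string
    print(pano.id, pano.coords.lat, pano.coords.lng,
          [n.id for n in pano.neighbor])  # connectivity graph edges
db.close()
```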

Navigation instructions are stored as text files listing, for each trajectory, the navigation steps and the goal destination; they are intended to mimic navigation instructions provided by Google Maps. Each step (waypoint) is stored as a position (lat/lng coordinates and panorama ID), a heading direction, a natural-language instruction, and the distance to the next step. Goals are stored as a position and the expected heading of the agent when it reaches its destination. Using the positions and headings of the waypoints and the goal, the environment can produce thumbnail images of the views the agent should expect as it follows the directions.
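Since the exact text format is not reproduced here, the dataclasses below only mirror the fields described above, as a sketch of one trajectory held in memory; all names are illustrative.

```python
from dataclasses import dataclass
from typing import List

# Illustrative in-memory representation of one trajectory of driving
# directions; field names are assumptions mirroring the description above.


@dataclass
class Waypoint:
    lat: float
    lng: float
    pano_id: str
    heading_deg: float         # direction of travel at this step
    instruction: str           # natural-language step, e.g. "Turn left"
    distance_to_next_m: float  # distance to the next waypoint, in metres


@dataclass
class Goal:
    lat: float
    lng: float
    pano_id: str
    heading_deg: float         # expected agent heading on arrival


@dataclass
class Trajectory:
    steps: List[Waypoint]
    goal: Goal
```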

The usage and distribution of this dataset are subject to a license agreement between Google and the researcher's institution. Please fill in the following form to obtain and sign the license agreement, and we will share with you a temporary link to download the dataset.