StreetLearn Dataset

We are releasing a dataset of 113k Google Street View panoramas, together with a street connectivity graph, covering two cities:

  • Manhattan (south of 81st Street): 55k images, within a lat/lng bounding box defined by (40.695, -74.028) and (40.788, -73.940). Note that Brooklyn, Queens and Roosevelt Island, as well as the bridges and tunnels out of Manhattan, are excluded: we include only panoramas inside a polygon that follows the waterfront of Manhattan and 79th / 81st Street, covering an area of 31.6 km².
  • Pittsburgh: 58k images, within a lat/lng bounding box defined by (40.425, -80.035) and (40.460, -79.930), covering an area of 8.9 km by 3.9 km, or 34.3 km².
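
As a rough cross-check of the bounding-box dimensions quoted above, the following sketch converts a lat/lng box into kilometres. It assumes a spherical Earth with a mean radius of 6371 km and an equirectangular approximation. The function name `bbox_size_km` is hypothetical and not part of the StreetLearn code. The results are approximate and need not match the published figures to the last digit.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def bbox_size_km(lat_min, lng_min, lat_max, lng_max):
    """Return (width_km, height_km) of a lat/lng box (equirectangular approximation)."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    mid_lat = math.radians((lat_min + lat_max) / 2.0)
    # North-south extent: one degree of latitude has a nearly constant length.
    height_km = (lat_max - lat_min) * km_per_degree
    # East-west extent: meridians converge, so scale by cos(latitude).
    width_km = (lng_max - lng_min) * km_per_degree * math.cos(mid_lat)
    return width_km, height_km


# Pittsburgh bounding box as given in the list above.
pittsburgh_width_km, pittsburgh_height_km = bbox_size_km(40.425, -80.035, 40.460, -79.930)
print(f"Pittsburgh box: {pittsburgh_width_km:.1f} km by {pittsburgh_height_km:.1f} km")
```

The same function does not reproduce the Manhattan area, because that coverage is defined by a waterfront polygon rather than by the full bounding box.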

As in Google Street View, panoramas are spaced roughly 10 m apart, with denser spacing at intersections.

Top: area of Pittsburgh covered in the dataset.

Left: area of Manhattan covered in the StreetLearn dataset.

The equirectangular RGB panorama images are stored in high-quality compressed JPEG format, and come in two sizes:

  • 1664 x 832
  • 416 x 208

Because of the large storage size of the uncompressed panorama images, holding more than a few thousand such panoramas in memory may be impossible, and panoramas may need to be reloaded from disk. The StreetLearn engine implements a caching mechanism that pre-fetches panoramas in the vicinity of the agent's current position. However, in tasks where the agent needs to traverse long distances through the environment, the caching mechanism does not prevent frequent disk access.
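
To illustrate why a bounded cache cannot avoid disk access on long traversals, here is a minimal sketch of a least-recently-used cache with neighbor pre-fetching. It is not the StreetLearn engine's actual implementation. The class `PanoCache`, the `load_fn` callable and the `disk_reads` counter are all hypothetical names introduced for this example.

```python
from collections import OrderedDict


class PanoCache:
    """Least-recently-used cache keyed by panorama ID (illustrative only)."""

    def __init__(self, load_fn, capacity):
        self._load_fn = load_fn      # hypothetical callable: pano_id -> decoded image
        self._capacity = capacity    # maximum number of panoramas held in memory
        self._items = OrderedDict()  # ordered from least to most recently used
        self.disk_reads = 0          # number of cache misses so far

    def get(self, pano_id):
        if pano_id in self._items:
            # Cache hit: mark the panorama as most recently used.
            self._items.move_to_end(pano_id)
            return self._items[pano_id]
        # Cache miss: load from storage and evict the oldest entry if full.
        image = self._load_fn(pano_id)
        self.disk_reads += 1
        self._items[pano_id] = image
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)
        return image

    def prefetch(self, neighbor_ids):
        """Warm the cache with panoramas near the current position."""
        for pano_id in neighbor_ids:
            self.get(pano_id)
```

With such a cache, an agent that keeps moving into previously unvisited streets incurs one load per new panorama regardless of capacity. Pre-fetching only shifts when those loads happen.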

For this reason, and because we train agents that process small 84x84 image observations (such as the ones used in our paper "Learning to Navigate in Cities Without A Map"), we introduced a down-sampled version of the panoramas that makes it possible to hold several tens of thousands of panoramas in memory.
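
The memory saving can be estimated with simple arithmetic. The sketch below assumes decoded panoramas are held as 8-bit RGB arrays (3 bytes per pixel, no padding or alpha channel). The actual in-memory representation used by the engine may differ.

```python
BYTES_PER_PIXEL = 3  # assumption: 8-bit RGB, no alpha channel or padding


def uncompressed_bytes(width, height):
    """Size in bytes of one decoded panorama under the assumption above."""
    return width * height * BYTES_PER_PIXEL


def panoramas_per_gib(image_bytes):
    """How many decoded panoramas of the given size fit in one GiB."""
    return (1024 ** 3) // image_bytes


large = uncompressed_bytes(1664, 832)  # 4,153,344 bytes, about 4 MiB
small = uncompressed_bytes(416, 208)   # 259,584 bytes, about 0.25 MiB

print(large // small)             # 16: the small size has 4x fewer pixels per side
print(panoramas_per_gib(large))   # 258
print(panoramas_per_gib(small))   # 4136
```

Under these assumptions the down-sampled panoramas take 16 times less memory each, so a budget that holds a few thousand full-size panoramas holds tens of thousands of small ones.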

Example of equirectangular panorama image taken in Manhattan, stored at a resolution of 1664 x 832.

The code of the StreetLearn environment is built to interface with this dataset. The dataset itself is stored as LevelDB files using the Protocol Buffer format.

Each panorama is stored as a protocol buffer that holds the equirectangular image as a high-quality compressed JPEG string, together with the following attributes:

  • a unique string ID,
  • lat/lng coordinates,
  • altitude,
  • pitch, roll and yaw angles of the panoramic camera,
  • date of acquisition,
  • and a list of directly connected neighbors.
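
The attribute list above can be pictured as the following plain Python record. This is only an illustrative mirror of the description: the class name, field names and types are assumptions and do not reproduce the dataset's actual protocol buffer schema, which should be taken from the StreetLearn code itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PanoRecord:
    """Hypothetical in-memory view of one panorama record (names are assumptions)."""
    pano_id: str            # unique string ID
    latitude: float         # lat/lng coordinates
    longitude: float
    altitude: float         # units are not stated in this document
    pitch: float            # orientation of the panoramic camera
    roll: float
    yaw: float
    acquisition_date: str   # date format is not stated in this document
    jpeg_bytes: bytes       # compressed equirectangular image
    neighbor_ids: List[str] = field(default_factory=list)


def build_graph(records: List[PanoRecord]) -> Dict[str, List[str]]:
    """Map each panorama ID to the IDs of its directly connected neighbors."""
    return {record.pano_id: list(record.neighbor_ids) for record in records}
```

Collecting the neighbor lists of all records in this way yields the street connectivity graph as an adjacency list.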

The usage and distribution of this dataset are subject to a license agreement between Google and the researcher's institution. Please fill out the following form to obtain and sign the license agreement, and we will share a temporary link with you to download the dataset.