San Francisco Landmark Dataset for Mobile Landmark Recognition

  • Stanford: D. Chen, S. Tsai, B. Girod
  • ETHZ: G. Baatz, K. Koeser, M. Pollefeys
  • Nokia Research Center: R. Vedantham, T. Pylvanainen, K. Roimela, R. Grzeszczuk
  • Navteq: X. Chen, J. Bach

We present the San Francisco Landmark Dataset, which contains a database of 1.7 million images of buildings in San Francisco with ground truth labels, geotags, and calibration data, as well as a difficult query set of 803 cell phone images taken with a variety of camera phones. The data was originally acquired by vehicle-mounted cameras with wide-angle lenses capturing spherical panoramic images. For all visible buildings in each panorama, a set of overlapping perspective images is generated. We provide this dataset to facilitate further research in the important area of landmark recognition with mobile devices.
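The perspective views are generated by reprojecting the spherical panoramas. A minimal sketch of the standard reprojection, assuming equirectangular panoramas; the function and parameter names are illustrative assumptions, not the authors' actual pipeline:

```python
import math

def pano_coords(u, v, w, h, fov_deg, yaw_deg, pano_w, pano_h):
    """Map pixel (u, v) of a w x h perspective view (horizontal field of
    view fov_deg, camera heading yaw_deg) to pixel coordinates in an
    equirectangular panorama of size pano_w x pano_h."""
    # Focal length in pixels from the horizontal field of view.
    f = (w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    # Viewing ray in the camera frame (x right, y down, z forward).
    x = u - w / 2.0
    y = v - h / 2.0
    z = f
    # Spherical direction of the ray, rotated by the camera heading.
    lon = math.atan2(x, z) + math.radians(yaw_deg)   # azimuth
    lat = math.atan2(-y, math.hypot(x, z))           # elevation
    # Equirectangular mapping: longitude spans the width, latitude the height.
    px = (lon / (2.0 * math.pi) + 0.5) % 1.0 * pano_w
    py = (0.5 - lat / math.pi) * pano_h
    return px, py
```

Sampling the panorama at `pano_coords(u, v, ...)` for every pixel of the perspective view yields one crop; repeating with different headings produces the overlapping set.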

  1. D. Chen, G. Baatz, K. Koeser, S. Tsai, R. Vedantham, T. Pylvanainen, K. Roimela, X. Chen, J. Bach, M. Pollefeys, B. Girod, and R. Grzeszczuk, "City-scale landmark identification on mobile devices", IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2011.
  2. G. Baatz, K. Koeser, D. Chen, R. Grzeszczuk, and M. Pollefeys, "Leveraging 3D city models for rotation invariant place-of-interest recognition", International Journal of Computer Vision (IJCV), Vol. 94, No. 5, May 2011.
  3. G. Baatz, K. Koeser, D. Chen, R. Grzeszczuk, and M. Pollefeys, "Handling urban location recognition as a 2D homothetic problem", European Conference on Computer Vision (ECCV), September 2010.

Stanford Mobile Visual Search Dataset

  • Stanford: V. Chandrasekhar, D. Chen, S. Tsai, N.-M. Cheung, H. Chen, G. Takacs, B. Girod
  • Qualcomm: Y. Reznik
  • Nokia Research Center: R. Vedantham, R. Grzeszczuk
  • Navteq: J. Bach

We propose the Stanford Mobile Visual Search Dataset, which contains camera-phone images of products, CDs, books, outdoor landmarks, business cards, text documents, museum paintings and video clips. The dataset has several key characteristics lacking in existing datasets: rigid objects, widely varying lighting conditions, perspective distortion, foreground and background clutter, realistic ground-truth reference data, and query data collected from heterogeneous low- and high-end camera phones. We hope that the dataset will help push research forward in the field of mobile visual search.

  1. V. Chandrasekhar, D. Chen, S. Tsai, N.-M. Cheung, H. Chen, G. Takacs, Y. Reznik, R. Vedantham, R. Grzeszczuk, J. Bach, and B. Girod, "The Stanford mobile visual search dataset", ACM Multimedia Systems Conference (MMSys), February 2011.

Stanford Streaming Mobile Augmented Reality Dataset

  • Stanford: M. Makar, S. Tsai, V. Chandrasekhar, D. Chen, B. Girod

We introduce the Stanford Streaming MAR Dataset. The dataset contains 23 different objects of interest, divided into four categories: Books, CD covers, DVD covers and Common Objects. We first record one video for each object where the object is in a static position while the camera is moving. These videos are recorded with a hand-held mobile phone with different amounts of camera motion, glare, blur, zoom, rotation and perspective changes. Each video is 100 frames long, recorded at 30 fps at a resolution of 640 × 480. For each video, we provide a clean database image (no background noise) for the corresponding object of interest.

We also provide 5 more videos for moving objects recorded with a moving camera. These videos help to study the effect of background clutter when there is relative motion between the object and the background. Finally, we record 4 videos that contain multiple objects from the dataset. Each video is 200 frames long and contains 3 objects of interest where the camera captures them one after the other.

We provide the ground-truth localization information for 14 videos, where we manually define a bounding quadrilateral around the object of interest in each video frame. This localization information is used in the calculation of the Jaccard index, the ratio of the area of intersection to the area of union of the detected and ground-truth regions.
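The Jaccard index of two bounding quadrilaterals can be computed by clipping one against the other and applying the shoelace formula. A minimal sketch, assuming convex quadrilaterals given in counter-clockwise order (the helper names are illustrative, not part of the dataset tools):

```python
def shoelace_area(poly):
    """Area of a simple polygon [(x, y), ...] via the shoelace formula."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def clip_halfplane(poly, a, b):
    """Sutherland-Hodgman step: keep the part of poly left of edge a->b."""
    def inside(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0.0
    def cross_pt(p, q):
        # Intersection of the infinite line through a, b with segment p-q.
        den = (a[0] - b[0]) * (p[1] - q[1]) - (a[1] - b[1]) * (p[0] - q[0])
        t = ((a[0] - p[0]) * (p[1] - q[1]) - (a[1] - p[1]) * (p[0] - q[0])) / den
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        if inside(q):
            if not inside(p):
                out.append(cross_pt(p, q))
            out.append(q)
        elif inside(p):
            out.append(cross_pt(p, q))
    return out

def jaccard(quad_a, quad_b):
    """Intersection-over-union of two convex CCW quadrilaterals."""
    inter = quad_a
    for i in range(len(quad_b)):
        if not inter:
            break
        inter = clip_halfplane(inter, quad_b[i], quad_b[(i + 1) % len(quad_b)])
    ai = shoelace_area(inter) if len(inter) >= 3 else 0.0
    union = shoelace_area(quad_a) + shoelace_area(quad_b) - ai
    return ai / union if union > 0 else 0.0
```

For example, two unit squares offset horizontally by half their width overlap in half a unit of area out of 1.5 total, giving a Jaccard index of 1/3.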

1. Static single object:
1.a. Books: Automata Theory, Computer Architecture, OpenCV, Wang Book.
1.b. CD Covers: Barry White, Chris Brown, Janet Jackson, Rascal Flatts, Sheryl Crow.
1.c. DVD Covers: Finding Nemo, Monsters Inc, Mummy Returns, Private Ryan, Rush Hour, Shrek, Titanic, Toy Story.
1.d. Common Objects: Bleach, Glade, Oreo, Polish, Tide, Tuna. 

2. Moving object, moving camera:
Barry White Moving, Chris Brown Moving, Titanic Moving, Titanic Moving - Second, Toy Story Moving. 

3. Multiple objects:
3.a. Multiple Objects 1: Polish, Wang Book, Monsters Inc.
3.b. Multiple Objects 2: OpenCV, Barry White, Titanic.
3.c. Multiple Objects 3: Monsters Inc, Toy Story, Titanic.
3.d. Multiple Objects 4: Wang Book, Barry White, OpenCV. 

  1. M. Makar, V. Chandrasekhar, S. Tsai, D. Chen, and B. Girod, "Interframe coding of feature descriptors for mobile augmented reality", IEEE Transactions on Image Processing, Vol. 23, No. 8, August 2014.
  2. M. Makar, S. Tsai, V. Chandrasekhar, D. Chen, and B. Girod, "Interframe coding of canonical patches for low bit-rate mobile augmented reality", International Journal of Semantic Computing, Vol. 7, No. 1, March 2013.
  3. M. Makar, S. Tsai, V. Chandrasekhar, D. Chen, and B. Girod, "Interframe coding of canonical patches for mobile augmented reality", IEEE International Symposium on Multimedia (ISM), December 2012.
  4. D. Chen, M. Makar, A. Araujo, and B. Girod, "Interframe coding of global image signatures for mobile augmented reality", IEEE Data Compression Conference (DCC), March 2014.

Stanford Image-to-Video (I2V) Dataset

  • Stanford: A. Araujo, J. Chaves, D. Chen, R. Angst, B. Girod

Stanford I2V is a new large-scale dataset for the evaluation of query-by-image video search. It contains 3,801 hours of news videos and 229 queries with annotated ground-truth sequences.

  1. A. Araujo, J. Chaves, D. Chen, R. Angst, and B. Girod, "Stanford I2V: a news video dataset for query-by-image experiments", ACM Multimedia Systems Conference (MMSys), 2015.

Compact Descriptors for Visual Search Patches Dataset

  • Institute for Infocomm Research: V. Chandrasekhar
  • Microsoft: G. Takacs
  • Stanford: D. Chen, S. Tsai, B. Girod
  • Qualcomm: M. Makar

MPEG is currently developing a standard titled Compact Descriptors for Visual Search (CDVS) for descriptor extraction and compression. In this work, we develop comprehensive patch-level experiments for a direct comparison of low-bitrate descriptors for visual search. To evaluate different compression schemes, we propose a dataset of matching pairs of image patches from the MPEG-CDVS image-level datasets.
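One common patch-level protocol measures the true-positive rate at a fixed false-positive rate, computed from descriptor distances on matching and non-matching patch pairs. A minimal sketch; the function name and threshold selection are illustrative assumptions, not the MPEG-CDVS reference software:

```python
def tpr_at_fpr(match_dists, nonmatch_dists, target_fpr=0.01):
    """Pick the distance threshold that admits roughly target_fpr of the
    non-matching pairs, then report the fraction of matching pairs whose
    descriptor distance falls at or below that threshold."""
    nm = sorted(nonmatch_dists)
    # Index of the largest distance still within the false-positive budget.
    k = max(int(target_fpr * len(nm)) - 1, 0)
    threshold = nm[k]
    return sum(d <= threshold for d in match_dists) / len(match_dists)
```

Comparing two compression schemes then reduces to running each descriptor over the same matching and non-matching pairs and comparing the resulting rates at equal bitrate.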

  1. V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, M. Makar, and B. Girod, "Feature matching performance of compact descriptors for visual search", IEEE Data Compression Conference (DCC), March 2014.