San Francisco Landmark Dataset for Mobile Landmark Recognition

  • Stanford: D. Chen, S. Tsai, B. Girod
  • ETHZ: G. Baatz, K. Koeser, M. Pollefeys
  • Nokia Research Center: R. Vedantham, T. Pylvanainen, K. Roimela, R. Grzeszczuk
  • Navteq: X. Chen, J. Bach

We present the San Francisco Landmark Dataset, which contains a database of 1.7 million images of buildings in San Francisco with ground truth labels, geotags, and calibration data, as well as a difficult query set of 803 cell phone images taken with a variety of camera phones. The data was originally acquired by vehicle-mounted cameras with wide-angle lenses capturing spherical panoramic images. For all visible buildings in each panorama, a set of overlapping perspective images is generated. We provide this dataset to facilitate further research in the important area of landmark recognition with mobile devices.
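The perspective views are generated by reprojecting the spherical panoramas. A minimal sketch of the standard reprojection, assuming equirectangular panoramas; the function and parameter names are illustrative assumptions, not the authors' actual pipeline:

```python
import math

def pano_coords(u, v, w, h, fov_deg, yaw_deg, pano_w, pano_h):
    """Map pixel (u, v) of a w x h perspective view (horizontal field of
    view fov_deg, camera heading yaw_deg) to pixel coordinates in an
    equirectangular panorama of size pano_w x pano_h."""
    # Focal length in pixels from the horizontal field of view.
    f = (w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    # Viewing ray in the camera frame (x right, y down, z forward).
    x = u - w / 2.0
    y = v - h / 2.0
    z = f
    # Spherical direction of the ray, rotated by the camera heading.
    lon = math.atan2(x, z) + math.radians(yaw_deg)   # azimuth
    lat = math.atan2(-y, math.hypot(x, z))           # elevation
    # Equirectangular mapping: longitude spans the width, latitude the height.
    px = (lon / (2.0 * math.pi) + 0.5) % 1.0 * pano_w
    py = (0.5 - lat / math.pi) * pano_h
    return px, py
```

Sampling the panorama at `pano_coords(u, v, ...)` for every pixel of the perspective view yields one crop; repeating with different headings produces the overlapping set.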

  1. D. Chen, G. Baatz, K. Koeser, S. Tsai, R. Vedantham, T. Pylvanainen, K. Roimela, X. Chen, J. Bach, M. Pollefeys, B. Girod, and R. Grzeszczuk, "City-scale landmark identification on mobile devices", IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2011.
  2. G. Baatz, K. Koeser, D. Chen, R. Grzeszczuk, and M. Pollefeys, "Leveraging 3D city models for rotation invariant place-of-interest recognition", International Journal of Computer Vision (IJCV), Vol. 94, No. 5, May 2011.
  3. G. Baatz, K. Koeser, D. Chen, R. Grzeszczuk, and M. Pollefeys, "Handling urban location recognition as a 2D homothetic problem", European Conference on Computer Vision (ECCV), September 2010.

Stanford Mobile Visual Search Dataset

  • Stanford: V. Chandrasekhar, D. Chen, S. Tsai, N.-M. Cheung, H. Chen, G. Takacs, B. Girod
  • Qualcomm: Y. Reznik
  • Nokia Research Center: R. Vedantham, R. Grzeszczuk
  • Navteq: J. Bach

We propose the Stanford Mobile Visual Search Dataset, which contains camera-phone images of products, CDs, books, outdoor landmarks, business cards, text documents, museum paintings and video clips. The dataset has several key characteristics lacking in existing datasets: rigid objects, widely varying lighting conditions, perspective distortion, foreground and background clutter, realistic ground-truth reference data, and query data collected from heterogeneous low- and high-end camera phones. We hope that the dataset will help push research forward in the field of mobile visual search.

  1. V. Chandrasekhar, D. Chen, S. Tsai, N.-M. Cheung, H. Chen, G. Takacs, Y. Reznik, R. Vedantham, R. Grzeszczuk, J. Bach, and B. Girod, "The Stanford mobile visual search dataset", ACM Multimedia Systems Conference (MMSys), February 2011.

Stanford Streaming Mobile Augmented Reality Dataset

  • Stanford: M. Makar, S. Tsai, V. Chandrasekhar, D. Chen, B. Girod

We introduce the Stanford Streaming MAR Dataset. The dataset contains 23 different objects of interest, divided into four categories: Books, CD covers, DVD covers and Common Objects. We first record one video for each object where the object is in a static position while the camera is moving. These videos are recorded with a hand-held mobile phone with different amounts of camera motion, glare, blur, zoom, rotation and perspective changes. Each video is 100 frames long, recorded at 30 fps at a resolution of 640 × 480. For each video, we provide a clean database image (no background noise) for the corresponding object of interest.

We also provide 5 more videos for moving objects recorded with a moving camera. These videos help to study the effect of background clutter when there is relative motion between the object and the background. Finally, we record 4 videos that contain multiple objects from the dataset. Each video is 200 frames long and contains 3 objects of interest where the camera captures them one after the other.

We provide the ground-truth localization information for 14 videos, where we manually define a bounding quadrilateral around the object of interest in each video frame. This localization information is used in the calculation of the Jaccard index, the ratio of the area of intersection to the area of union of the detected and ground-truth regions.
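The Jaccard index of two bounding quadrilaterals can be computed by clipping one against the other and applying the shoelace formula. A minimal sketch, assuming convex quadrilaterals given in counter-clockwise order (the helper names are illustrative, not part of the dataset tools):

```python
def shoelace_area(poly):
    """Area of a simple polygon [(x, y), ...] via the shoelace formula."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def clip_halfplane(poly, a, b):
    """Sutherland-Hodgman step: keep the part of poly left of edge a->b."""
    def inside(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0.0
    def cross_pt(p, q):
        # Intersection of the infinite line through a, b with segment p-q.
        den = (a[0] - b[0]) * (p[1] - q[1]) - (a[1] - b[1]) * (p[0] - q[0])
        t = ((a[0] - p[0]) * (p[1] - q[1]) - (a[1] - p[1]) * (p[0] - q[0])) / den
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        if inside(q):
            if not inside(p):
                out.append(cross_pt(p, q))
            out.append(q)
        elif inside(p):
            out.append(cross_pt(p, q))
    return out

def jaccard(quad_a, quad_b):
    """Intersection-over-union of two convex CCW quadrilaterals."""
    inter = quad_a
    for i in range(len(quad_b)):
        if not inter:
            break
        inter = clip_halfplane(inter, quad_b[i], quad_b[(i + 1) % len(quad_b)])
    ai = shoelace_area(inter) if len(inter) >= 3 else 0.0
    union = shoelace_area(quad_a) + shoelace_area(quad_b) - ai
    return ai / union if union > 0 else 0.0
```

For example, two unit squares offset horizontally by half their width overlap in half a unit of area out of 1.5 total, giving a Jaccard index of 1/3.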

1. Static single object:
1.a. Books: Automata Theory, Computer Architecture, OpenCV, Wang Book.
1.b. CD Covers: Barry White, Chris Brown, Janet Jackson, Rascal Flatts, Sheryl Crow.
1.c. DVD Covers: Finding Nemo, Monsters Inc, Mummy Returns, Private Ryan, Rush Hour, Shrek, Titanic, Toy Story.
1.d. Common Objects: Bleach, Glade, Oreo, Polish, Tide, Tuna. 

2. Moving object, moving camera:
Barry White Moving, Chris Brown Moving, Titanic Moving, Titanic Moving - Second, Toy Story Moving. 

3. Multiple objects:
3.a. Multiple Objects 1: Polish, Wang Book, Monsters Inc.
3.b. Multiple Objects 2: OpenCV, Barry White, Titanic.
3.c. Multiple Objects 3: Monsters Inc, Toy Story, Titanic.
3.d. Multiple Objects 4: Wang Book, Barry White, OpenCV. 

  1. M. Makar, V. Chandrasekhar, S. Tsai, D. Chen, and B. Girod, "Interframe coding of feature descriptors for mobile augmented reality", IEEE Transactions on Image Processing, Vol. 23, No. 8, August 2014.
  2. M. Makar, S. Tsai, V. Chandrasekhar, D. Chen, and B. Girod, "Interframe coding of canonical patches for low bit-rate mobile augmented reality", International Journal of Semantic Computing, Vol. 7, No. 1, March 2013.
  3. M. Makar, S. Tsai, V. Chandrasekhar, D. Chen, and B. Girod, "Interframe coding of canonical patches for mobile augmented reality", IEEE International Symposium on Multimedia (ISM), December 2012.
  4. D. Chen, M. Makar, A. Araujo, and B. Girod, "Interframe coding of global image signatures for mobile augmented reality", IEEE Data Compression Conference (DCC), March 2014.

Stanford Image-to-Video (I2V) Dataset

  • Stanford: A. Araujo, J. Chaves, D. Chen, R. Angst, B. Girod

Stanford I2V is a new large-scale dataset for the evaluation of query-by-image video search. It contains 3,801 hours of news videos and 229 queries with annotated ground-truth sequences.

  1. A. Araujo, J. Chaves, D. Chen, R. Angst, and B. Girod, "Stanford I2V: a news video dataset for query-by-image experiments", ACM Multimedia Systems Conference (MMSys), 2015.

Compact Descriptors for Visual Search Patches Dataset

  • Institute for Infocomm Research: V. Chandrasekhar
  • Microsoft: G. Takacs
  • Stanford: D. Chen, S. Tsai, B. Girod
  • Qualcomm: M. Makar

MPEG is currently developing a standard titled Compact Descriptors for Visual Search (CDVS) for descriptor extraction and compression. In this work, we develop comprehensive patch-level experiments for a direct comparison of low-bitrate descriptors for visual search. To evaluate different compression schemes, we propose a dataset of matching pairs of image patches from the MPEG-CDVS image-level datasets.
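One common patch-level protocol measures the true-positive rate at a fixed false-positive rate, computed from descriptor distances on matching and non-matching patch pairs. A minimal sketch; the function name and threshold selection are illustrative assumptions, not the MPEG-CDVS reference software:

```python
def tpr_at_fpr(match_dists, nonmatch_dists, target_fpr=0.01):
    """Pick the distance threshold that admits roughly target_fpr of the
    non-matching pairs, then report the fraction of matching pairs whose
    descriptor distance falls at or below that threshold."""
    nm = sorted(nonmatch_dists)
    # Index of the largest distance still within the false-positive budget.
    k = max(int(target_fpr * len(nm)) - 1, 0)
    threshold = nm[k]
    return sum(d <= threshold for d in match_dists) / len(match_dists)
```

Comparing two compression schemes then reduces to running each descriptor over the same matching and non-matching pairs and comparing the resulting rates at equal bitrate.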

  1. V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, M. Makar, and B. Girod, "Feature matching performance of compact descriptors for visual search", IEEE Data Compression Conference (DCC), March 2014.