CVPR 2014 Tutorial on Large-Scale Visual Recognition
Saturday, June 28th - Full day - Grand Ballroom 2
Speakers:
Czech Technical University
Facebook AI Research
Oxford University
Tutorial goals
This tutorial addresses Large-Scale Visual Recognition (LSVR), the problem of understanding visual content (e.g. photos or videos) at large scale. This topic has received much attention in the computer vision community in the last few years: as larger datasets have become available [TFF08, DDS09], handling millions of images and thousands of label classes has become the norm rather than the exception [DBL10, WBU10, LRM12, DCM12, JPD12]. Since LSVR is a vast topic, we will mainly focus on two tasks: image retrieval and image classification.
The goals of this tutorial are three-fold:
- Provide the audience with the "tools" to process such large datasets.
- Show the convergence between large-scale retrieval and large-scale classification, two problems which have been traditionally addressed separately.
- Show that LSVR does not necessarily require massive computational resources (although such resources can help, of course).
The tutorial is complemented by free, publicly available software:
- VLFeat: http://www.vlfeat.org/
- INRIA's Fisher vector implementation: http://lear.inrialpes.fr/src/inria_fisher
- VGG's encoding methods evaluation toolkit: http://www.robots.ox.ac.uk/~vgg/software/enceval_toolkit/
- Yael library for exact matching: http://gforge.inria.fr/projects/yael
- PQ-codes toy Matlab implementation: http://people.rennes.inria.fr/Herve.Jegou/projects/ann.html
- J-SGD for large-scale learning: http://lear.inrialpes.fr/src/jsgd/
Schedule
The tutorial will consist of short talks (1h or less), each covering a specific topic and given by a recognized expert in the field.
morning:
- 8:30am - 8:40am: Introduction (Zaid Harchaoui)
- 8:40am - 9:25am: Part I: Efficient matching (Herve Jegou)
- 9:25am - 10:15am: Part II: Geometry for large-scale retrieval (Ondrej Chum)
- 10:15am - 10:45am: COFFEE BREAK
- 10:45am - 11:50am: Part III: Large-scale machine learning (Zaid Harchaoui)
afternoon:
- 1:30pm - 2:30pm: Part IV: Large-scale visual recognition with deep learning (Marc'Aurelio Ranzato)
- 2:30pm - 3:25pm: Part V: Input embeddings, from shallow to deep (Andrea Vedaldi)
- 3:25pm - 3:55pm: COFFEE BREAK
- 3:55pm - 4:55pm: Part VI: Output embedding for large-scale visual recognition (Florent Perronnin)
Here is a list of references.
Sponsors and financial support
The tutorial is supported by the MSR-INRIA Joint Centre, the "Gargantua" project (CNRS-Mastodons), the "Khronos" project (Labex Persyval-Lab, ANR-11-LABX-0025), and the Fire-ID project (ANR-12-CORD-016).