Part VI - Output Embedding for Large-Scale Visual Recognition

Speaker: Florent Perronnin

The focus of the computer vision community has long been on input embedding: how do we transform an image into a descriptor that can subsequently be fed to a simple classifier such as a linear SVM? In this part, we consider the complementary problem of output embedding: how do we embed the classes themselves in a Euclidean space? We will show that such an embedding is essential for large-scale visual recognition because it enables parameter sharing across classes: this yields classifiers that are more accurate when training data is scarce (including the zero-shot case, where a class has no training images at all) and that are faster to train and evaluate. We will provide a taxonomy of output embeddings: data-independent embeddings (e.g. [HKL09]), embeddings based on a priori information such as attributes (e.g. [LEB08,APH13]), and learned embeddings (e.g. [WBU10]). We will also explain how to measure the compatibility between input and output embeddings using techniques such as least-squares regression or structured learning [TJH05]. Finally, we will show recent successful applications of output embedding to web-scale visual classification (e.g. [WBU10,FCS13]).
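To make the idea concrete, here is a minimal NumPy sketch (not from the tutorial) of the least-squares compatibility approach: image descriptors are regressed onto attribute-style label embeddings, and a test image is assigned to the class whose embedding is nearest, which works identically for seen and unseen classes. All dimensions, the linear generative model, and the function names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy output embedding: each class is a 3-dim binary attribute vector.
# The last class has no training images and is recognized zero-shot.
label_emb = np.array([
    [1., 0., 0.],
    [0., 1., 0.],
    [0., 0., 1.],
    [1., 1., 0.],
    [0., 1., 1.],   # held-out class: zero-shot
])
n_seen, d = 4, 10

# Hypothetical generative model: an image descriptor is a noisy linear
# "rendering" of its class embedding.
A = rng.standard_normal((d, 3))

def sample(c, n):
    """Draw n synthetic d-dim descriptors for class c."""
    return label_emb[c] @ A.T + 0.05 * rng.standard_normal((n, d))

X = np.vstack([sample(c, 20) for c in range(n_seen)])
Y = np.repeat(label_emb[:n_seen], 20, axis=0)

# Compatibility learned by regularized least-squares regression:
#   W = argmin_W ||X W - Y||^2 + lam ||W||^2   (closed form below).
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def classify(x):
    """Project the descriptor into label space and return the nearest
    class embedding -- seen or unseen classes are treated identically."""
    z = x @ W
    return int(np.argmin(np.linalg.norm(label_emb - z, axis=1)))
```

Because all parameters are shared through the single matrix `W`, adding a new class only requires its embedding vector, not retraining: this is exactly the parameter sharing the abstract refers to.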

[APH13] Z. Akata, F. Perronnin, Z. Harchaoui and C. Schmid, “Label-Embedding for Attribute-Based Classification”, CVPR, 2013.

[FCS13] A. Frome, G. Corrado, J. Shlens, S. Bengio, J. Dean, M.’A. Ranzato and T. Mikolov, “DeViSE: A Deep Visual-Semantic Embedding Model”, NIPS, 2013.

[HKL09] D. Hsu, S. Kakade, J. Langford and T. Zhang, “Multi-label prediction via compressed sensing”, NIPS, 2009.

[LEB08] H. Larochelle, D. Erhan and Y. Bengio, “Zero-data learning of new tasks”, AAAI, 2008.

[TJH05] I. Tsochantaridis, T. Joachims, T. Hofmann and Y. Altun, “Large margin methods for structured and interdependent output variables”, JMLR, 2005.

[WBU10] J. Weston, S. Bengio and N. Usunier, “Large-scale image annotation: learning to rank with joint word-image embeddings”, ECML, 2010.