Papers

A k-NN Approach for Scalable Image Annotation Using General Web Data. Mauricio Villegas (Universidad Politecnica de Val), Roberto Paredes (Universidad Politecnica de Valencia)

  • Abstract: This paper presents a simple k-NN based image annotation method that relies only on automatically gathered Web data. It can easily change or scale the list of concepts for annotation, without requiring labeled training samples for the new concepts. In terms of MAP, the performance is better than the results from the ImageCLEF 2012 Scalable Web Image Annotation Task on the same dataset. In terms of F-measure, however, the two are equivalent, suggesting that a better method for choosing how many concepts to select per image is required. Large-scale issues are considered by means of linear hashing techniques. Dictionary definitions have been observed to be a useful resource for image annotation without manually labeled training data.
  • URL to the latest version: http://mvillegas.info/pub/Villegas12_BIGVIS_kNN-Annotation.pdf

Adaptive representations of scenes based on ICA mixture model. Wooyoung Lee (Carnegie Mellon University), Michael Lewicki (Case Western Reserve University)

  • Abstract: To develop an adaptive representation based on rich statistical distributions of very large databases of scene images, we train a mixture model based on independent component analysis for full color scene images. The learned features of the model result in improved scene category classification performance when compared with previous methods. Furthermore, the unsupervised classification of scene images performed by the model suggests that perceptual categories of scene images are to some extent based on the statistics of natural scenes. Our results show that features tailored for subgroups of data can be beneficial for more efficient representation of a large number of images.
  • Workshop Paper: pdf

Aggregating descriptors with local Gaussian metrics. Hideki Nakayama (The University of Tokyo)

  • Abstract: Recently, large-scale image classification has made remarkable progress because of the significant advancement in the representation of image features. To realize scalable systems that can handle millions of training samples and tens of thousands of categories, it is crucially important to develop discriminative image signatures that are compatible with linear classifiers. One of the promising approaches to realize this is to encode high-level statistics of local features. Many state-of-the-art large-scale systems follow this approach and have made remarkable progress over the past few years. However, while first-order statistics are frequently used in many methods, the power of higher-order statistics has not received much attention. In this work, we propose an efficient method to exploit the second-order statistics of local features. For each visual word, the local features of training samples are modeled with a Gaussian, and descriptors from two images are compared using a Fisher vector with respect to the Gaussian. In experiments, we show the promising performance of our method.
  • Workshop Paper: pdf

Beyond Classification -- Large-scale Gaussian Process Inference and Uncertainty Prediction. Alexander Freytag (Friedrich Schiller University), Erik Rodner (UC Berkeley, University of Jena), Paul Bodesheim (Computer Vision Group, University of Jena), Joachim Denzler (Computer Vision Group, University of Jena)

  • Abstract: Due to the massive (labeled) data available on the web, a tremendous interest in large-scale machine learning methods has emerged in recent years. Whereas most of the work done in this new area of research has focused on fast and efficient classification algorithms, we show in this paper how other aspects of learning can also be covered using massive datasets. The paper briefly presents techniques that make it possible to use the full posterior obtained from Gaussian process regression (predictive mean and variance) with tens of thousands of data points and without relying on sparse approximation approaches. Experiments on active learning and one-class classification show the benefits in large-scale settings.
  • Workshop Paper: pdf
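
The "full posterior" mentioned in the abstract above consists of a predictive mean and a predictive variance. The following is a minimal sketch of the textbook exact Gaussian process regression formulas, not the large-scale technique of the paper (which the abstract does not detail). The RBF kernel, the noise level, and the names `rbf_kernel` and `gp_predict` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    # Exact GP regression: predictive mean and variance of the latent function.
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, length_scale)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + noise * I)^{-1} y, computed through the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    # The prior variance k(x*, x*) equals 1 for this kernel.
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, var
```

Far from the training data the variance returns to the prior value of 1, which is the kind of signal that uncertainty-driven tasks such as active learning or one-class classification can use.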

Classifier-as-a-Service: Online Query of Cascades and Operating Points. Brandyn White (University of Maryland: Colleg), Andrew Miller (University of Central Florida), Larry Davis (University of Maryland: College Park)

  • Abstract: We introduce a classifier and parameter selection algorithm for Classifier-as-a-Service applications where there are many components (e.g., features, kernels, classifiers) available to construct classification algorithms. Queries specify varying requirements (i.e., quality and execution time), some of which may require combining classification algorithms to satisfy. Each query may have a different set of quality and execution time requirements (e.g., fast and precise, slow and thorough), and the set of images to which the classifier is to be applied may be small (e.g., even a single image), necessitating a query resolution method that takes negligible time in comparison. When operating on large datasets, meeting design requirements automatically becomes essential to reducing costs associated with unnecessary computation and expert assistance. As queries specify requirements and not implementation details, additional components can be utilized naturally. Our query resolution method combines classifiers with complementary operating points (e.g., high recall algorithmic filter, followed by high precision human verification) in a rejection-chain configuration. Experiments are conducted on the SUN397[1] dataset; we achieve state-of-the-art classification results and 1 ms query resolution times.
  • URL to the latest version: http://bw-school.s3.amazonaws.com/nips2012-bigvision-classifier-as-a-service.pdf
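
The rejection-chain configuration mentioned in the abstract above can be illustrated with a short sketch. This shows only the general idea of chaining classifiers so that later (typically more expensive) stages see only what earlier stages accepted. It is not the query resolution algorithm of the paper, and the function name `rejection_chain` and the `(score_fn, threshold)` stage format are hypothetical.

```python
def rejection_chain(items, stages):
    """Run items through an ordered list of (score_fn, threshold) stages.

    An item is accepted only if every stage scores it at or above that
    stage's threshold; an item rejected by one stage is never evaluated
    by the stages after it.
    """
    accepted = list(items)
    evaluations = []  # number of items each stage had to score
    for score_fn, threshold in stages:
        evaluations.append(len(accepted))
        accepted = [x for x in accepted if score_fn(x) >= threshold]
    return accepted, evaluations
```

Placing a cheap, high-recall stage first keeps the number of evaluations made by a slow, high-precision stage small, which is the motivation the abstract gives for combining complementary operating points.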

Creating a Big Data Resource from the Faces of Wikipedia. Md. Kamrul Hasan (Ecole Polytechnique Montreal), Christopher Pal (Ecole Polytechnique de Montreal)

  • Abstract: We present the Faces of Wikipedia data set, in which we have used Wikipedia to create a large database of identities and faces. To automatically extract faces for over 50,000 identities, we have developed a state-of-the-art face extraction pipeline and a novel facial co-reference technique. Our approach is based on graphical models and uses the text of Wikipedia pages, face attributes and similarities, as well as clues from various other sources. Our method resolves the name-face association problem jointly for all detected faces on a Wikipedia page. We provide this dataset to the community for further research in various forms, including manually labeled faces, automatically labeled faces using our co-reference technique, raw and processed faces, as well as text and metadata features for further evaluations of extraction and co-reference methods.
  • URL to the latest version: http://www.professeurs.polymtl.ca/christopher.pal/BigVision12/

Large-scale image classification with lifted coordinate descent. Zaid Harchaoui (INRIA), Matthijs Douze (INRIA), Mattis Paulin (INRIA), Miro Dudik (Microsoft Research), Jerome Malick (CNRS)

  • Abstract: With the advent of larger image classification datasets such as ImageNet, designing scalable and efficient multi-class classification algorithms is now an important challenge. We introduce a new scalable learning algorithm for large-scale multi-class image classification, using trace-norm-type regularization penalties. Reframing the challenging non-smooth optimization problem into a surrogate infinite-dimensional optimization problem with a regular $\ell_1$-regularization penalty, we propose a simple and provably efficient "lifted" coordinate descent algorithm. Furthermore, we show how to perform efficient matrix computations in the compressed domain for quantized dense visual features, scaling up to 100,000s of examples, 1,000s-dimensional features, and 100s of categories. Promising experimental results on subsets of ImageNet are presented.
  • Workshop Paper: pdf

Learning from Incomplete Image Tags. Minmin Chen (Washington University), Kilian Weinberger, Alice Zheng

  • Abstract: Obtaining high-quality training labels for learning can be an onerous task. In this paper, we look at the task of automatic image annotation, trained with only partial supervision. We propose MARCO, a novel algorithm that learns to predict the complete tag set of an image with the help of an auxiliary task that recovers the semantic relationship between tags. We formulate this as a convex programming problem and present an efficient optimization routine that iterates between two closed-form solution steps. We demonstrate on two real datasets that our approach outperforms all competitors, especially with very sparsely labeled training images.
  • URL to the latest version: www.cse.wustl.edu/~mchen/papers/msdajoint.pdf

Loss-Specific Learning of Complex Hash Functions. Mohammad Norouzi (University of Toronto), David Fleet (University of Toronto), Ruslan Salakhutdinov (University of Toronto)

  • Abstract: Motivated by large-scale multimedia applications, we propose a framework for learning mappings from high-dimensional data to binary codes, while preserving semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit fast exact kNN search. The framework is applicable to broad families of mappings and two flexible classes of loss functions. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss. Experiments show strong retrieval and classification results using no more than kNN on the binary codes.
  • Workshop Paper: pdf
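
The abstract above notes that binary codes are storage efficient and permit fast exact kNN search. A minimal sketch of that search step is given below: codes are packed eight bits per byte, and Hamming distance is computed with XOR followed by a bit count. How the codes are learned is the subject of the paper and is not shown; the function names `pack_codes` and `hamming_knn` are hypothetical, and a brute-force linear scan is assumed.

```python
import numpy as np

def pack_codes(bits):
    # bits: (n, n_bits) array of 0/1 values -> (n, ceil(n_bits / 8)) uint8 rows.
    return np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)

def hamming_knn(query, database, k):
    # query: packed code of shape (n_bytes,); database: (n, n_bytes) packed codes.
    xor = np.bitwise_xor(database, query[None, :])
    # Count the differing bits per row (Hamming distance).
    dist = np.unpackbits(xor, axis=1).sum(axis=1)
    # A stable sort breaks ties by database order.
    order = np.argsort(dist, kind="stable")[:k]
    return order, dist[order]
```

The search is exact because every database code is compared against the query; the efficiency comes from the compact representation and cheap bitwise operations rather than from approximation.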

Overcoming Dataset Bias: An Unsupervised Domain Adaptation Approach. Boqing Gong (U. of Southern California), Fei Sha (University of Southern California), Kristen Grauman (University of Texas at Austin)

  • Abstract: Recent studies have shown that recognition datasets are biased. Paying no heed to those biases, learning algorithms often result in classifiers with poor cross-dataset generalization. We are developing domain adaptation techniques to overcome those biases and yield classifiers with significantly improved performance when generalized to new testing datasets. Our work enables us to continue to harvest the benefits of existing vision datasets for the time being. Moreover, it also offers insights into how to construct new ones. In particular, we have raised the bar for collecting data: the most informative data are those which cannot be classified well by learning algorithms adapting from existing datasets.
  • Workshop Paper: pdf

Picture Tags and World Knowledge. Lexing Xie (Australian National University)

  • Abstract: This paper studies the use of everyday words to describe images. The common saying has it that "a picture is worth a thousand words"; here we ask "which thousand?" We propose a new method to exploit visual semantic structure by jointly analyzing three distinct resources: Flickr, ImageNet/WordNet, and ConceptNet. This allows us to quantify the visual relevance of both tags and their relationships, which in turn leads to an algorithm for image annotation that takes into account both image and tag features. We analyze over 5 million semantically tagged photos; their statistics allow us to observe tag utility and meanings. We have also obtained good results for image tagging, including generalizing to unseen tags. We believe leveraging real-world knowledge is a very promising direction for image retrieval. Other potential applications include generating natural language descriptions of pictures and validating the quality of commonsense knowledge.
  • URL to the latest version: http://users.cecs.anu.edu.au/~xlx/proj/tagnet/

Randomly Multi-view Clustering for Hashing. Caiming Xiong (SUNY at Buffalo), Jason Corso (SUNY at Buffalo)

  • Abstract: This paper addresses the problem of efficiently learning similarity-preserving binary codes for fast retrieval in large-scale data collections. We propose a simple and efficient randomly multi-view clustering scheme for finding hash functions that decrease the Hamming distance between relatively close points. First, we use PCA to reduce the dimensionality of the data points and obtain compact representations, then multiply the new representation by a Hadamard matrix so as to equalize the variance of each dimension. Second, we find the $l$-bit binary code for all data points via randomly multi-view clustering, which extracts $l$ different views of the data distribution by randomly choosing $k$ dimensions; for each view, we obtain 1 bit by partitioning the data points into two clusters. Finally, we train $l$ classifiers using max-margin SVMs, with the clustering result of each view as training data, to predict the binary code for any query point. Our experiments show that our binary coding scheme results in better performance than several other state-of-the-art methods.
  • Workshop Paper: pdf
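
The pipeline described in the abstract above (PCA, Hadamard rotation, random views, two-way clustering per view) can be sketched roughly as follows. This is an illustration under several assumptions rather than the authors' implementation: the two-way partition uses plain 2-means, the number of PCA components is taken to be a power of two so that a Sylvester-type Hadamard matrix exists, and the final step of training max-margin SVMs to encode unseen query points is omitted. All function names are hypothetical.

```python
import numpy as np

def sylvester_hadamard(n):
    # Sylvester construction; n is assumed to be a power of two.
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def two_means_bits(x, rng, n_iter=20):
    # Plain 2-means (Lloyd's algorithm); returns one 0/1 label per row of x.
    centers = x[rng.choice(len(x), size=2, replace=False)]
    labels = np.zeros(len(x), dtype=np.uint8)
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1).astype(np.uint8)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels

def multiview_codes(data, n_components, n_bits, k, seed=0):
    rng = np.random.default_rng(seed)
    centered = data - data.mean(axis=0)
    # PCA via SVD: project onto the leading principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    # The scaled Hadamard matrix is orthogonal and spreads the PCA
    # variances evenly across the rotated dimensions.
    rotated = reduced @ sylvester_hadamard(n_components) / np.sqrt(n_components)
    codes = np.empty((len(data), n_bits), dtype=np.uint8)
    for b in range(n_bits):
        # One "view": k randomly chosen dimensions, split into two clusters.
        dims = rng.choice(n_components, size=k, replace=False)
        codes[:, b] = two_means_bits(rotated[:, dims], rng)
    return codes
```

Each bit here is simply the cluster membership of a training point in one random view; encoding a new query would additionally require the per-bit classifiers that the abstract describes.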

Semantic Kernel Forests from Multiple Taxonomies. Sung Ju Hwang (University of Texas, Austin), Fei Sha (University of Southern California), Kristen Grauman (University of Texas at Austin)

  • Abstract: We propose a discriminative feature learning approach that leverages multiple hierarchical taxonomies representing different semantic views. For each taxonomy, we first learn a tree of semantic kernels, where each node has a Mahalanobis kernel optimized to distinguish between the classes in its child nodes. Then, using the resulting semantic kernel forest, we learn class-specific kernel combinations to select only those kernels relevant for category recognition, with a novel hierarchical regularizer that exploits the taxonomies’ structure. We demonstrate our method on challenging object recognition datasets.
  • Workshop Paper: pdf

Visually-Grounded Bayesian Word Learning. Yangqing Jia (UC Berkeley), Joshua Abbott (UC Berkeley), Joseph Austerweil (UC Berkeley), Thomas Griffiths (UC Berkeley), Trevor Darrell (UC Berkeley)

  • Abstract: Learning the meaning of a novel noun from a few labelled objects is one of the simplest aspects of learning a language, but approximating human performance on this task is still a significant challenge. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given stimulus. Recent work in cognitive science on Bayesian word learning partially addresses this challenge, but assumes that objects are perfectly recognized and has only been evaluated in small domains. We present a system for learning words directly from images, using probabilistic predictions generated by visual classifiers as the input to Bayesian word learning, and compare this system to human performance in a large-scale automated experiment. Combining the uncertain outputs of the visual classifiers with the ability to identify an appropriate level of abstraction that comes from Bayesian word learning allows the system to capture human word learning behavior better than previous approaches.
  • URL to the latest version: http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-202.pdf