Mining Discriminative Attributes from the Data for Image Classification
With Aditya Khosla, Bangpeng Yao and Professor Fei-Fei Li
Using discriminative, high-level features to classify images more accurately is an emerging trend in modern computer vision. Some techniques rely on richer annotations (such as attribute and poselet annotations), which provide discriminative, semantic information about the contents of the image in question. Another technique, called "object bank", uses pre-trained object classifiers to provide semantic information about an image; these classifiers are learned for objects that occur extremely commonly in images.
Our research aims to extract the same kind of discriminative, high-level features automatically, without relying on annotations or pre-trained classifiers.
1) Our first algorithm uses randomly sampled image patches and active learning to obtain these attributes automatically from the data. Our implementation achieves state-of-the-art results on two difficult datasets and is currently under review at CVPR 2012.
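The patch-sampling step above can be sketched as follows. This is a minimal illustration, not the paper's actual procedure; the patch size and patch count are assumed values chosen for the example.

```python
import numpy as np

def sample_random_patches(image, patch_size=32, n_patches=100, seed=0):
    """Sample square patches at random locations of a single scale.

    `patch_size` and `n_patches` are illustrative defaults, not the
    settings used in the actual experiments.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    patches = []
    for _ in range(n_patches):
        # Pick a top-left corner so the patch stays inside the image.
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

# Toy usage: sample 100 patches of size 32x32 from a synthetic RGB image.
img = np.zeros((256, 256, 3), dtype=np.uint8)
patches = sample_random_patches(img)  # shape (100, 32, 32, 3)
```

In practice such patches would then be scored (e.g. by how discriminative they are between classes) and the most informative ones retained via active learning.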
2) I am currently leading a project to obtain a sparse image representation using concatenated model responses. Robust models are trained on similar image regions automatically mined from the data; I plan to use basic clustering techniques to obtain these similar regions. The goal is to obtain clusters whose centers carry semantic meaning (like poselets or identifiable object models). One challenge is obtaining image regions that retain semantic meaning across different scales and locations. We also need to sparsify the final image representation, since the total number of clusters obtained is extremely large. We have developed an algorithm that efficiently achieves these objectives and yields promising initial results, and we are formalizing and fine-tuning it for submission to ECCV 2012.
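The mine-cluster-encode pipeline above can be sketched under simple stand-in assumptions: plain k-means for the clustering step, a dot product as a stand-in for a trained model's response, and top-k truncation for the sparsification. The actual algorithm differs; this only illustrates the shape of the computation.

```python
import numpy as np

def kmeans(X, k=8, iters=20, seed=0):
    # Plain Lloyd's iterations; the project's actual clustering may differ.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

def sparse_representation(region_descriptors, centers, top_k=2):
    # Response of each cluster "model" = max similarity over the image's
    # regions (dot product stands in for a trained model's score), then
    # keep only the top_k responses to sparsify the final vector.
    sims = region_descriptors @ centers.T      # (n_regions, k)
    responses = sims.max(0)                    # (k,)
    keep = np.argsort(responses)[-top_k:]
    sparse = np.zeros_like(responses)
    sparse[keep] = responses[keep]
    return sparse

# Toy usage with random descriptors in place of mined image regions.
rng = np.random.default_rng(1)
X = rng.random((60, 4))          # descriptors mined from many images
centers = kmeans(X, k=8)
regions = rng.random((5, 4))     # descriptors of one image's regions
rep = sparse_representation(regions, centers, top_k=2)
```

The final image representation would concatenate such response vectors, which is why sparsification matters when the number of clusters grows large.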
Novel Dataset for Fine Grained Image Categorization: ImageNet Dogs
With Aditya Khosla, Bangpeng Yao and Professor Fei-Fei Li
ImageNet Dogs is a new dataset for use in the burgeoning field of fine-grained image categorization. Compared to other datasets available for this task (like Caltech-UCSD Birds 200), ImageNet Dogs is much larger. With 120 dog breeds and at least 150 images per class, this dataset will help us understand how classification algorithms depend on the number of images available per class. It is also a much harder dataset, since there is significant intra-class variation: in any given class, the images of the dogs cover a wide range of ages, poses and colors. The images were collected from ImageNet (under the "dog" synset). This work was published in the First Workshop on Fine-Grained Visual Categorization, CVPR 2011.
[Website], [Abstract], [Poster]
Indoor Scene Recognition
With Professor Charless Fowlkes
Experiments were performed on MIT's Indoor Scene Dataset. We explored extracting spatial pyramid features from image segments instead of from the entire image, and formulated a kernel from the histogram intersection scores of corresponding image segments. This kernel was then used to learn a classifier for each indoor scene category. The number of image segments could be controlled by tweaking the threshold of the contour detector; contours, and consequently image segments, were obtained using the contour detection and image segmentation resources available here.
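The segment-level kernel can be sketched as follows, assuming each segment is summarized by a (normalized) histogram and a segment correspondence between the two images has already been computed. The histograms and matching below are toy values for illustration.

```python
import numpy as np

def histogram_intersection(h1, h2):
    # Standard histogram intersection score between two histograms.
    return np.minimum(h1, h2).sum()

def segment_kernel(hists_a, hists_b, matching):
    # Kernel value between two images = sum of intersection scores over
    # corresponding segment pairs; `matching` holds (i, j) index pairs.
    return sum(histogram_intersection(hists_a[i], hists_b[j])
               for i, j in matching)

# Toy example: two segments per image, 2-bin histograms.
a = np.array([[0.5, 0.5], [1.0, 0.0]])
b = np.array([[0.25, 0.75], [0.5, 0.5]])
k = segment_kernel(a, b, [(0, 0), (1, 1)])  # 0.75 + 0.5 = 1.25
```

In the actual system the per-segment histograms would come from spatial pyramid features, and the kernel matrix over all training pairs would feed a kernelized classifier such as an SVM.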
Some key challenges were determining the right number and size of image segments and finding corresponding image segments in two images. Corresponding segments were matched using Munkres' assignment algorithm (the Hungarian algorithm), with an assignment cost combining histogram intersection scores and segment area overlap.
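The matching step can be sketched as below. Here a brute-force search over permutations stands in for the Munkres algorithm (workable only for a handful of segments, whereas Munkres runs in O(n^3)), and the weight `alpha` combining the two cost terms is an illustrative assumption.

```python
import itertools
import numpy as np

def match_segments(hists_a, hists_b, overlap, alpha=0.5):
    # Pairwise histogram intersection between all segments of image A
    # (rows of hists_a) and image B (rows of hists_b).
    inter = np.minimum(hists_a[:, None, :], hists_b[None, :, :]).sum(-1)
    # Combined affinity: histogram intersection plus area overlap.
    # `alpha` is an illustrative weight, not a value from the project.
    score = alpha * inter + (1 - alpha) * overlap
    # Brute-force optimal assignment (assumes A has no more segments
    # than B); the project used the Munkres / Hungarian algorithm.
    n = score.shape[0]
    best = max(itertools.permutations(range(score.shape[1]), n),
               key=lambda p: sum(score[i, p[i]] for i in range(n)))
    return list(enumerate(best))

# Toy example: one-hot histograms, so the correct matching is obvious.
A = np.eye(3)
B = np.eye(3)[[2, 0, 1]]
m = match_segments(A, B, np.zeros((3, 3)))  # [(0, 1), (1, 2), (2, 0)]
```

Once segments are matched, the resulting pairs feed directly into the segment-level kernel described above.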