Abstracts & Presentations

Day 1: Tuesday, December 20, 2011

State-of-the-Area

Challenges in Visual Recognition: Historical Perspective - Jitendra Malik

download presentation


  to be updated

Segmentation and Representation - Martial Hebert

download presentation


Successful recognition and scene understanding approaches are based on inference of labels (or object box) from image features (e.g., SVM-HoGs or variations on CRFs for segmentation labeling). We'll review ways to incorporate additional information in the recognition process (3D, domain knowledge, physical constraints, etc.) and to follow alternate reasoning models beyond the standard classification/inference approach.

Morning Session: Recognition

Searching for smooth objects in large scale image datasets - Andrew Zisserman

download presentation


  to be updated

Max-Margin Latent Variable Models - M Pawan Kumar

download presentation


Latent variables model missing information, thereby allowing us to learn high-level vision tasks using large, inexpensively assembled datasets with incomplete or noisy annotations.

I will present our recent work on self-paced learning for estimating the parameters of a latent variable model, and demonstrate its efficacy using several applications. I will also discuss some (not yet concrete) ideas on learning with very large datasets and obtaining "peakier" estimates of the latent variables.

Recognizing Actions and Attributes of People - Subhransu Maji

download presentation


We build on the state of the art in detection and "crowdsourcing" to develop representations that allow fine-grained recognition of visual categories. Our representation, based on "poselets", provides a flexible decomposition of a visual category into patterns corresponding to aspects, pose, or visual subcategories. We apply this framework to two challenging tasks: (1) recognizing actions, and (2) recognizing attributes such as gender, hair type, and clothing of people in still images.

Afternoon Session: Recognition

Talks

Title Not Updated - Alex Berg

download part1

download part2

download part3



  to be updated

Exploiting the structure of vision problems for efficient optimization - Pushmeet Kohli

download presentation


Many problems in computer vision and machine learning require inferring the most probable states of certain hidden or unobserved variables. This inference problem can be formulated in terms of minimizing a function of discrete variables. The scale and form of computer vision problems raise many challenges in this optimization task. For instance, functions encountered in vision may involve millions or sometimes even billions of variables. Furthermore, the functions may contain terms that encode very high-order interaction between variables. These properties ensure that the minimization of such functions using conventional algorithms is extremely computationally expensive. In this talk, I will discuss how many of these challenges can be overcome by exploiting the sparse and heterogeneous nature of discrete optimization problems encountered in real world computer vision problems. Such problem-aware approaches to optimization can lead to substantial improvements in running time and allow us to produce good solutions to many important problems.

Shared Representations For The Classification of Known And New Classes - Dhruv Mahajan

download presentation


Object recognition remains a challenging problem, with the number of classes to be recognized ever increasing. This makes models that can share a representation (such as semantic attributes) between existing as well as unseen classes useful. I will talk about a new approach to learning attribute-based descriptions of objects. Unlike earlier work, we do not assume that the descriptions are hand-labeled. Instead, our approach jointly learns both the attribute classifiers and the descriptions from data. By incorporating class information into the attribute classifier learning, we get an attribute-level representation that generalizes well to both known and unseen classes. I will also talk briefly about another form of representation sharing, used when a hierarchical structure is available on the classes or attributes.

Spotlights and Posters


A Novel Approach for Human Activity Recognition at a Distance in Video - Dipti Prasad Mukherjee

download presentation


We propose a graph-theoretic technique for recognizing human actions at a distance in a video by modeling the visual senses associated with poses. The proposed methodology follows a bag-of-words approach that starts with a large dictionary of poses (visual words) and derives a refined and compact dictionary of key poses using a centrality measure of graph connectivity (a measure of ambiguity of poses), where poses are nodes in the graph. We introduce a 'meaningful' threshold on the centrality measure (via a perceptual analysis technique) that selects key poses for each action type. The key poses are graph nodes sharing a close semantic relationship with all other pose nodes and hence are expected to be at the central part of the graph. Our contribution includes a novel pose descriptor based on the Histogram of Oriented Optical Flow (HOOF), evaluated in a hierarchical fashion on a video frame. This pose descriptor combines both pose information and the motion pattern of the human performer into a multidimensional feature vector. We evaluate our methodology on four standard activity-recognition datasets, demonstrating the superiority of our method over the state of the art. We extend the proposed method to recognize interactions (e.g., handshaking, punching, etc.) between two human performers in a video. In this case, the perceptual analysis technique selects the set of 'key pose doublets' that best represents the corresponding interaction. The recognition results on a standard interaction-recognition dataset show the efficacy of the proposed approach compared to the state of the art.
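To make the descriptor concrete, here is a minimal sketch of a hierarchically evaluated HOOF feature. The bin count, grid levels, and L1 normalization are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def hoof(flow, n_bins=8):
    # Histogram of Oriented Optical Flow: bin flow vectors by
    # orientation, weight each vote by flow magnitude, L1-normalize.
    fx, fy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(fx, fy)
    ang = np.arctan2(fy, fx)                       # angles in [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    return hist / (hist.sum() + 1e-8)

def hierarchical_hoof(flow, levels=2, n_bins=8):
    # Evaluate HOOF over the whole frame, then over a 2x2 grid of
    # sub-blocks (and so on), concatenating into one pose descriptor.
    h, w = flow.shape[:2]
    feats = []
    for lv in range(levels):
        cells = 2 ** lv
        for i in range(cells):
            for j in range(cells):
                block = flow[i * h // cells:(i + 1) * h // cells,
                             j * w // cells:(j + 1) * w // cells]
                feats.append(hoof(block, n_bins))
    return np.concatenate(feats)
```

With two levels the descriptor concatenates one whole-frame histogram with four quadrant histograms, capturing both the global motion pattern and its spatial layout.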


Distributed Application Development Constraints - Subrata Rakshit

download presentation


At CAIR (DRDO), we focus on building applications for defense scenarios leveraging the latest advances in computer vision and image processing. In our latest attempts at doing so for the emerging Net-Centric context, we have come across various design constraints that have forced us to look at unconventional implementations and hacks. They also provide us with a wish list for driving research in academia in the coming years. The focus will be mainly on content based retrieval systems and object recognition.


Max-Margin Multi-Label Classification - Manik Varma

download presentation


The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of $L$ labels, a data point can be tagged with any of the $2^L$ possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario where the label correlations on the training set might be significantly different from those on the test set. We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method and we provide a principled interpretation of the 1-vs-All technique. We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some book-keeping, we can reduce the training time from being super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilizing the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms. We also design a specialised algorithm for linear kernels based on dual co-ordinate ascent with shrinkage that lets us effortlessly train on a million points with a hundred labels.
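For context, the 1-vs-All baseline that the formulation generalizes trains one independent linear hinge-loss classifier per label. The sketch below uses batch subgradient descent as an illustrative stand-in for the SMO and dual coordinate ascent solvers described above; hyperparameters are arbitrary:

```python
import numpy as np

def train_one_vs_all(X, Y, lr=0.1, reg=1e-3, epochs=200):
    """Train L independent linear hinge-loss classifiers (1-vs-All).

    X: (n, d) features; Y: (n, L) labels in {-1, +1}. Batch
    subgradient descent on the L2-regularized hinge loss per label.
    """
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    for _ in range(epochs):
        margins = Y * (X @ W)                      # (n, L)
        active = (margins < 1).astype(float)       # violated margins
        W -= lr * (-(X.T @ (active * Y)) / n + reg * W)
    return W

def predict(X, W):
    # A point is tagged with every label scoring positive, so the
    # prediction over 2^L subsets decomposes label by label.
    return np.sign(X @ W)
```

Because there are no pairwise label interaction terms in the prediction function, inference stays linear in the number of labels even though the output space has 2^L subsets.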



Hashing for Large-scale Active Learning - Prateek Jain

download


  to be updated

Day 2: Wednesday, December 21, 2011

Morning Session: Recognition

State-of-the-Area/Overview

Face Detection: Where we are, and what next? - Vidit Jain

download presentation



Nowadays, even tiny mobile devices have the capability of detecting faces in a photo. Is single-image face detection really solved? In this talk, we identify the state-of-the-art performance on a challenging face data set, FDDB, and distinguish the focus areas for current research in face detection. We also discuss our recent domain adaptation work that eliminates the need for a massive training set to train a detector from scratch; instead, it rapidly and simply adapts an existing detector to a new domain. We conclude our discussion by highlighting some of the open directions in face detection research.

Tamara Berg

download presentation


  to be updated

Data-Driven Vision

"Visual Inference by Composition" - Michal Irani

download presentation


Inference tasks in Computer Vision (classification, detection, search, reconstruction, etc.) often rely on having prior databases of examples to train on, or else (when those are not available) on simple notions of visual similarity. In this talk I will show that complex visual inference tasks can often be performed without any prior examples, by exploiting data redundancy and co-occurrence within and across different parts of the visual datum. This data-driven approach gives rise to complex notions of visual similarity (static and dynamic) and to a general "Inference by Composition" approach. I will demonstrate the power of this approach on a variety of visual inference tasks -- some old, some new (as time permits).

Novelty Detection from Wearable Cameras - Stefan Carlsson

download presentation


A small video camera attached to the body of a person can record essentially the same view as the person's own visual system. This has been exploited, e.g., in the Microsoft SenseCam, where the potential for memory therapy has been demonstrated. Ideally, such a system would benefit from automatic extraction of memorable images. In the general case, this is still an ill-defined and thereby very difficult problem. If, however, we restrict "memorable" to the concept of novelty, which has been demonstrated to be an important factor in the creation of new memories, and consider the less general but very common situation of a person performing a daily repeated activity over extended time, we show that it is possible to automatically extract visual information related to deviations from a person's normal behaviour or changes in the environment. This is done in two major steps. In the first step, a complete image sequence is registered w.r.t. previously acquired sequences, and novelty is measured in terms of lack of registration resulting from a strong deviation in the daily repeated activity. In the second step, images acquired from similar locations but on different days are registered in order to separate new objects from the non-changing background. The resulting automatically extracted images correspond very well with those extracted manually with the purpose of storing novel, non-repeated events and objects encountered by the person wearing the camera.

Beyond Naming: Image Understanding via Physical, Functional and Exemplar-Based Representations - Abhinav Gupta

download part1

download part2


What does it mean to "understand" an image? One popular answer is simply naming the objects seen in the image. During the last decade, most computer vision researchers have focused on this "object naming" problem. While there has been great progress in detecting things like "cars" and "people", such a level of understanding still cannot answer even basic questions about an image such as "What is the geometric structure of the scene?", "Where in the image can I walk?" or "What is going to happen next?". In this talk, I will present three different types of representations that help us develop a deeper understanding of the visual world: (1) First, I will talk about physically and geometrically based representations that are meaningfully grounded in the real world. (2) Next, I will introduce human-centric representations, where we represent and reason about space from the point of view of a human agent. (3) Finally, I will briefly discuss exemplar-based representations, where recognition is itself formulated as an association problem.

Afternoon Session: Vision & Computation

Talks

Solutions to MRF-MAP Labeling Problems using Primal Dual Strategies - Subhashis Banerjee

download presentation


MRF-MAP labeling is a widely used modeling technique in computer vision, for which efficient optimal algorithms are known only for the subcase of 2-label 2-clique problems with submodular clique potentials. Most problems of practical interest, however, do not possess these properties. We show here a methodology that extends the primal-dual optimization modeling introduced by Komodakis and Tziritas, which yielded an approximate algorithm for multi-label 2-clique problems, to all MRF-MAP labeling problems. As an example, we show that there exists an optimal algorithm for 2-label multi-clique problems when the potential functions are submodular. The optimal solution in this case requires O(2^k n^3) steps, compared to existing reduction-based methods, which give approximate solutions in O(2^{5k} n^3) steps.

Computer Vision on the GPUs: What is the fuss all about? - P J Narayanan

download presentation


The Graphics Processor Units (GPUs) are emerging as viable computation platforms for a number of applications. Computer Vision is an area that can benefit from the recent advances in GPUs. In this talk, I will present a brief case for GPUs as computing devices and will describe some of our own efforts at performing Computer Vision tasks on the GPU, including graph cuts, clustering, etc. I will wildly speculate on the implications of using different computation platforms for large vision problems from the point of view of energy consumption.

Lagged Fibonacci Generator on GPU - Sharat Chandran

download presentation


As vision algorithms mature with increasing inspiration from the learning community, statistically independent pseudo-random number generation (PRNG) becomes increasingly important. At the same time, execution-time demands have seen algorithms being implemented on evolving parallel hardware such as GPUs. The Mersenne Twister (MT) has proven to be the current state of the art for generating high-quality random numbers, and the Nvidia-provided software for parallel MT is in widespread use. While execution time is important, development time is also critical. As processor cardinality changes, a foundation for generating simulations that vary only in execution time and not in the actual result is useful; otherwise, development time is impacted. In this poster, we present a GPU implementation of the Lagged Fibonacci Generator (LFG), considered to be of quality equal to MT. Unlike MT, LFG has this important processor-cardinality-agnostic capability -- that is, as the number of processing resources changes, the overall sequence of random numbers remains the same. As a bonus, LFG outperforms MT in terms of time.
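For reference, the core of an additive lagged Fibonacci generator is the recurrence x_n = (x_{n-j} + x_{n-k}) mod m. A minimal sequential sketch follows; the toy lags (7, 10) are an assumption for illustration, as production-quality generators use much larger lags:

```python
def lfg(seed_state, j=7, k=10, n=5, m=2**32):
    """Additive lagged Fibonacci generator: x_n = (x_{n-j} + x_{n-k}) mod m.

    seed_state must supply at least the first k values. Quality
    generators use large lags such as (273, 607); (7, 10) is a toy.
    """
    assert j < k and len(seed_state) >= k
    state = list(seed_state)
    out = []
    for _ in range(n):
        x = (state[-j] + state[-k]) % m
        state.append(x)
        out.append(x)
    return out

# The sequence depends only on the seed and the lags, never on how
# the work is partitioned -- the processor-cardinality-agnostic
# property referred to above. A parallel version can assign each of
# p threads every p-th element of this same sequence.
print(lfg(list(range(1, 11))))
```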

Spotlights and Posters

Heritage - P.Anandan

download presentation


  to be updated

Not Updated - Uma Mudenagudi

download presentation


  to be updated

Integrity preserving super-resolution and image inpainting. - Manjunath Joshi

download presentation


This poster addresses the following problems. We first present a new method for image super-resolution (SR) that incorporates the preservation of integrity through the use of a watermark. Here the SR approach uses contourlet-based learning, while the SR watermarking scheme uses a compressive sensing (CS) based method. Next, the application of SR to multi-resolution image fusion is discussed. We then address the problem of automatic region detection and inpainting for the facial monuments of historic places. Finally, we discuss the spectral unmixing problem applied to hyperspectral image data.

Glimpses of Image Processing Research at E & ECE Department, IIT Kharagpur - Prabir Biswas

download presentation


This presentation will cover an overview of the research work on Image Processing carried out at E & ECE Department, IIT Kharagpur.


India-Specific Session

- Special Technical Challenges in India
- Socio-Cultural Environment for Research in India
- Efforts/initiatives to redress these

Day 3: Thursday, December 22, 2011

Morning Session - Video

State-of-the-Area/Overview

The impact of computer vision on video - Shmuel Peleg

download presentation


In this talk I would like to look at various computer vision applications, and try to guess in which area computer vision could have the largest impact. The talk does not have any conclusions, but will try to stir some discussion in this direction.

Harnessing Video Content - Santanu Chaudhury



download presentation


In my talk I shall present the work being pursued in analysing video content for semantic annotation. We fundamentally exploit the principle of Gestalt grouping for content extraction. We develop a Bayesian reasoning and machine learning based framework for object extraction, and a classification scheme for interpreting these object motions for annotation. Similar features are also used for event detection; in this case we use multi-modal features.

Video

Coherency Sensitive Hashing - Shai Avidan

download presentation


Coherency Sensitive Hashing (CSH) extends Locality Sensitive Hashing (LSH) and PatchMatch to quickly find matching patches between two images. LSH relies on hashing, which maps similar patches to the same bin, in order to find matching patches. PatchMatch, on the other hand, relies on the observation that images are coherent to propagate good matches to their neighbors in the image plane. It uses random patch assignment to seed the initial matching. CSH relies on hashing to seed the initial patch matching and on image coherence to propagate good matches. In addition, hashing lets it propagate information between patches with a similar appearance (i.e., those that map to the same bin). This way, information is propagated much faster because it can use similarity in appearance space or neighborhood in the image plane. As a result, CSH is at least three to four times faster than PatchMatch and more accurate, especially in textured regions, where reconstruction artifacts are most noticeable to the human eye. We verified CSH on a new, large-scale data set of 133 image pairs.
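The hashing-as-seeding idea can be sketched in a few lines. This toy version uses random Gaussian projections standing in for the Walsh-Hadamard kernels used in the paper, and omits the iterative coherence propagation entirely:

```python
import numpy as np

def seed_matches(A, B, n_bits=8, seed=0):
    """Seed patch matches from A to B via LSH-style binary codes.

    A, B: (n, d) arrays of vectorized patches. Patches whose binary
    codes collide land in the same bin; each A-patch is matched to
    its nearest B-patch within its bin (all of B as a fallback).
    """
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((A.shape[1], n_bits))
    # Sign of each random projection gives one bit; pack into an int.
    code = lambda P: (P @ proj > 0).astype(int) @ (1 << np.arange(n_bits))
    ca, cb = code(A), code(B)
    bins = {}
    for j, c in enumerate(cb):
        bins.setdefault(int(c), []).append(j)
    match = []
    for i, c in enumerate(ca):
        cand = bins.get(int(c), list(range(len(B))))
        d = np.linalg.norm(B[cand] - A[i], axis=1)
        match.append(cand[int(np.argmin(d))])
    return np.array(match)
```

In full CSH these seeds would then be improved by propagating good matches to image-plane neighbors, alternating between the appearance-space (bin) and image-plane (coherence) channels.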

On sparsity in the space of image patches - Lihi Zelnik

download presentation


Many recognition algorithms are based on two a-priori steps: (1) sparse feature selection in the image, and (2) constructing a sparse representation of the set of features. By definition, sparse representations imply loss of information. In this talk we will discuss methods for minimizing the information loss. To handle the first issue, we will present methods for obtaining dense features that are principled and hence produce a useful set of descriptors. To handle the second issue, we will further place under a unified framework several previous approaches for representing a set of features, including Bag-Of-Words, K-Nearest Neighbors and sparse-coding. We will then suggest a novel approach for representing the space of patches/descriptors that provides a better coverage of the space and hence reduces the information loss of sparse representations.
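Of the feature-set representations mentioned, hard-assignment Bag-of-Words is the simplest, and its information loss is easy to see: each descriptor is reduced to the index of its nearest codeword. A minimal sketch, where the codebook is assumed given (real systems typically learn it with k-means):

```python
import numpy as np

def bow_encode(descriptors, codebook):
    # Hard assignment: each descriptor votes for its nearest codeword
    # (squared Euclidean distance); the image is then represented as
    # a normalized histogram over the codebook.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Soft-assignment schemes, k-nearest-neighbor voting, and sparse coding can all be read as relaxations of this hard argmin, each retaining more information about where a descriptor sits relative to the codewords.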

Afternoon Session - Video

Talks

Activity Recognition: Statistical and Structural Models - Ram Nevatia

download presentation


Two major approaches to activity recognition have evolved in recent years. First is one that we will call Statistics of Local Features (SLF) where a number of local spatio-temporal features are computed and then aggregated to provide a statistical feature vector. The second one, that we will call Structured Activity Models (SAM) attempts to decompose complex activities into simpler ones and model the structure of their relationships. SLF is attractive due to its simplicity and robustness to performance of lower level object detectors but its discriminative power for subtle distinctions may be limited. SAM is capable of providing structured descriptions for output but is typically highly dependent on correct detection and tracking of actors and objects. This talk will mostly focus on the SAM approach but will also attempt to compare with SLF and speculate on how the two may be combined advantageously. In addition, the talk will also briefly reference our work on two major new activity recognition research projects.

Human-Centric Computer Vision: Beyond the Role of a Human as a Labeler - Ashish Kapoor

download presentation


There has been a lot of recent research that considers ‘humans-in-the-loop’ to solve hard computer vision tasks. However, much of such prior work in computer vision limits the role of a human as a resource for labeled data. In this talk, we aim to highlight coupling of vision systems with humans that go beyond the simplistic view of a human as a data annotator. In particular, we will consider a more general and richer class of problems that considers the humans not only as a labeler but also as a designer, explorer and a computational architecture that is complementary to the current vision systems. We will highlight mathematical and technical challenges that arise when addressing the problem of learning about and from the people and will discuss principles in the context of image categorization, search and several other computer vision applications.

Privacy in Video Recognition - C V Jawahar

download presentation



With huge quantities of visual data captured everywhere for automated processing and archival, privacy issues have come to the forefront. To design effective private recognition schemes, we need (i) efficient computational schemes and (ii) reliable performance across a wide variety of data and sensor settings. In this talk, we discuss a specific method for privacy-preserving video computing and the associated issues.

Spotlights and Posters

Not Updated - Sheshadri Thiruvenkadam

download presentation


Not updated

Imaging Research in Siemens: Opportunities and challenges in India - Amit Kale

download presentation


The needs of emerging markets differ from those in developed countries. For instance, the number of doctors per capita is much smaller, necessitating algorithmic workflow modifications. Some of these needs can be addressed by imaging algorithms. Furthermore, several disease patterns that are endemic in emerging markets, such as tuberculosis, are rare in developed countries. This often causes difficulties when a patient also has cancer, specifically when treatment decisions must be based on estimating the stage of the cancer. Thirdly, there is a need for cost-effective add-ons to existing equipment. In this talk, we present some of the topics undertaken by the Siemens Imaging Lab in India to address these problems with vision algorithms. As an example of developing low-cost solutions without compromising accuracy, we present the monocular Visual Patient Movement Monitoring (MVPM) system for frameless stereotaxy. The system consists of a lightweight mouth bite with a dual fiducial system used for position tracking. One set of fiducials consists of an easy-to-detect checkerboard whose 3D position can be tracked using a pre-calibrated off-the-shelf camera, and the other of radio-opaque rods suitable for X-ray imaging. The system provides a low-cost alternative for accurate position monitoring during stereotactic radiotherapy.

Object tracking using dynamic programming. - Jayanta Mukhopadhyay

download presentation


A very brief presentation on a technique for off-line and real time object tracking based on dynamic programming.

VIDLOOKUP - A web-based online CBVR system for Query video shots, using a novel MST-CSS representation - Sukhendu Das

download presentation


A combination of shape, motion, and color features is used to retrieve video shots with similar backgrounds and foreground moving objects. A novel Multi-Spectro-Temporal Curvature Scale Space (MST-CSS) technique is used for a joint representation of shape and motion features. The color feature is extracted from the key (median) frame of the video shot to be analyzed. At the retrieval stage, a weighted combination of the match-cost values obtained from matching the three features (shape, motion, and color) is used for rank-ordering the video shots. One can adjust the weights depending on the user's intention to retrieve certain kinds of similar videos. A web-based interface has been developed for content-based video retrieval, searching videos based on the content of a query clip. The interface supports three operations: query by example, query by sketch, and adding video shots to the database.
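The rank-ordering step amounts to sorting shots by a weighted sum of per-feature costs. The sketch below is a minimal illustration; the feature names and default weights are assumptions, not the system's actual values:

```python
def rank_shots(match_costs, weights=None):
    """Rank video shots by a weighted combination of match costs.

    match_costs: {shot_id: {'shape': c, 'motion': c, 'color': c}}.
    Lower combined cost ranks higher; the weights dict is the
    user-adjustable knob mentioned above.
    """
    weights = weights or {'shape': 0.4, 'motion': 0.4, 'color': 0.2}
    total = {sid: sum(weights[f] * costs[f] for f in weights)
             for sid, costs in match_costs.items()}
    return sorted(total, key=total.get)
```

Raising, say, the motion weight biases retrieval toward shots whose foreground movement matches the query, at the expense of shape and color similarity.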

Development of Analysis and Indexing tools for harnessing Educational Videos - Gaurav Harit

download presentation


Not updated

Framework for Foreground/Background based Parametric Video Coding - Sumantra Dutta Roy

download presentation


Not updated

Day 4: Friday, December 23, 2011

Morning Session - Internet-Driven Vision

Talks

Overview - Manik Varma

download presentation



 

Internet-Driven Vision (from the perspective of a machine learning person at Google) - Jason Weston

download presentation


The internet has changed machine vision (and machine learning) both in terms of i) data sources (large publicly available datasets and the ability to easily collect new data sets e.g. via mechanical turk); and ii) in terms of end-applications (like image search or YouTube). The re-imagining of both input and output has then shaped the choice of methods & algorithms (the process in the middle). So, well, presumably everything has changed :) But .. are we seeing breakthroughs from these changes, and if not, why not? The first part of the talk will cover how machine vision fits in with other technologies from an internet search engine perspective, we will briefly summarize some of the research at Google, and also discuss the question above. The second part of the talk will discuss some of my own work on the tasks of image annotation and ranking in more detail.

Comments