Weekly Computer Vision Reading Groups

  • Reading Group 25 March 2015

Title: Neural Turing Machines

Authors: Alex Graves, Greg Wayne and Ivo Danihelka.

Abstract: We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
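A minimal numpy sketch of the content-based addressing the paper describes may help before the meeting: the controller emits a key, memory rows are scored by cosine similarity sharpened by a scalar beta, and the read is a convex (hence differentiable) combination of rows. Names and shapes below are illustrative, not from the authors' code.

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Content-based weighting: cosine similarity between the controller's
    key and every memory row, sharpened by beta, normalised by a softmax."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    logits = beta * sims
    w = np.exp(logits - logits.max())
    return w / w.sum()

def read(memory, w):
    """Differentiable read: a convex combination of memory rows."""
    return w @ memory

# toy usage: a noisy query for slot 3 reads back (approximately) slot 3
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 4))            # 8 memory slots of width 4
k = M[3] + 0.05 * rng.standard_normal(4)
w = content_addressing(M, k, beta=10.0)
r = read(M, w)                             # r is close to M[3]
```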

Alex Kendall's Slides.

  • Reading Group 19 March 2015

Title: Deep Convolutional Inverse Graphics Network

Abstract: This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN) that aims to learn an interpretable representation of images that is disentangled with respect to various transformations such as object out-of-plane rotations, lighting variations, and texture. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm [11]. We propose training procedures to encourage neurons in the graphics code layer to have semantic meaning and force each group to distinctly represent a specific transformation (pose, light, texture, shape, etc.). Given a static face image, our model can re-generate the input image with different pose, lighting or even texture and shape variations from the base face. We present qualitative and quantitative results of the model’s efficacy to learn a 3D rendering engine. Moreover, we also utilize the learnt representation for two important visual recognition tasks: (1) an invariant face recognition task and (2) using the representation as a summary statistic for generative modeling.
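Since the model is trained with SGVB, the piece worth having in your head is the reparameterisation trick and the diagonal-Gaussian KL term; here is a minimal numpy sketch of just those two pieces (the clamping procedure that disentangles the graphics code is not shown).

```python
import numpy as np

def sgvb_sample(mu, log_var, rng):
    """Reparameterisation trick: z = mu + sigma * eps, so the sample stays
    differentiable w.r.t. the encoder outputs (mu, log_var)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior -- the
    regulariser in the SGVB objective."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

rng = np.random.default_rng(0)
mu, log_var = np.zeros(10), np.zeros(10)   # a posterior equal to the prior
print(sgvb_sample(mu, log_var, rng).shape, kl_to_standard_normal(mu, log_var))  # (10,) 0.0
```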

Authors: Tejas Kulkarni, Will Whitney, Pushmeet Kohli, Joshua Tenenbaum

Download Link: http://arxiv.org/pdf/1503.03167v1.pdf (NB: version will no doubt be updated)

Simon's presentation slides.

  • Reading Group 5 March 2015

Going Deeper with Convolutions (aka "the Inception paper" aka "the GoogLeNet paper").

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

Video of Szegedy's talk.

Matt's presentation slides.

Abstract:

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
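For a feel of the wiring, here is a structural numpy sketch of an Inception module: parallel branches with 1x1 convolutions to keep the channel budget down, concatenated along the channel axis. The 3x3/5x5 spatial convolutions and the pooling branch's max-pool are omitted for brevity, so treat this as the shape of the idea rather than the module itself.

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is a per-pixel linear map across channels.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

def inception_sketch(x, w1, w3r, w5r, wpp):
    """Parallel branches concatenated along the channel axis. In GoogLeNet
    the reduction 1x1s are followed by 3x3/5x5 convolutions and the last
    branch pools first; those spatial ops are omitted here for brevity."""
    b1 = conv1x1(x, w1)    # plain 1x1 branch
    b3 = conv1x1(x, w3r)   # 1x1 reduction (a 3x3 conv would follow)
    b5 = conv1x1(x, w5r)   # 1x1 reduction (a 5x5 conv would follow)
    bp = conv1x1(x, wpp)   # pool projection (a 3x3 max-pool would precede)
    return np.concatenate([b1, b3, b5, bp], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((192, 28, 28))
out = inception_sketch(x,
                       rng.standard_normal((64, 192)),
                       rng.standard_normal((96, 192)),
                       rng.standard_normal((16, 192)),
                       rng.standard_normal((32, 192)))
print(out.shape)  # (208, 28, 28): width grows while H and W stay fixed
```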

  • New Tools Wiki.
    • Want to ask someone in the lab about a language/tool? Look up the expert here.
    • Know a language/tool? Please add yourself.
  • Reading Group 11th September, 2014

Viorica will be leading the reading group. She will present the following paper from Yuri Boykov's group.

Title: Energy-based Geometric Multi-Model Fitting

Abstract: Geometric model fitting is a typical chicken-&-egg problem: data points should be clustered based on geometric proximity to models whose unknown parameters must be estimated at the same time. Most existing methods, including generalizations of RANSAC, greedily search for models with most inliers (within a threshold) ignoring overall classification of points. We formulate geometric multi-model fitting as an optimal labeling problem with a global energy function balancing geometric errors and regularity of inlier clusters. Regularization based on spatial coherence (on some near-neighbor graph) and/or label costs is NP-hard. Standard combinatorial algorithms with guaranteed approximation bounds (e.g. α-expansion) can minimize such regularization energies over a finite set of labels, but they are not directly applicable to a continuum of labels, e.g. R^{2} in line fitting. Our proposed approach (PEARL) combines model sampling from data points as in RANSAC with iterative re-estimation of inliers and models’ parameters based on a global regularization functional. This technique efficiently explores the continuum of labels in the context of energy minimization. In practice, PEARL converges to a good quality local minimum of the energy automatically selecting a small number of models that best explain the whole data set. Our tests demonstrate that our energy-based approach significantly improves the current state of the art in geometric model fitting currently dominated by various greedy generalizations of RANSAC.
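To make the sampling/re-estimation alternation concrete, below is a stripped-down numpy sketch of the PEARL idea for line fitting: propose models from random point pairs (as in RANSAC), then alternate point labelling and model refitting under a per-model label cost. The spatial-coherence term, which the paper minimises with α-expansion, is deliberately omitted, and all thresholds are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_line(pts):
    """Total least squares line through pts -> (unit normal n, offset d);
    the residual of a point p is then (n @ p - d)."""
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)
    n = vt[-1]
    return n, n @ c

def pearl_sketch(pts, n_proposals=200, label_cost=0.2, outlier_pen=0.01, iters=10):
    # 1) propose candidate lines from random point pairs, as in RANSAC
    models = []
    for _ in range(n_proposals):
        i, j = rng.choice(len(pts), 2, replace=False)
        v = pts[j] - pts[i]
        n = np.array([-v[1], v[0]]) / (np.linalg.norm(v) + 1e-12)
        models.append((n, n @ pts[i]))
    for _ in range(iters):
        # 2) label each point with its cheapest model, or 'outlier'
        res = np.array([(pts @ n - d) ** 2 for n, d in models])
        best = res.argmin(axis=0)
        labels = np.where(res[best, np.arange(len(pts))] < outlier_pen, best, -1)
        # 3) refit surviving models; a model survives only if explaining its
        #    inliers (residuals + label cost) beats calling them all outliers
        survivors = []
        for k in range(len(models)):
            inl = labels == k
            if inl.sum() >= 2 and res[k, inl].sum() + label_cost < outlier_pen * inl.sum():
                survivors.append(fit_line(pts[inl]))
        if not survivors:
            break
        models = survivors
    # final labelling against the surviving models
    res = np.array([(pts @ n - d) ** 2 for n, d in models])
    best = res.argmin(axis=0)
    labels = np.where(res[best, np.arange(len(pts))] < outlier_pen, best, -1)
    return models, labels

# toy data: two noisy lines
t = rng.uniform(0, 1, 60)
pts = np.vstack([np.column_stack([t, 0.8 * t]),
                 np.column_stack([t, 1.0 - t])]) + 0.01 * rng.standard_normal((120, 2))
models, labels = pearl_sketch(pts)
print(len(models))  # a handful of surviving models, ideally 2
```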

Download Link: http://www.csd.uwo.ca/~yuri/Papers/ijcv10_pearl.pdf

Please come and join us on Thursday at 3pm.

  • Reading Group 28th August, 2014

Tomorrow's reading group will be led by Vijay. He will be presenting a paper from Stephane Mallat's group:

Title: Invariant Scattering Convolution Networks

Download Link: http://arxiv.org/abs/1203.1513

Abstract: A wavelet scattering network computes a translation invariant image representation, which is stable to deformations and preserves high frequency information for classification. It cascades wavelet transform convolutions with non-linear modulus and averaging operators. The first network layer outputs SIFT-type descriptors whereas the next layers provide complementary invariant information which improves classification. The mathematical analysis of wavelet scattering networks explains important properties of deep convolution networks for classification.

A scattering representation of stationary processes incorporates higher order moments and can thus discriminate textures having the same Fourier power spectrum. State of the art classification results are obtained for handwritten digits and texture discrimination, using a Gaussian kernel SVM and a generative PCA classifier.
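The cascade is easy to prototype in 1D. The sketch below (my own simplification, with crude Gabor filters standing in for a proper Morlet bank) computes one scattering layer: wavelet convolution, complex modulus, local averaging. The averaging buys translation invariance; the modulus keeps the high-frequency energy the averaging would otherwise discard.

```python
import numpy as np

def gabor_bank(n, n_scales=4):
    """Analytic band-pass filters at dyadic centre frequencies (a crude
    stand-in for a proper Morlet wavelet bank)."""
    t = np.arange(n) - n // 2
    filters = []
    for j in range(n_scales):
        xi, sigma = np.pi / 2 ** (j + 1), 2.0 * 2 ** j
        g = np.exp(-t ** 2 / (2 * sigma ** 2)) * np.exp(1j * xi * t)
        filters.append(g / np.abs(g).sum())
    return filters

def scattering_1d(x, filters, pool=16):
    """One scattering layer: wavelet convolution -> complex modulus ->
    local averaging over 'pool' samples."""
    coeffs = []
    for g in filters:
        u = np.abs(np.convolve(x, g, mode='same'))   # |x * psi_j|
        s = u.reshape(-1, pool).mean(axis=1)          # low-pass / average
        coeffs.append(s)
    return np.stack(coeffs)

x = np.sin(2 * np.pi * np.arange(256) / 32) + 0.1 * np.random.default_rng(0).standard_normal(256)
S = scattering_1d(x, gabor_bank(256))
print(S.shape)  # (4, 16): 4 scales x 16 translation-pooled positions
```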

Please come and join us tomorrow at 3pm, MIL Meeting room, 5th floor.

  • Reading Group 21st August, 2014

Pablo has very kindly agreed to take the lead for this week's reading group, and he will be walking us through the details of Frank Dellaert's popular iSAM (incremental smoothing and mapping).

Download Link: http://www.cc.gatech.edu/~kaess/pub/Kaess08tro.pdf

Abstract: In this paper, we present incremental smoothing and mapping (iSAM), which is a novel approach to the simultaneous localization and mapping problem that is based on fast incremental matrix factorization. iSAM provides an efficient and exact solution by updating a QR factorization of the naturally sparse smoothing information matrix, thereby recalculating only those matrix entries that actually change. iSAM is efficient even for robot trajectories with many loops as it avoids unnecessary fill-in in the factor matrix by periodic variable reordering. Also, to enable data association in real time, we provide efficient algorithms to access the estimation uncertainties of interest based on the factored information matrix. We systematically evaluate the different components of iSAM as well as the overall algorithm using various simulated and real-world datasets for both landmark and pose-only settings.
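The incremental step at the heart of the paper is easy to isolate: when a new linearised measurement row arrives, the existing factor R is updated with Givens rotations instead of re-factoring from scratch. A hedged numpy sketch (variable names mine), checked against a batch re-factorisation:

```python
import numpy as np

def givens_update(R, rhs, new_row, new_b):
    """Append one linearised measurement row to an existing factor R (and
    its transformed right-hand side) using Givens rotations -- the
    incremental step at the heart of iSAM. Only entries the rotations
    touch are recomputed."""
    n = R.shape[0]
    R = np.vstack([R, new_row.astype(float)])
    rhs = np.append(rhs.astype(float), new_b)
    for k in range(n):
        a, b = R[k, k], R[n, k]
        if b == 0.0:
            continue
        r = np.hypot(a, b)
        c, s = a / r, b / r
        G = np.array([[c, s], [-s, c]])
        R[[k, n], k:] = G @ R[[k, n], k:]   # zero out the new row's k-th entry
        rhs[[k, n]] = G @ rhs[[k, n]]
    return R[:n], rhs[:n]                    # upper-triangular again

# check against a batch re-factorisation of the augmented least-squares system
rng = np.random.default_rng(0)
A, b = rng.standard_normal((6, 3)), rng.standard_normal(6)
Q, R0 = np.linalg.qr(A)
new_a, new_b = rng.standard_normal(3), 0.7
R1, y1 = givens_update(R0, Q.T @ b, new_a, new_b)
x_inc = np.linalg.solve(R1, y1)
x_batch = np.linalg.lstsq(np.vstack([A, new_a]), np.append(b, new_b), rcond=None)[0]
print(np.allclose(x_inc, x_batch))  # True
```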

Please come and join us at 3pm Thursday.

  • Reading Group 14th August, 2014

I will be leading this week's reading group on this interesting paper from Rob Fergus's lab.

Title: Intriguing properties of neural networks

Abstract: Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties.

First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.
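The paper finds its perturbations with box-constrained L-BFGS on a deep network; the mechanism is easier to see on a toy linear classifier, where a small step along the sign of the input gradient already collapses the prediction. The sketch below is that simplification, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a toy linear classifier standing in for the deep net
w, b = rng.standard_normal(64), 0.0
x = rng.standard_normal(64)
y = 1.0 if sigmoid(w @ x + b) > 0.5 else 0.0   # the model's current decision

# gradient of the logistic loss w.r.t. the INPUT, not the weights:
# d loss / d x = (p - y) * w
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

# step that increases the loss; for a linear model the sign-of-gradient
# step is the most damaging per unit of L_inf budget
eps = 0.15
x_adv = x + eps * np.sign(grad_x)
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))  # confidence collapses toward the other class
```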

Download Link: http://cs.nyu.edu/~zaremba/docs/understanding.pdf

Please come and join us at 3pm this Thursday in the MIL Meeting room, 5th floor.

  • Reading Group 8th August, 2014

I will lead this week's reading group and present this interesting paper on real-time non-rigid surface reconstruction and tracking.

Download Link: http://www.graphics.stanford.edu/~niessner/zollhoefer2014deformable.html

Title: Real-time Non-rigid Reconstruction using an RGB-D Camera

Abstract: We present a combined hardware and software solution for markerless reconstruction of non-rigidly deforming physical objects with arbitrary shape in real-time. Our system uses a single self-contained stereo camera unit built from off-the-shelf components and consumer graphics hardware to generate spatio-temporally coherent 3D models at 30 Hz. A new stereo matching algorithm estimates real-time RGB-D data. We start by scanning a smooth template model of the subject as they move rigidly. This geometric surface prior avoids strong scene assumptions, such as a kinematic human skeleton or a parametric shape model. Next, a novel GPU pipeline performs non-rigid registration of live RGB-D data to the smooth template using an extended non-linear as-rigid-as-possible (ARAP) framework. High-frequency details are fused onto the final mesh using a linear deformation model. The system is an order of magnitude faster than state-of-the-art methods, while matching the quality and robustness of many offline algorithms. We show precise real-time reconstructions of diverse scenes, including: large deformations of users' heads, hands, and upper bodies; fine-scale wrinkles and folds of skin and clothing; and non-rigid interactions performed by users on flexible objects such as toys. We demonstrate how acquired models can be used for many interactive scenarios, including re-texturing, online performance capture and preview, and real-time shape and motion re-targeting.
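The local building block of the ARAP-style registration is the closed-form best rigid fit between two small point sets (the Kabsch/Procrustes step, solved with an SVD). A self-contained numpy sketch of just that step; the global linear solve that ARAP alternates with is not shown.

```python
import numpy as np

def kabsch(P, Q):
    """Best rotation R and translation t aligning point set P to Q in the
    least-squares sense -- the per-cell local step inside ARAP-style
    non-rigid registration. P, Q: (N, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# sanity check: recover a known rigid transform exactly
rng = np.random.default_rng(0)
P = rng.standard_normal((10, 3))
Rq, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Rq) < 0:
    Rq[:, 0] *= -1
Q = P @ Rq.T + np.array([1.0, 2.0, 3.0])
R, t = kabsch(P, Q)
print(np.allclose(P @ R.T + t, Q))  # True
```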

Video: https://www.youtube.com/watch?v=IqZx1ggVBOw

Additional material of possible interest: http://sites.fas.harvard.edu/~cs277/papers/deformation_survey.pdf

Presenter: Ankur Handa

  • Seminar Thursday 7th August, 2014, Walterio Mayol-Cuevas, University of Bristol

Real Time Methods for Mapping, Relocalisation and Learning in Flying Robots and Wearables

Abstract: Over the years we have been developing a range of methods for the rapid processing of visual information. These methods emphasise working in real-time and are principally aimed at low computational requirements. We have developed mapping onboard small UAVs that we are targeting for industrial inspection, fast relocalisation approaches based on regression and geometric filtering, as well as ways in which automated learning from visual information for flying robots increases their awareness of the environment beyond purely geometric representations. We have also been using some of these competences together with object discovery and detection for egocentric video processing in wearable systems. This talk will give a brief overview of the operation and application of these works, and a discussion of ongoing research in these areas.

  • Reading Group Thursday 31st July, 2014

Title: LSD-SLAM: Large Scale Direct Monocular SLAM

Authors: Jakob Engel, Thomas Schöps, Dr. Juergen Sturm, Prof. Dr. Daniel Cremers

Abstract:

We propose a direct (feature-less) monocular SLAM algorithm which, in contrast to current state-of-the-art regarding direct methods, allows to build large-scale, consistent maps of the environment. Along with highly accurate pose estimation based on direct image alignment, the 3D environment is reconstructed in real-time as pose-graph of keyframes with associated semi-dense depth maps. These are obtained by filtering over a large number of pixelwise small baseline stereo comparisons. The explicitly scale-drift aware formulation allows the approach to operate on challenging sequences including large variations in scene scale. Major enablers are two key novelties: (1) a novel direct tracking method which operates on sim(3), thereby explicitly detecting scale-drift, and (2) an elegant probabilistic solution to include the effect of noisy depth values into tracking. The resulting direct monocular SLAM system runs in real-time on a CPU.
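The sim(3) point deserves a tiny illustration: a similarity transform carries a scalar scale alongside rotation and translation, which is exactly what makes scale drift between keyframes observable. Below is a minimal numpy sketch of the group action and composition; the paper actually optimises in the Lie algebra via the exponential map, which is not shown.

```python
import numpy as np

def sim3_apply(s, R, t, pts):
    """Apply a similarity transform x -> s * R @ x + t. pts: (N, 3)."""
    return s * pts @ R.T + t

def sim3_compose(a, b):
    """Composition (s1,R1,t1) o (s2,R2,t2); the extra scalar s is the
    per-keyframe scale that sim(3) tracking makes explicit."""
    s1, R1, t1 = a
    s2, R2, t2 = b
    return s1 * s2, R1 @ R2, s1 * (R1 @ t2) + t1

# sanity check: applying the composition equals applying b, then a
rng = np.random.default_rng(0)
Rz = lambda th: np.array([[np.cos(th), -np.sin(th), 0.0],
                          [np.sin(th),  np.cos(th), 0.0],
                          [0.0, 0.0, 1.0]])
a = (1.2, Rz(0.3), np.array([0.1, 0.0, 0.0]))
b = (0.8, Rz(-0.5), np.array([0.0, 0.2, 0.0]))
pts = rng.standard_normal((5, 3))
s, R, t = sim3_compose(a, b)
print(np.allclose(sim3_apply(s, R, t, pts),
                  sim3_apply(*a, sim3_apply(*b, pts))))  # True
```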

Download Link: http://vision.in.tum.de/research/lsdslam

Presenter: Ankur Handa

  • Reading Group Thursday 19th June, 2014

Matt's Presentation slides: view, download (3.1 MB PDF).

Title: Generative Adversarial Nets

Authors: Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
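Before the meeting it may help to stare at the value function itself. The numpy sketch below just evaluates V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))] on toy 1-D data with hand-picked D and G; training, which alternates ascent on D with descent on G, is deliberately left out.

```python
import numpy as np

rng = np.random.default_rng(0)

def V(D, G, x_real, z):
    """The minimax objective: E_x[log D(x)] + E_z[log(1 - D(G(z)))].
    D maps samples to (0, 1); G maps noise to samples."""
    return np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))

# toy 1-D setup: data ~ N(2, 1); the generator shifts/scales the noise
x_real = 2.0 + rng.standard_normal(1000)
z = rng.standard_normal(1000)
G = lambda z: 0.5 * z                             # a poor generator (wrong mean)
D = lambda x: 1.0 / (1.0 + np.exp(-(x - 1.0)))    # thresholds near the gap

print(V(D, G, x_real, z))                 # D separates real from fake: V is high
print(V(D, lambda z: 2.0 + z, x_real, z)) # a better G drives V down
```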

Download from: http://arxiv.org/abs/1406.2661

  • Reading Group Thursday 22nd May, 2014

We are entering the second round of our reading groups from tomorrow, and hence it's my turn to host this week's session. I will be presenting the following paper on R-CNNs.

Title: Rich feature hierarchies for accurate object detection and semantic segmentation

Authors:

Ross Girshick,

Jeff Donahue,

Trevor Darrell,

Jitendra Malik

Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012---achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at https://github.com/rbgirshick/rcnn
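One small, reusable piece of the pipeline is the greedy non-maximum suppression applied to the scored proposals. The released code is Matlab/Caffe; the following is a generic numpy rendering of the standard algorithm, not the authors' implementation.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes, as used
    to prune overlapping scored region proposals in detection pipelines."""
    order = scores.argsort()[::-1]     # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection-over-union of the kept box with the remaining ones
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping the winner
    return keep
```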

The paper is available for download here at http://arxiv.org/pdf/1311.2524v3.pdf

Please come and join us at 3pm in our de-facto reading group meeting room on the 5th floor.

Presenter: Ankur Handa

  • Reading Group Thursday 8th May, 2014

Sukrit will be presenting the following paper.

Paper: A Practical Regularity Partitioning Algorithm and its Applications in Clustering

This is quite a heavy paper to follow, but Sukrit gives his reasons for choosing it:

"

Deep Learning has been the topic of interest for many communities in recent times. Researchers from the communities of machine learning, image classification/ pose estimation in computer vision, data mining, natural language processing and biomedical data analysis have all applied deep learning architectures achieving state-of-the-art results. However, there are some problems that arise in the real world scenario where the training and test data both are multi-label and the training data does not contain sufficient information about their labels. In such cases, the need for disentangling the labels (or factors of variation as they are called in the deep learning community) becomes greater with the problem becoming more complex, and state-of-the-art deep architectures do not seem to work well. Some recent suggestions (like by Bengio 2013 Looking Forward paper) have been to utilize clustering procedures over the deep net generated features or incorporate efficient clustering criteria into the fine tuning procedures. Based on such suggestions, some people envision the use of extremal graph theory for deep learning to provide efficient ways of disentangling factors of variation. This paper that we discuss for the Reading Group presents a practical approach of clustering which is significantly better than spectral clustering methods. Moreover, it should help to clarify many ideas in graph theory that can be useful for improving disentangling with deep architectures !!

"

The reading group will resume at its regular time of 3pm in the MIL Meeting room on 8th May. Have a great weekend ahead.

  • Reading Group Thursday 1st May, 2014

Title: Convex optimisation

Greetings all,

A slight change of plan: we will be doing a round on convex optimisation tomorrow in the reading group. Bamdev is due to leave shortly (taking his convex optimisation gurudom with him), and we thought this might be one last opportunity to soak up his expertise.

I found this nice tutorial from the website of Mark Schmidt (who was a post-doc of Bamdev's supervisor, Francis Bach), http://www.di.ens.fr/~mschmidt/Documents/convexOptim.pdf, and we can flick through the slides in the hour. Time permitting we can look into details, but I think we should mainly be looking to get pointers to the different optimisers and tools that are available.
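As a warm-up, the canonical first-order method such tutorials start from fits in a few lines: gradient descent with Armijo backtracking on a least-squares objective. A generic sketch, not taken from the slides:

```python
import numpy as np

def grad_descent_backtracking(f, grad, x0, alpha=0.3, beta=0.5, tol=1e-8):
    """Gradient descent with Armijo backtracking line search: shrink the
    step until a sufficient-decrease condition holds, then move."""
    x = x0.astype(float)
    while True:
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x
        t = 1.0
        while f(x - t * g) > f(x) - alpha * t * (g @ g):
            t *= beta                 # backtrack
        x = x - t * g

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
x_star = grad_descent_backtracking(f, grad, np.zeros(5))
print(np.allclose(x_star, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```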

I will also share a document later where each one of us can write a synopsis of the optimisation tools and methods they are aware of.

Time: 3-4pm

Place: MIL Meeting room, 5th floor.

  • Seminar Thursday 24th April, 2014 by Ming-Ming Cheng

Title: Efficient Image Scene Analysis and Applications

Abstract: Images remain one of the most popular and ubiquitous media for capturing and documenting the world around us. Developing efficient algorithms for understanding such images is of great importance for many applications in computer vision and computer graphics. In this report, I will present three algorithms for efficient image scene understanding as well as their applications.

Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object extraction algorithm, which simultaneously evaluates global contrast differences and spatial weighted coherence scores. Experimental results on famous benchmarks demonstrated that our algorithm consistently outperforms existing salient object detection and segmentation methods, yielding higher precision and better recall rates. The proposed method, which does not require expensive training data annotation in advance, provides an economical and practical tool for analysing large-scale unlabeled datasets (e.g. internet images).

Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We proposed a novel binarized normed gradients (BING) feature for objectness estimation of image windows. Our novel feature enables a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.) to test the objectness score of an image window. Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU, 1000 times faster than existing methods) generates a small set of category-independent, high quality object windows, yielding 96.2% object detection rate (DR) with 1,000 proposals.

Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how we would like to access images versus their typical representation is the goal of image parsing. In this paper we propose treating nouns as object labels and adjectives as visual attributes. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution to this problem. Using the extracted attribute labels as handles, our system empowers a user to verbally refine the results. This enables hands free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics.

  • Reading Group Thursday 17th April, 2014

Bamdev, who is from the control group, is going to lead this week's reading group on Manopt, an optimisation toolbox he has been working on jointly with N. Boumal, P.-A. Absil, Yu. Nesterov, and R. Sepulchre.

Title: Manopt: a toolbox for optimization on manifolds

Abstract:

Optimization on manifolds is a powerful paradigm to address nonlinear optimization problems. In this presentation, I will give a basic introduction to this geometric framework and bring out a few salient connections and differences with standard nonlinear optimization techniques. Subsequently, I will present the new open-source Matlab toolbox Manopt. With Manopt, it is easy to deal with various types of manifold constraints which arise naturally in applications, such as orthonormality, low rank, positive definiteness, and rotations, to name a few.
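The pattern Manopt packages up can be shown by hand on the simplest manifold, the unit sphere: project the Euclidean gradient onto the tangent space, take a step, then retract back onto the manifold. The numpy sketch below is the underlying recipe, not Manopt's (Matlab) API.

```python
import numpy as np

def sphere_gd(egrad, x0, step=0.1, iters=500):
    """Gradient descent on the unit sphere: tangent-space projection of the
    Euclidean gradient, a step, then a retraction (renormalisation)."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = egrad(x)
        rgrad = g - (x @ g) * x          # project onto the tangent space at x
        x = x - step * rgrad
        x = x / np.linalg.norm(x)        # retract back onto the sphere
    return x

# leading eigenvector of A as the minimiser of -x^T A x on the sphere
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)); A = A + A.T
x = sphere_gd(lambda x: -2 * A @ x, rng.standard_normal(6))
print(x @ A @ x, np.linalg.eigvalsh(A)[-1])  # approximately equal
```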

Toolbox: http://manopt.org/

Paper: http://arxiv.org/abs/1308.5200, accepted for publication in JMLR, 2014

  • Reading Group Thursday 10th April, 2014

Sorry for the short notice, but Simon will be giving a presentation on the following paper today in our regular weekly reading group slot.

Title: Robust 3D Tracking with Descriptor Fields (CVPR'14)

Abstract: We introduce a method that can register challenging images from specular and poorly textured 3D environments, on which previous approaches fail. We assume that a small set of reference images of the environment and a partial 3D model are available. Like previous approaches, we register the input images by aligning them with one of the reference images using the 3D information. However, these approaches typically rely on the pixel intensities for the alignment, which is prone to fail in presence of specularities or in absence of texture. Our main contribution is an efficient novel local descriptor that we use to describe each image location. We show that we can rely on this descriptor in place of the intensities to significantly improve the alignment robustness at a minor increase of the computational cost, and we analyze the reasons behind the success of our descriptor.
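The paper's central move, running the alignment on a gradient-based descriptor instead of raw intensities, can be illustrated with a deliberately crude stand-in: a single gradient-magnitude channel and an exhaustive search over integer translations in place of the paper's Gauss-Newton alignment. Everything below is my simplification.

```python
import numpy as np

def grad_mag(im):
    """A crude dense descriptor: per-pixel gradient magnitude. The paper's
    Descriptor Fields are smoothed positive/negative gradient channels;
    this single channel just illustrates replacing raw intensities."""
    gy, gx = np.gradient(im.astype(float))
    return np.hypot(gx, gy)

def align_translation(ref, moving, search=5, desc=grad_mag):
    """Exhaustive search for the integer translation minimising the SSD
    between descriptor images (circular shifts via np.roll, so borders
    wrap; good enough for a sketch)."""
    dref = desc(ref)
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(moving, dy, axis=0), dx, axis=1)
            cost = np.sum((desc(shifted) - dref) ** 2)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

rng = np.random.default_rng(0)
ref = rng.standard_normal((64, 64)).cumsum(axis=0).cumsum(axis=1)  # smooth-ish image
moving = np.roll(np.roll(ref, -3, axis=0), 2, axis=1)
print(align_translation(ref, moving))  # (3, -2): undoes the shift
```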

Authors:

Alberto Crivellaro

Vincent Lepetit

The paper is available for download here at: http://infoscience.epfl.ch/record/198219/files/DescriptorFields.pdf

Accompanying video: http://infoscience.epfl.ch/record/198219/files/DescriptorFieldsVideo.mp4

Please come and join us today at 4pm, MIL meeting room.

  • Reading Group Thursday 3rd April, 2014

Tomorrow, Je Hyeong Hong will be leading the reading group and presenting the following paper on Wiberg optimisation (which is different from the commonly used alternation-based optimisation for minimising over more than one variable).

Title: Efficient algorithm for low-rank matrix factorization with missing components and performance comparison of latest algorithms

Abstract: This paper examines numerical algorithms for factorization of a low-rank matrix with missing components. We first propose a new method that incorporates a damping factor into the Wiberg method to solve the problem. The new method is characterized by the way it constrains the ambiguity of the matrix factorization, which helps improve both the global convergence ability and the local convergence speed. We then present experimental comparisons with the latest methods used to solve the problem. No comprehensive comparison of the methods that have been proposed recently has yet been reported in literature. In our experiments, we prioritize the assessment of the global convergence performance of each method, that is, how often and how fast the method can reach the global optimum starting from random initial values. Our conclusion is that top performance is achieved by a group of methods based on Newton-family minimization with damping factor that reduce the problem by eliminating either of the two factored matrices. Our method, which belongs to this group, consistently shows a 100% global convergence rate for different types of affine structure from motion data with a very high population of missing components.
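For contrast with the Wiberg family the paper studies, the commonly used baseline is masked alternating least squares, which fits in a few lines of numpy. The methods the paper recommends instead eliminate one of the two factors and take damped Newton steps on the other, which is what buys the better global convergence; only the baseline is sketched here.

```python
import numpy as np

def als_masked(Y, W, rank, iters=100, lam=1e-6):
    """Baseline alternation for min || W * (Y - U V^T) ||_F^2: solve for U
    with V fixed (row-wise weighted least squares), then for V with U
    fixed. lam is a tiny ridge so empty rows stay well-posed."""
    m, n = Y.shape
    rng = np.random.default_rng(0)
    U, V = rng.standard_normal((m, rank)), rng.standard_normal((n, rank))
    I = lam * np.eye(rank)
    for _ in range(iters):
        for i in range(m):
            w = W[i] > 0
            U[i] = np.linalg.solve(V[w].T @ V[w] + I, V[w].T @ Y[i, w])
        for j in range(n):
            w = W[:, j] > 0
            V[j] = np.linalg.solve(U[w].T @ U[w] + I, U[w].T @ Y[w, j])
    return U, V

# toy check: rank-2 matrix with 30% of entries missing
rng = np.random.default_rng(1)
U0, V0 = rng.standard_normal((30, 2)), rng.standard_normal((20, 2))
Y = U0 @ V0.T
W = (rng.uniform(size=Y.shape) < 0.7).astype(float)
U, V = als_masked(Y, W, rank=2)
print(np.abs(W * (Y - U @ V.T)).max())  # close to zero on observed entries
```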

Authors:

Takayuki Okatani

Takahiro Yoshida

Koichiro Deguchi

The paper is available for download here at: http://www.vision.is.tohoku.ac.jp/files/2113/7171/7526/damped_wiberg_iccv_okatani.pdf

Please come and join us tomorrow at 4pm.

  • Reading Group Thursday 27th March, 2014

Tomorrow we have Weixun presenting the BING features in our reading group.

Title: BING: Binarized Normed Gradients for Objectness Estimation at 300fps

Abstract: Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well defined closed boundary share surprisingly strong correlation in normed gradients space when resizing their corresponding image windows into a small fixed size. Based on this observation and computational reasons, we propose to resize an image window to 8 × 8 and use the normed gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high quality object windows, yielding 96.2% object detection rate (DR) with 1,000 proposals. With increase of the numbers of proposals and colour spaces for computing BING features, our performance can be further improved to 99.5% DR.
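The feature itself is tiny, which is the point. Below is a hedged numpy sketch of the 64-D normed-gradient (NG) descriptor: clipped gradient magnitude, block-averaged to 8x8, flattened. The binarisation to the top bits, which is what enables the ADD/BITWISE-SHIFT scoring, is omitted.

```python
import numpy as np

def ng_feature(window):
    """64-D normed-gradient feature for one image window (assumed at least
    8x8): gradient magnitude, resized to 8x8 via crude block averaging,
    flattened. BING would then binarise this; that step is omitted."""
    gy, gx = np.gradient(window.astype(float))
    mag = np.minimum(np.abs(gx) + np.abs(gy), 255.0)   # clipped normed gradient
    h, w = mag.shape
    mag = mag[: h - h % 8, : w - w % 8]                # crop to a multiple of 8
    blocks = mag.reshape(8, mag.shape[0] // 8, 8, mag.shape[1] // 8)
    return blocks.mean(axis=(1, 3)).ravel()            # shape (64,)

win = np.random.default_rng(0).standard_normal((32, 24))
print(ng_feature(win).shape)  # (64,)
```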

Authors:

Ming-Ming Cheng,

Ziming Zhang,

Wen-Yan Lin,

Philip Torr

Appeared in CVPR 2014

Download Link: http://mmcheng.net/mftp/Papers/ObjectnessBING.pdf

Time: 3-4 pm

Room: MIL Meeting room (next to Selene Printer)

Date: 27th Mar, 2014

Presenter: Weixun Goh

  • Seminar Friday 21st Mar, 2014 by Niloy Mitra

Title: Analysing and Abstracting Scans of Man-made Environments

Abstract: Rapid advances in scanning technologies have resulted in fast and easy acquisition of man-made environments. While such data (e.g., SfM, LiDAR, depth scans) can come in massive volumes, they do not, in their raw form, provide useful understanding of the environments. Such data provide a unique opportunity to discover and understand variability in shapes, both in terms of their geometry and arrangements. Our group has been investigating computational strategies to perform such analysis on raw scans to better understand the form and function of the world around us. In this talk I will present our latest attempts in this direction, while focusing on the underlying methodology and discussing the current challenges.

More info: http://talks.cam.ac.uk/talk/index/51482

  • Reading Group Thursday 20th Mar, 2014

Our next reading group will be led by Ujwal, who will present the following paper.

Title: Occlusion Patterns for Object Class Detection

Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise – instead, we include the occluder itself into the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.

Authors:

Bojan Pepik,

Michael Stark,

Peter Gehler,

Bernt Schiele

Appeared in CVPR 2013

Download Link: http://www.robots.ox.ac.uk/~vgg/rg/papers/Pepikj_Occlusion_Patterns_for_2013_CVPR_paper.pdf

Time: 3-4 pm

Room: MIL Meeting room (next to Selene Printer)

Date: 20th Mar, 2014

Presenter: Ujwal Bonde

  • Reading Group Thursday 13th Mar, 2014
Our next reading group will be led by Ankur Handa, presenting the following paper.

Title: Back to the Future: Learning Shape Models from 3D CAD Data

Abstract: Recognizing 3D objects from arbitrary view points is one of the most fundamental problems in computer vision. A major challenge lies in the transition between the 3D geometry of objects and 2D representations that can be robustly matched to natural images. Most approaches thus rely on 2D natural images either as the sole source of training data for building an implicit 3D representation, or by enriching 3D models with natural image features. In this paper, we go back to the ideas from the early days of computer vision by using 3D object models as the only source of information for building a multi-view object class detector. In particular, we use these models for learning 2D shape that can be robustly matched to 2D natural images. Our experiments confirm the validity of our approach which outperforms current state-of-the-art techniques on a multi-view detection dataset.

Authors:

Michael Stark

Michael Goesele

Bernt Schiele

Appeared in BMVC 2010

Download link: http://www.gris.informatik.tu-darmstadt.de/~mgoesele/download/Stark-2010-BTF.pdf

Time: 4-5pm

Room: MIL Meeting room (next to Selene Printer)

Date: 13th Mar, 2014

Presenter: Ankur Handa

  • Reading Group Thursday 27th Feb, 2014
Our next reading group will be led by Jonathan Parker, presenting the following paper.

Title: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations

Abstract: There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
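The probabilistic max-pooling step fits in a few lines: within each pooling block, a softmax decides which detector unit (if any) is on, so at most one fires and the pooling unit is on exactly when one of them is. A small numpy sketch of that block-wise posterior (my own notation):

```python
import numpy as np

def prob_max_pool(energies):
    """Probabilistic max-pooling over one CxC block of detector units.
    energies: bottom-up inputs I(h_i) for the block, any shape. Returns
    P(h_i = 1) per detector and P(pooling unit = 1) = P(some detector on).
    The off-state contributes a '1' to the partition function."""
    m = energies.max()
    e = np.exp(energies - m)              # stabilised exponentials
    z = np.exp(-m) + e.sum()              # exp(-m) is the shifted off-state
    p_on = e / z                          # P(h_i = 1): at most one fires
    p_pool_off = np.exp(-m) / z           # P(all detectors off)
    return p_on, 1.0 - p_pool_off

rng = np.random.default_rng(0)
p_on, p_pool = prob_max_pool(rng.standard_normal((2, 2)))
print(p_on.sum() + (1.0 - p_pool))  # probabilities sum to 1
```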
Authors:

Honglak Lee

Roger Grosse

Rajesh Ranganath

Andrew Y. Ng

Appeared in ICML 2009

Download link: http://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf

Conference presentation: http://videolectures.net/icml09_lee_cdb/

Time: 4-5pm

Room: MIL Meeting room (next to Selene Printer)

Date: 27th Feb, 2014

Presenter: Jonathan Parker

  • Reading Group Thursday 20th Feb, 2014

Matt's reading notes: https://www.writelatex.com/read/rpkcrgrtncyj

Wiskott's scholarpedia page on Slow Feature Analysis: http://www.scholarpedia.org/article/Slow_feature_analysis

Matt will be presenting two papers on slow feature analysis, or SFA, an unsupervised method for learning useful features from time-series data like video.

The "derivation paper" introduces the SFA algorithm. I'll only be presenting sections 2 & 3, so no need to read the other sections unless interested.

The "pose paper" uses SFA to train a machine to recognize object identity and pose from a simple synthetic video dataset.

"Derivation paper":

Title: Slow Feature Analysis: Unsupervised Learning of Invariances

Derives SFA in sections 2 & 3. You only need to read those sections.

Abstract: Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process high-dimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated input-output functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for one-dimensional objects, depending on only the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.
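Linear SFA is short enough to sketch in full: centre the (already nonlinearly expanded) signal, whiten it, then take the directions along which the time derivative has least variance. The toy at the bottom recovers a slow sine hidden in a faster mixture; all names are mine.

```python
import numpy as np

def sfa(x, n_out):
    """Linear SFA on an (already expanded) signal x of shape (T, D):
    whiten via PCA, then keep the whitened directions in which the
    time derivative has the smallest variance (the slowest features)."""
    x = x - x.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(x, rowvar=False))
    W = E / np.sqrt(d)                  # whitening: cov(x @ W) = I
    z = x @ W
    zdot = np.diff(z, axis=0)           # discrete time derivative
    _, U = np.linalg.eigh(np.cov(zdot, rowvar=False))
    return z @ U[:, :n_out]             # smallest eigenvalues = slowest

# toy check: a slow sine hidden in a faster mixture is recovered first
t = np.linspace(0, 4 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(11 * t)
x = np.stack([slow + fast, fast - 0.5 * slow], axis=1)
y = sfa(x, 1)
print(abs(np.corrcoef(y[:, 0], slow)[0, 1]))  # close to 1
```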

Authors:

Laurenz Wiskott

Terrence Sejnowski

Published in Neural Computation, 2002

"Pose paper":

Title: Invariant Object Recognition with Slow Feature Analysis

Abstract: Primates are very good at recognizing objects independently of viewing angle or retinal position and outperform existing computer vision systems by far. But invariant object recognition is only one prerequisite for successful interaction with the environment. An animal also needs to assess an object’s position and relative rotational angle. We propose here a model that is able to extract object identity, position, and rotation angles, where each code is independent of all others. We demonstrate the model behavior on complex three-dimensional objects under translation and in-depth rotation on homogeneous backgrounds. A similar model has previously been shown to extract hippocampal spatial codes from quasi-natural videos. The rigorous mathematical analysis of this earlier application carries over to the scenario of invariant object recognition.

Authors:

Matthias Franzius

Niko Wilbert

Laurenz Wiskott

Published in ICANN, 2008

  • Reading Group Thursday 13th Feb, 2014
Our next reading group will be led by Ankur Handa, who will be presenting the following paper.

Title: Finding Locally Optimal, Collision-Free Trajectories with Sequential Convex Optimization

Abstract: We present a novel approach for incorporating collision avoidance into trajectory optimization as a method of solving robotic motion planning problems. At the core of our approach are (i) a sequential convex optimization procedure, which penalizes collisions with a hinge loss and increases the penalty coefficients in an outer loop as necessary, and (ii) an efficient formulation of the no-collisions constraint that directly considers continuous-time safety and enables the algorithm to reliably solve motion planning problems, including problems involving thin and complex obstacles.

We benchmarked our algorithm against several other motion planning algorithms, solving a suite of 7-degree-of-freedom arm-planning problems and 18-DOF full-body planning problems. We compared against sampling-based planners from OMPL, and we also compared to CHOMP, a leading approach for trajectory optimization. Our algorithm was faster than the alternatives, solved more problems, and yielded higher quality paths.

Experimental evaluation on the following additional problem types also confirmed the speed and effectiveness of our approach: (i) planning foot placements with 34 degrees of freedom (28 joints + 6 DOF pose) of the Atlas humanoid robot as it maintains static stability and has to negotiate environmental constraints; (ii) industrial box picking; (iii) real-world motion planning for the PR2 that requires considering all degrees of freedom at the same time.
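The hinge-loss penalty with an increasing coefficient is the easy part to prototype. The toy numpy sketch below bends a 2-D path around one circular obstacle by plain gradient descent on smoothness plus mu * max(0, r_safe - d)^2, raising mu in an outer loop; the paper, of course, solves sequential convex subproblems with proper continuous-time collision checking rather than this toy scheme.

```python
import numpy as np

def optimise_path(start, goal, obst, r_safe, n=30, outer=6, inner=400):
    """Penalty-method sketch: smoothness cost plus a hinge loss on
    clearance from one circular obstacle, penalty weight mu raised
    tenfold per outer iteration. Endpoints stay fixed."""
    path = np.linspace(start, goal, n)      # straight-line initialisation
    mu = 1.0
    for _ in range(outer):
        step = 0.5 / (2.0 * mu + 8.0)       # keep plain gradient descent stable
        for _ in range(inner):
            g = np.zeros_like(path)
            # smoothness term: sum ||x_{t+1} - x_t||^2 (discrete Laplacian)
            g[1:-1] += 2.0 * (2.0 * path[1:-1] - path[:-2] - path[2:])
            # hinge penalty: mu * sum max(0, r_safe - dist)^2
            d = np.linalg.norm(path - obst, axis=1)
            pen = np.maximum(0.0, r_safe - d)
            g += mu * (-2.0 * pen / np.maximum(d, 1e-9))[:, None] * (path - obst)
            g[0] = g[-1] = 0.0
            path = path - step * g
        mu *= 10.0                          # outer loop tightens the penalty
    return path

path = optimise_path(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                     obst=np.array([0.5, 0.05]), r_safe=0.2)
print(np.linalg.norm(path - np.array([0.5, 0.05]), axis=1).min())  # close to r_safe
```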
Authors:

John Schulman

Jonathan Ho

Alex Lee

Ibrahim Awwal

Henry Bradlow

Pieter Abbeel: http://www.cs.berkeley.edu/~pabbeel/

Accepted in RSS 2013

Supplementary material: http://rll.berkeley.edu/trajopt/rss/

Bullet physics engine: http://bulletphysics.org/wordpress/

OpenRAVE for environment representation: http://sourceforge.net/projects/openrave/

Time: 3-4pm

Room: MIL Meeting room (next to Selene Printer)

Date: 13th Feb, 2014

Presenter: Ankur Handa

  • Reading Group Thursday 6th Feb, 2014

Our next reading group will be held on the 6th of February in the MIL meeting room. I will be presenting the following paper on shape-anchor-based multi-view reconstruction.

Title: Shape Anchors for Data-driven Multi-view Reconstruction

Abstract: We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches that are so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. We “anchor” our 3D interpretation from these patches, using them to predict geometry of parts of the scene that are relatively ambiguous. The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes.

Download Link: http://people.csail.mit.edu/torralba/publications/anchors_iccv2013.pdf

Authors:

Andrew Owens: http://andrewowens.org/

Jianxiong Xiao: http://vision.princeton.edu/people/xj/

Antonio Torralba: http://web.mit.edu/torralba/www/

William Freeman: http://people.csail.mit.edu/billf/

Time: 3-4pm

Room: MIL Meeting room (next to Selene Printer)

Date: 6th Feb, 2014

Presenter: Ankur Handa

  • Reading Group Friday 24th Jan, 2014

Our first round of reading groups will start this Friday the 24th, and I will present the following paper on depth extraction.

Title: Depth Extraction from Video Using Non-parametric Sampling

Abstract: We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling. We demonstrate our technique in cases where past methods fail (non-translating cameras and dynamic scenes). Our technique is applicable to single images as well as videos. For videos, we use local motion cues to improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we use a Kinect-based system to collect a large dataset containing stereoscopic videos with known depths. We show that our depth estimation technique outperforms the state-of-the-art on benchmark databases. Our technique can be used to automatically convert a monoscopic video into stereo for 3D visualization, and we demonstrate this through a variety of visually pleasing results for indoor and outdoor scenes, including results from the feature film Charade.
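The non-parametric core is plain kNN: retrieve the training images nearest to the query in some global feature space and fuse their depth maps into a per-pixel prior. The sketch below uses made-up features and a median fuse; the actual system warps the candidates with SIFT flow and optimises a spatial/temporal smoothness objective on top.

```python
import numpy as np

def knn_depth_transfer(query_feat, feats, depths, k=7):
    """Non-parametric depth prior: find the k training images whose global
    features are closest to the query and take the per-pixel median of
    their depth maps. Only the retrieval-and-median step is sketched."""
    dists = np.linalg.norm(feats - query_feat, axis=1)
    nn = np.argsort(dists)[:k]
    return np.median(depths[nn], axis=0)

# toy usage with made-up data: 100 'images' described by 32-D features,
# each paired with a 48x64 depth map
rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 32))
depths = rng.uniform(0.5, 10.0, size=(100, 48, 64))
prior = knn_depth_transfer(feats[0] + 0.01, feats, depths)
print(prior.shape)  # (48, 64)
```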

Download Link: http://www.kevinkarsch.com/depthtransfer/eccv12-depthtransfer.pdf

Authors:

Kevin Karsch : http://www.kevinkarsch.com/

Ce Liu: http://people.csail.mit.edu/celiu/

Sing Bing Kang: http://research.microsoft.com/en-us/people/sbkang/

Additional material: a webpage with more information is available at http://www.kevinkarsch.com/depthtransfer/

Time: 4-5pm

Room: MIL Meeting room (next to Selene Printer)

Date: 24th Jan

Presenter: Ankur Handa