CVPR 2014 Tutorial

Dense Image Correspondences for Computer Vision

¹Microsoft Research    ²Amazon    ³UCSD

8:30am~5pm, June 23, 2014

Room C210, Columbus Convention Center, Columbus, Ohio

Correspondence, namely how pixels in one image correspond to pixels in another image, is a fundamental problem in computer vision. Although correspondence has mostly been used to analyze transformations between images of the same scene, a new era has recently begun in which correspondence can be established across different scenes. In this tutorial, we will give an overview of dense correspondence algorithms for aligning images from different scenes. We will survey a variety of representations, including pixels (SIFT flow, Non-Rigid Dense Correspondence), semantic segments (layer flow) and image pyramids (deformable spatial pyramid). These dense alignment algorithms are powerful tools for analyzing images and videos. Not only can we transfer information such as semantic labels, image details and geometry from the images and videos in a labeled dataset to a query, but we can also analyze an entire image database as a whole via information propagation. Recent advances in scene parsing, 2D-to-3D video conversion, annotation propagation (image to text), object discovery, co-segmentation, image hallucination, and biomedical image analysis demonstrate that cross-scene correspondence can be a fundamental building block for computer vision.

** Note: the schedule has some last-minute changes because of delayed flights. **
Time               Speaker              Title
8:30 ~ 8:40am      TBA                  Introduction
8:40 ~ 9:10am      Jaechul Kim          Deformable Spatial Pyramid Matching for Fast Dense Correspondences
9:10 ~ 9:30am      Joseph Lim           Analysis of Semantic Visual Correspondences
9:30 ~ 10:00am     Philip Isola         Scene Collaging: Dense Correspondence for Scenes with Layers
10:00 ~ 10:30am    Break
10:30 ~ 11:00am    Marshall Tappen      Using Dense Image Matching to Capture Context for Super-Resolution
11:00 ~ 11:30am    Zhuowen Tu           Scale-space SIFT flow and SIFT flow for 3D medical image registration
11:30 ~ 12:00pm    Tal Hassner          Scale-less Dense Correspondences
12:00 ~ 1:30pm     Lunch break
1:30 ~ 2:20pm      Ce Liu               Dense Image Correspondences
2:20 ~ 3:00pm      Michael Rubinstein   Joint Inference in Image Databases via Dense Correspondence
3:00 ~ 3:30pm      Break
3:30 ~ 4:00pm      Eyal Ofek            Fast Approximate-Nearest-Neighbors-Field in 3D
4:00 ~ 4:30pm      Ce Liu               Depth Transfer for 3D Videos
4:30 ~ 5:00pm      Ce Liu               Summary and Future Work

Talk Abstracts

Deformable Spatial Pyramid Matching for Fast Dense Correspondences

Jaechul Kim


8:40 ~ 9:10am

Abstract. One of the challenges in dense pixel matching is its computational cost. In this talk, we introduce recent algorithms for fast dense correspondence. The talk will focus on a hierarchical graphical model and its optimization for fast matching [1], with a brief introduction to today's most popular approaches, including PatchMatch [2] and SIFT flow [3].

[1] Deformable Spatial Pyramid Matching for Fast Dense Correspondences.
J. Kim, C. Liu, F. Sha, and K. Grauman. CVPR 2013.
[2] The Generalized PatchMatch Correspondence Algorithm.
C. Barnes, E. Shechtman, D. B. Goldman, and A. Finkelstein. ECCV 2010.
[3] SIFT flow: dense correspondence across different scenes and its applications.
C. Liu, J. Yuen, and A. Torralba. TPAMI 2011.
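PatchMatch [2] is one of the popular approaches this talk introduces alongside DSP. As a rough illustration of its core loop (random initialization, propagation of already-visited neighbours' matches, and random search in shrinking windows), here is a minimal single-scale sketch in Python/NumPy for grayscale images with a sum-of-squared-differences patch distance. It is neither the generalized algorithm of [2] nor DSP [1]; the function names `patch_ssd` and `patchmatch`, the patch radius `r`, and the default iteration count are illustrative assumptions.

```python
import numpy as np

def patch_ssd(A, B, ay, ax, by, bx, r):
    """Sum of squared differences between the (2r+1)x(2r+1) patches
    centred at (ay, ax) in A and at (by, bx) in B."""
    pa = A[ay - r:ay + r + 1, ax - r:ax + r + 1]
    pb = B[by - r:by + r + 1, bx - r:bx + r + 1]
    return float(((pa - pb) ** 2).sum())

def patchmatch(A, B, r=1, iters=4, seed=0):
    """Single-scale PatchMatch sketch for 2-D float images.

    Returns (nnf, cost): nnf[y, x] is the (row, col) centre in B matched to the
    patch centred at (y, x) in A. Only centres at least r pixels from the border
    are matched; border entries of cost stay at infinity.
    """
    rng = np.random.default_rng(seed)
    H, W = A.shape
    Hb, Wb = B.shape
    nnf = np.zeros((H, W, 2), dtype=int)
    cost = np.full((H, W), np.inf)
    # 1) Random initialisation of every match.
    for y in range(r, H - r):
        for x in range(r, W - r):
            by, bx = rng.integers(r, Hb - r), rng.integers(r, Wb - r)
            nnf[y, x] = (by, bx)
            cost[y, x] = patch_ssd(A, B, y, x, by, bx, r)
    for it in range(iters):
        s = 1 if it % 2 == 0 else -1  # alternate forward and backward scan order
        rows = range(r, H - r) if s == 1 else range(H - r - 1, r - 1, -1)
        cols = range(r, W - r) if s == 1 else range(W - r - 1, r - 1, -1)
        for y in rows:
            for x in cols:
                best, best_c = (int(nnf[y, x, 0]), int(nnf[y, x, 1])), cost[y, x]
                # 2) Propagation: try the already-visited neighbours' matches, shifted by one pixel.
                for ny, nx in ((y - s, x), (y, x - s)):
                    if r <= ny < H - r and r <= nx < W - r:
                        cy = int(nnf[ny, nx, 0]) + (y - ny)
                        cx = int(nnf[ny, nx, 1]) + (x - nx)
                        if r <= cy < Hb - r and r <= cx < Wb - r:
                            c = patch_ssd(A, B, y, x, cy, cx, r)
                            if c < best_c:
                                best, best_c = (cy, cx), c
                # 3) Random search in windows that halve in size around the current best.
                rad = max(Hb, Wb)
                while rad >= 1:
                    cy = int(np.clip(best[0] + rng.integers(-rad, rad + 1), r, Hb - r - 1))
                    cx = int(np.clip(best[1] + rng.integers(-rad, rad + 1), r, Wb - r - 1))
                    c = patch_ssd(A, B, y, x, cy, cx, r)
                    if c < best_c:
                        best, best_c = (cy, cx), c
                    rad //= 2
                nnf[y, x], cost[y, x] = best, best_c
    return nnf, cost
```

For example, `nnf, cost = patchmatch(A, B, r=3)` on two small float arrays returns a matched centre in `B` for every interior pixel of `A`. Because each update only accepts a lower cost, extra iterations can never make a match worse; the pure-Python loops make this sketch far too slow for real images.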

Analysis of Semantic Visual Correspondences

Joseph Lim


9:10 ~ 9:30am

Abstract. Although recent advances in semantic correspondence for computer vision have shown promising results, little has been done to understand how humans establish semantic visual correspondences, or how image correspondence algorithms perform compared to each other and to humans. In this talk, I will share my recent work on constructing a human-labeled image correspondence dataset for analyzing and benchmarking visual correspondence. The dataset covers a variety of classes and images, with correspondences between every pair of images given by multiple human viewers. I will then present our analysis of the human correspondences and show how recently developed dense image correspondence algorithms, such as SIFT flow and NRDC, perform on various correspondence-related prediction tasks.

Scene Collaging: Dense Correspondence for Scenes with Layers

Philip Isola


9:30 ~ 10:00am

Abstract. To quickly synthesize complex scenes, digital artists often collage together visual elements from multiple sources: for example, mountains from New Zealand behind a Scottish castle, with wisps of Saharan sand in front. Here, we propose to use a similar process to parse a scene. We model a scene as a collage of warped, layered objects sampled from labeled reference images. Each object is related to the rest by a set of support constraints. Scene parsing is achieved through analysis-by-synthesis. Starting with a dataset of labeled exemplar scenes, we retrieve a dictionary of candidate object segments that match a query image. We then combine elements of this set into a "scene collage" that explains the query image. Beyond assigning object labels to pixels, our method recovers much more information, such as the number of objects of each type in the scene, how they support one another, the ordinal depth of each object and, to some degree, occluded content. We exploit this representation for several applications: image editing, random scene synthesis, and image-to-anaglyph conversion.

Break 10:00 ~ 10:30am

Using Dense Image Matching to Capture Context for Super-Resolution

Marshall Tappen


10:30 ~ 11:00am

Abstract. Over the last decade, super-resolution has been dominated by systems whose core step estimates a high-resolution patch from a small low-resolution patch. This limits their ability to hallucinate detail when enlarging low-resolution images. In this talk, I will show how dense image matching can be used to provide the context needed to hallucinate more detail than can be seen in a small image patch.

Scale-space SIFT flow and SIFT flow for 3D medical image registration

Zhuowen Tu


11:00 ~ 11:30am

Abstract. The first part of this talk discusses Scale-Space SIFT flow, a simple, intuitive, and very effective approach to handling large scale differences across image locations. We introduce a scale field into the SIFT flow energy to automatically explore scale deformations. This approach performs comparably to SIFT flow on general natural scenes but obtains significant improvements on images with large scale differences. In the second part, applications of SIFT flow to 3D medical image registration will be presented. Building dense correspondence between 3D medical images has been shown to produce state-of-the-art results and has become increasingly popular in medical imaging.

Dense Correspondences Across Scenes and Scales

Tal Hassner

The Open University of Israel

11:30 ~ 12:00pm

Abstract. Dense correspondences between images are rapidly becoming a key enabling capability in applications well beyond depth from stereo. Alongside this growing trend, efforts are being made to go beyond the brightness-constancy constraint originally assumed by optical-flow methods, in order to make the estimation process applicable in these challenging, non-standard scenarios. This talk will focus on the thread of work that attempts to allow dense matching of pixels in the presence of (possibly extreme) scale differences. It will quickly survey some of the non-traditional applications of dense correspondences. It will then detail methods for dense, scale-invariant pixel representations designed to be used at all pixels in the image, not only where reliable scales can be determined, thus allowing dense matches to be formed between images that contain different scale variations in different image regions.

Lunch 12:00 ~ 1:30pm

Dense Image Correspondences: Introduction

Ce Liu

Microsoft Research

1:30 ~ 2:20pm

Abstract. Working on a video sequence is always easier than working on a single image, because information can be propagated from adjacent frames through optical flow. But we often face the challenge of analyzing a single image. We show that the ideas of "temporally adjacent frames" and "optical flow" can be generalized to a set of images, using global image similarities such as GIST and dense image correspondences such as SIFT flow. By means of SIFT flow, semantically similar images of different 3D scenes can be densely aligned, allowing information such as labels, depth and geometry to be transferred from a labeled image database to analyze a query image. In this tutorial, we will introduce the SIFT flow algorithm and its applications in motion transfer, label transfer, and depth transfer.
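To make label transfer concrete, the sketch below assumes (hypothetically) that retrieval uses one generic global descriptor per image, standing in for GIST, and that a query-to-neighbour flow field `(u, v)` has already been computed for each retrieved neighbour, for example by SIFT flow; neither the descriptor nor the flow is computed here. Each neighbour's label map is pulled into the query frame through its flow, and the warped maps are combined by per-pixel majority vote. The voting is a simplification of our own: a full system would typically add spatial regularization (e.g. an MRF), which is omitted. The names `warp_labels`, `retrieve` and `transfer_labels` are illustrative.

```python
import numpy as np

def warp_labels(ref_labels, u, v):
    """Pull labels from a reference image into the query frame.

    Assumed convention: query pixel (y, x) corresponds to reference pixel
    (y + v[y, x], x + u[y, x]); coordinates are rounded and clipped to the image.
    """
    H, W = u.shape
    ys, xs = np.mgrid[0:H, 0:W]
    ry = np.clip(np.rint(ys + v).astype(int), 0, ref_labels.shape[0] - 1)
    rx = np.clip(np.rint(xs + u).astype(int), 0, ref_labels.shape[1] - 1)
    return ref_labels[ry, rx]

def retrieve(query_desc, db_descs, k):
    """Indices of the k database images whose global descriptors are closest (L2) to the query."""
    d = np.linalg.norm(db_descs - query_desc[None, :], axis=1)
    return np.argsort(d, kind="stable")[:k]

def transfer_labels(neighbor_labels, neighbor_flows, num_classes):
    """Per-pixel majority vote over the neighbours' label maps warped into the query frame.

    neighbor_flows[i] is an assumed, precomputed (u, v) pair from the query to neighbour i.
    Ties go to the smallest class index.
    """
    H, W = neighbor_flows[0][0].shape
    votes = np.zeros((num_classes, H, W), dtype=int)
    for labels, (u, v) in zip(neighbor_labels, neighbor_flows):
        warped = warp_labels(labels, u, v)
        for c in range(num_classes):
            votes[c] += warped == c
    return votes.argmax(axis=0)
```

In use, one would call `retrieve` on the query's descriptor, compute a flow to each returned image, and pass those images' label maps and flows to `transfer_labels`.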

Joint Inference in Image Databases via Dense Correspondence

Michael Rubinstein

Microsoft Research

2:20 ~ 3:00pm

Abstract. Most example-based approaches to computer vision, which transfer information such as semantic labels, depth, or 3D structure from labeled references to an unlabeled query, rely on a large corpus of densely labeled images. However, for large, modern image datasets, such labels are expensive to obtain and are often noisy or unavailable. I will show how we can still use dense correspondences to infer properties of images and pixels even when very few (or no) pixel labels are given. I will demonstrate this on two computer vision problems. The first is semantic segmentation in image databases that contain only sparse image-level tags and very few pixel labels. The second is object discovery and co-segmentation, where we seek to automatically segment multiple images containing a common object, with no additional information about the images or the common object class.

Break 3:00 ~ 3:30pm

Fast Approximate-Nearest-Neighbors-Field in 3D

Eyal Ofek

Microsoft Research

3:30 ~ 4:00pm

Abstract. Computing Approximate Nearest-Neighbor Fields (ANNF) is an important building block in many computer vision and graphics applications, such as texture synthesis, image editing and image denoising. This is a challenging task because an image can contain millions of patches. Coherency Sensitive Hashing (CSH) combines the benefits of prior methods such as Locality Sensitive Hashing (LSH) and PatchMatch to quickly find matching patches between two images. CSH is at least three to four times faster than PatchMatch and more accurate, especially in textured regions. I will present CSH, as well as its extension to patches in 3D space using RGB-D images. The result is DCSH, an algorithm that matches world (3D) patches in order to guide the search for image-plane matches. I will also present Social-CSH, another extension that yields a major speedup over CSH. Finally, I will show the benefits of using depth information for image reconstruction and image denoising.
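As a rough illustration of what an ANNF approximates, the sketch below computes the exact nearest-neighbour field by brute force, whose cost grows with the product of the two patch counts (which is what makes millions of patches impractical), and a toy approximate field using plain random-hyperplane LSH, in which each patch of `A` is compared only with the patches of `B` that fall in the same hash bucket. This is not CSH: it omits the PatchMatch-style coherence that CSH combines with hashing. The function names and the `n_bits` parameter are assumptions for illustration.

```python
import numpy as np

def patches(img, r):
    """All (2r+1)x(2r+1) patches of a 2-D image as matrix rows, plus their centre coordinates."""
    H, W = img.shape
    centres = [(y, x) for y in range(r, H - r) for x in range(r, W - r)]
    P = np.array([img[y - r:y + r + 1, x - r:x + r + 1].ravel() for y, x in centres])
    return P, np.array(centres)

def exact_nnf(A, B, r):
    """Brute-force nearest-neighbour field under sum of squared differences.
    Returns (centres in A, matched centres in B, match costs)."""
    PA, ca = patches(A, r)
    PB, cb = patches(B, r)
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2 for all pairs at once
    d2 = (PA ** 2).sum(1)[:, None] - 2 * PA @ PB.T + (PB ** 2).sum(1)[None, :]
    best = d2.argmin(axis=1)
    return ca, cb[best], np.maximum(d2[np.arange(len(PA)), best], 0.0)

def lsh_nnf(A, B, r, n_bits=8, seed=0):
    """Approximate field via random-hyperplane LSH: each A patch is compared only with the
    B patches in its hash bucket (all of B if that bucket is empty)."""
    rng = np.random.default_rng(seed)
    PA, ca = patches(A, r)
    PB, cb = patches(B, r)
    planes = rng.standard_normal((PA.shape[1], n_bits))
    weights = 1 << np.arange(n_bits)              # turn the sign bits into an integer key
    ha = ((PA @ planes) > 0).astype(int) @ weights
    hb = ((PB @ planes) > 0).astype(int) @ weights
    match = np.empty(len(PA), dtype=int)
    cost = np.empty(len(PA))
    for i, h in enumerate(ha):
        cand = np.flatnonzero(hb == h)
        if cand.size == 0:
            cand = np.arange(len(PB))
        d = ((PB[cand] - PA[i]) ** 2).sum(1)
        j = d.argmin()
        match[i], cost[i] = cand[j], d[j]
    return ca, cb[match], cost
```

Every approximate match costs at least as much as the exact one; more hash bits give smaller buckets, so fewer comparisons but more missed neighbours.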

Depth Transfer for 3D Video

Ce Liu

Microsoft Research

4:00 ~ 4:30pm

Abstract. We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling. We demonstrate our technique in cases where past methods fail (non-translating cameras and dynamic scenes). Our technique is applicable to single images as well as videos. For videos, we use local motion cues to improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we use a Kinect-based system to collect a large dataset containing stereoscopic videos with known depths. We show that our depth estimation technique outperforms the state-of-the-art on benchmark databases. Our technique can be used to automatically convert a monoscopic video into stereo for 3D visualization, and we demonstrate this through a variety of visually pleasing results for indoor and outdoor scenes, including results from the feature film Charade.
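A minimal sketch of the non-parametric sampling-and-fusion idea, under stated assumptions: each retrieved candidate depth map comes with a precomputed query-to-candidate flow `(u, v)` (for example from SIFT flow), each candidate depth is warped into the query frame by nearest-neighbour lookup, and the warped depths are fused by a per-pixel median, which tolerates a few badly aligned candidates. The technique described in the talk goes further (for example, motion cues and temporal consistency for video), so this shows only the fusion step; `warp_nearest` and `fuse_depths` are hypothetical names.

```python
import numpy as np

def warp_nearest(src, u, v):
    """Warp a candidate map into the query frame.
    Assumed convention: query pixel (y, x) samples src at (y + v, x + u), rounded and clipped."""
    H, W = u.shape
    ys, xs = np.mgrid[0:H, 0:W]
    sy = np.clip(np.rint(ys + v).astype(int), 0, src.shape[0] - 1)
    sx = np.clip(np.rint(xs + u).astype(int), 0, src.shape[1] - 1)
    return src[sy, sx]

def fuse_depths(candidate_depths, flows):
    """Per-pixel median of the candidate depth maps after warping each by its (u, v) flow."""
    warped = np.stack([warp_nearest(d, u, v) for d, (u, v) in zip(candidate_depths, flows)])
    return np.median(warped, axis=0)
```

With three constant candidates at depths 1, 2 and 10 and zero flow, the fused map is 2 everywhere: the outlier at 10 does not pull the estimate the way a mean would.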

Summary and Future Work

Ce Liu

Microsoft Research

4:30 ~ 5:00pm

Abstract. To wrap up, I will present several promising directions and open questions in the area of dense correspondences that will, I hope, inspire the audience.