Local Classifiers-based Contour Tracking with Superpixels for Rotoscoping
Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. One method for salient region detection exploits color and luminance features [1]: it computes, for every pixel, the Euclidean distance between the Lab vector in a Gaussian-filtered version of the image and the mean Lab vector of the input image.
Fig. 1. Saliency Map Estimation
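The saliency measure above can be sketched in a few lines of numpy. This is a minimal illustration, not the reference implementation of [1]; the function name is my own, and the 5-tap binomial kernel is a stand-in for the Gaussian filter:

```python
import numpy as np

def saliency_map(lab):
    """Frequency-tuned saliency sketch: per-pixel Euclidean distance
    between a blurred Lab image and the global mean Lab vector.
    lab: HxWx3 float array in Lab color space."""
    # 5-tap binomial kernel approximates a small Gaussian
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
    k /= k.sum()
    blurred = lab.astype(np.float64)
    for axis in (0, 1):  # separable blur: rows, then columns
        blurred = np.apply_along_axis(
            lambda m: np.convolve(m, k, mode='same'), axis, blurred)
    mean_vec = lab.reshape(-1, 3).mean(axis=0)
    return np.linalg.norm(blurred - mean_vec, axis=2)
```

A bright region on a uniform background will score high, since its blurred Lab vector lies far from the image mean.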
GrabCut [2] is an iterative image segmentation technique based on the Graph Cut algorithm. First, the user creates an initial trimap by selecting a rectangle: pixels inside the rectangle are marked as unknown, and pixels outside it are marked as known background. Second, the computer creates an initial segmentation, where all unknown pixels are tentatively placed in the foreground class and all known background pixels are placed in the background class. Next, Gaussian Mixture Models (GMMs) are created for the initial foreground and background classes. Each pixel in the foreground class is assigned to the most likely Gaussian component in the foreground GMM; similarly, each pixel in the background is assigned to the most likely background Gaussian component. Then the old GMMs are discarded and new GMMs are learned from the pixel sets created in the previous step. Finally, a graph is built and Graph Cut is run to find a new tentative foreground/background classification of pixels, and the process iterates.
Fig. 2. GrabCut-based interactive segmentation
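The "assign each pixel to its most likely component, then relearn the GMM" alternation at the heart of GrabCut can be sketched with a hard-assignment (k-means-style) fit. This is a simplified stand-in, not GrabCut itself: no graph cut, no covariances or mixture weights, and the function name is my own:

```python
import numpy as np

def fit_gmm_hard(pixels, n_components=3, n_iter=5, seed=0):
    """Hard-assignment sketch of GrabCut's GMM relearning step.
    pixels: Nx3 array of colors. Returns component means and the
    per-pixel component assignments."""
    rng = np.random.default_rng(seed)
    # initialize component means from randomly chosen pixels
    means = pixels[rng.choice(len(pixels), n_components,
                              replace=False)].astype(np.float64)
    for _ in range(n_iter):
        # assign each pixel to the nearest component mean
        d = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # relearn each component mean from its newly assigned pixels
        for k in range(n_components):
            if np.any(labels == k):
                means[k] = pixels[labels == k].mean(axis=0)
    return means, labels
```

In GrabCut proper, one such model is maintained for the foreground class and one for the background class, and the two compete via the graph-cut energy.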
Given the initial mask on a keyframe, a set of overlapping windows is uniformly sampled along its contour. The window size can vary with the size of the object, typically from 30x30 to 80x80 pixels. Each window defines the application range of a local classifier, which assigns every pixel inside the window a foreground (object) probability based on the local statistics it gathers. Neighboring windows overlap by about one third of the window size. Each classifier consists of a local color model Mc, a color model confidence value fc, and a local shape model Ms [4]. Gaussian Mixture Models (GMMs) are built for the local foreground (F) and background (B) regions in the Lab color space. To avoid sampling errors, only pixels whose spatial distance to the segmented boundary exceeds a threshold (e.g., 5 pixels) are used as training data for the GMMs; the number of components in each GMM is set to 3. The color model confidence fc describes how separable the local foreground is from the local background using the color model alone, so that the color model can be applied conservatively [4]. The local shape model Ms contains the existing segmentation mask and a shape confidence mask. Once initialized, the classifiers, along with their associated windows, propagate onto the next frame according to motion estimation. The color and shape models in each classifier are then updated and integrated according to the new local image statistics. Finally, the outputs of all local classifiers are aggregated to generate the segmentation for the new frame, as shown in Fig. 3(d),(e).
Shown in Fig. 3 [4]: (a) Overlapping classifiers (yellow squares) are initialized along the object boundary (red curve) on frame t. (b) These classifiers are then propagated onto the next frame by motion estimation. (c) Each classifier contains a local color model and a local shape model, which are initialized on frame t and updated on frame t+1. (d) The local classification results are combined to generate a global foreground probability map. (e) The final segmented foreground object on frame t+1.
Fig. 3. GMM-based local classifier building in contour tracking initialization (updating in tracking)
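The core of each local color model Mc is a likelihood ratio between the foreground and background GMMs. The numpy sketch below uses equal-weight, isotropic-covariance components for brevity; the function names and the fixed variance are my own simplifications, not the formulation in [4]:

```python
import numpy as np

def gmm_pdf(x, means, var=25.0):
    """Density of an equal-weight GMM with isotropic covariance var*I.
    x: Nx3 pixels, means: Kx3 component means."""
    d = x[:, None, :] - means[None, :, :]
    sq = (d ** 2).sum(axis=2)                       # N x K squared distances
    comp = np.exp(-sq / (2 * var)) / (2 * np.pi * var) ** 1.5
    return comp.mean(axis=1)                        # average over components

def foreground_prob(pixels, fg_means, bg_means):
    """Per-pixel foreground probability p_F / (p_F + p_B), as produced
    by a local color model inside one classifier window."""
    pf = gmm_pdf(pixels, fg_means)
    pb = gmm_pdf(pixels, bg_means)
    return pf / (pf + pb + 1e-12)                   # eps avoids division by zero
```

A pixel whose color sits near a foreground component gets a probability near 1; one near a background component gets a probability near 0.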
The goal of window propagation is to move the local windows along with the moving object so that its boundary on the next frame is still fully covered by the windows, even if not yet accurately. This requires rough motion estimation [4]. First, an initial shape alignment captures and compensates for large rigid motions of the foreground object, using global affine motion estimation with SIFT feature matching. Second, optical flow is estimated to capture the local deformations of the object. Since only the flow vectors for pixels inside the object are of interest, and optical flow is unreliable, especially near boundaries where occlusions occur, a local flow-averaging approach is used for more robust contour prediction, as shown in Fig. 4 [4]: (a) A window center x0 moves along the regional average optical flow vx inside the contour. (b) Pixel-wise optical flow (color represents flow direction (see (c)) and intensity represents magnitude). (c) Average flow inside the contour, which is locally smooth while the deformation of the object is maintained.
Fig. 4. Optical flow smoothing in contour prediction
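Both motion estimation stages reduce to short linear-algebra operations. The sketch below is illustrative only (the function names are mine): `fit_affine` solves the least-squares affine transform from matched keypoints such as SIFT correspondences, and `propagate_windows` moves window centers by the average flow inside the object mask, which suppresses noisy per-pixel flow near the boundary:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A from matched point pairs,
    e.g. SIFT correspondences, so that dst ~= A @ [x, y, 1]."""
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])   # n x 3 homogeneous coords
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T                              # 2 x 3

def propagate_windows(centers, flow, mask):
    """Move local window centers by the average optical flow inside
    the object mask (regional averaging for robust contour prediction).
    flow: HxWx2 (dx, dy) field, mask: HxW boolean object mask."""
    avg = flow[mask].mean(axis=0)
    return centers + avg
```

In the full pipeline, the affine alignment is applied first to remove the rigid motion, and the averaged residual flow then handles the remaining local deformation.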
In contour initialization, the user clicks roughly around the object with the mouse to generate a coarse contour, and GrabCut is then applied to produce a refined object boundary. Static segmentation (watershed) can generate many small segments ("superpixels" [3]) around the boundary, which are used for contour propagation.
Fig. 5. Contour initialization
The key technical point of contour propagation with superpixels is the following: group the pixel-based classification results produced by the local classifiers, compute the final FG/BG decision for each superpixel, and then refine the contour and the color/shape models accordingly. (Note: the over-segmentation here is intentional.)
Fig. 6. Superpixels for contour refinement
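The superpixel grouping step can be sketched as a simple per-segment vote: average the per-pixel foreground probabilities within each superpixel and classify the whole superpixel at once, which snaps the decision boundary onto superpixel edges and suppresses isolated noisy pixels. A minimal numpy sketch (function name is my own):

```python
import numpy as np

def superpixel_vote(prob_map, labels):
    """Aggregate a per-pixel FG probability map over superpixels.
    prob_map: HxW foreground probabilities from the local classifiers.
    labels:   HxW integer superpixel ids (e.g. from watershed).
    Returns a boolean HxW mask that is uniform within each superpixel."""
    out = np.zeros_like(prob_map, dtype=bool)
    for sp in np.unique(labels):
        sel = labels == sp
        # one FG/BG decision per superpixel, by mean probability
        out[sel] = prob_map[sel].mean() > 0.5
    return out
```

Because the watershed over-segmentation is fine-grained, committing to whole superpixels loses little boundary accuracy while removing pixel-level classification noise.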
1. R. Achanta, S. Hemami, F. Estrada and S. Süsstrunk, Frequency-tuned Salient Region Detection, IEEE CVPR, 2009.
2. C. Rother, V. Kolmogorov, A. Blake, GrabCut — Interactive Foreground Extraction using Iterated Graph Cuts, ACM SIGGRAPH, 2004.
3. X. Ren and J. Malik. Learning a classification model for segmentation, In Proc. 9th Int. Conf. Computer Vision, volume 1, pages 10-17, 2003.
4. X. Bai, J. Wang, D. Simons, G. Sapiro, Video SnapCut: Robust Video Object Cutout Using Localized Classifiers, ACM SIGGRAPH, 2009.