MPEG CDVS for Visual Search

MPEG CDVS-based Image Indexing and Search

• The descriptor extraction process, which must be specified to ensure interoperability.

• Bitstream of compact descriptors.

• Considered system architectures:

• The server mainly performs indexing and query/retrieval; clients (mobile devices, desktops, etc.) can play different roles:

• (a) The client only captures images and displays the retrieval results;

• (b) The client also extracts image descriptors and displays the retrieval results;

• (c) The client takes on more tasks, such as local DB/cache-based image matching, and also displays the retrieval results.

CDVS's Scope

•CE1: global descriptor: SCFV;

•Residual Enhanced Visual Vector;

•Robust Visual Descriptor (RVD);

•CE2: local descriptor compression: CHoG;

•Transform + Scalar Quantizer;

•Multi-stage Vector Quantizer;

•CE3: location coding: context-based coordinate coding;

•CE4: key-point detection: ALP;

•Block-based frequency-domain LoG (BFLoG);

•CE5: local descriptors: SIFT;

•CE6: retrieval pipeline: MBIT (Multi-Block Index Table);

•CE7: feature selection: Bayesian learning based feature selection;

•CE8: pairwise matching pipeline: DISTRAT (local matching with geometric verification)

•Weighted Hamming distance (global matching).
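Global matching compares binarized global descriptors with a weighted Hamming distance. A minimal sketch follows, assuming one non-negative weight per bit and, for convenience, one shared weight per block of bits (for example, per Gaussian component of an SCFV-style descriptor). The weights and the helper names are hypothetical; the exact weighting and normalization used in CDVS are not reproduced here.

```python
def weighted_hamming(code_a, code_b, weights):
    """Weighted Hamming distance between two equal-length binary codes.

    code_a, code_b: sequences of 0/1 bits; weights: one non-negative weight per bit.
    Every differing bit adds its weight; with all weights equal to 1 this reduces
    to the ordinary Hamming distance.
    """
    if not (len(code_a) == len(code_b) == len(weights)):
        raise ValueError("codes and weights must have the same length")
    return sum(w for a, b, w in zip(code_a, code_b, weights) if a != b)


def expand_block_weights(block_weights, bits_per_block):
    """Hypothetical helper: repeat one weight per block across all bits of that block."""
    return [w for w in block_weights for _ in range(bits_per_block)]


# Usage: two 8-bit codes made of two 4-bit blocks; the second block counts half.
a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 0, 0, 0]
print(weighted_hamming(a, b, expand_block_weights([1.0, 0.5], 4)))  # 2.5
```

The bits differ at positions 1, 3 and 6, so the plain Hamming distance would be 3, while down-weighting the second block gives 1 + 1 + 0.5 = 2.5.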

•SIFT patent: the DoG filtering that SIFT uses to construct the scale space is patented, which motivates alternative detectors;

•D. G. Lowe, "Distinctive image features from scale-invariant keypoints", IJCV 2004; US Patent No. 6,711,293.

•ALP (A Low-degree Polynomial), proposed by Telecom Italia;

•Idea: model the scale-space response as a polynomial function of the scale σ, whose coefficients are estimated from LoG filtering at a few different scales:

•1. Scale-space extrema: approximate the scale space by a polynomial, find the local extrema of the polynomials, discard extrema that fall beyond the boundaries, and output a list of candidates (x, y, σ);

•2. Clean up candidates: discard those lying on edges (large ratio of principal curvatures) and those with a low absolute response or low curvature;

•3. Refine coordinates: approximate the scale space around each candidate by a polynomial;

•4. Eliminate duplicates at octave boundaries;

•5. Output the remaining candidates as keypoints.
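The scale dimension of steps 1 and 2 can be illustrated for a single pixel with a minimal pure-Python sketch. It assumes, as a simplification rather than the normative ALP procedure, that the LoG responses at four scales σ₀ < σ₁ < σ₂ < σ₃ are fitted exactly by a cubic p(σ), so the scale extrema are the roots of the quadratic p′(σ). Extrema outside [σ₀, σ₃] are discarded, and an optional threshold `min_abs` stands in for the low-response test. The spatial (x, y) extremum search, the edge test, and steps 3 to 5 are omitted. All function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_cubic(sigmas, responses):
    """Coefficients (a, b, c, d) of p(s) = a*s^3 + b*s^2 + c*s + d through 4 samples."""
    vander = [[s ** 3, s ** 2, s, 1.0] for s in sigmas]
    return solve_linear(vander, responses)


def scale_extrema(sigmas, responses, min_abs=0.0):
    """Return (sigma, p(sigma)) for stationary points of the fitted cubic in [sigmas[0], sigmas[-1]]."""
    a, b, c, d = fit_cubic(sigmas, responses)
    qa, qb, qc = 3 * a, 2 * b, c  # p'(s) = qa*s^2 + qb*s + qc
    roots = []
    if abs(qa) < 1e-12:  # degenerate: p is (almost) quadratic, p' is linear
        if abs(qb) > 1e-12:
            roots.append(-qc / qb)
    else:
        disc = qb * qb - 4 * qa * qc
        if disc >= 0:
            sq = math.sqrt(disc)
            roots += [(-qb - sq) / (2 * qa), (-qb + sq) / (2 * qa)]
    out = []
    for s in roots:
        if sigmas[0] <= s <= sigmas[-1]:  # step 1: drop extrema beyond the scale range
            value = ((a * s + b) * s + c) * s + d
            if abs(value) >= min_abs:  # step 2 (partial): drop weak responses
                out.append((s, value))
    return out
```

Choosing a cubic keeps the extremum search closed-form: its derivative is a quadratic, so no iterative search over scales is needed once the four LoG responses are known.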

•Bag of Visual Words: zero-order moments (visual-word counts);

•Vocabulary Tree: hierarchical clustering;

•Vector of Locally Aggregated Descriptors (VLAD): first-order moments (sums of residuals to the cluster centers);

•Residual Enhanced Visual Vector (REVV): similar to VLAD (aggregated residuals), followed by LDA-based dimensionality reduction;

•Hamming Embedding: BoW plus a binary signature per descriptor, compared by Hamming distance;

•Fisher Vector: first- and second-order moments (gradients with respect to the GMM means and variances);

•Compressed Fisher Vector: Fisher vector + Product Quantization;

•PQ: the space is decomposed into a Cartesian product of subspaces, each quantized separately;

•Spectral Hashing: binary codes from spectral partitioning (thresholded eigenvectors of the graph Laplacian);

•Scalable Compressed Fisher Vector (SCFV): exploits the sparsity of FVs by keeping and binarizing only selected Gaussian components;

•Robust Visual Descriptor: similar to SCFV, robust cluster + bit selection.
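The difference between zero-order moments (BoW) and first-order moments (VLAD) can be made concrete with a minimal pure-Python sketch. It assumes a small precomputed codebook (`centers`, e.g. from k-means) and hard assignment of each local descriptor to its nearest center; power normalization, dimensionality reduction, and binarization are left out. The function names are hypothetical.

```python
import math


def nearest(centers, x):
    """Index of the center closest to x (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, centers[k])))


def bow_histogram(descriptors, centers):
    """Zero-order statistics: number of descriptors assigned to each visual word."""
    hist = [0] * len(centers)
    for x in descriptors:
        hist[nearest(centers, x)] += 1
    return hist


def vlad(descriptors, centers):
    """First-order statistics: per-word sum of residuals (x - c_k), concatenated, L2-normalized."""
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for x in descriptors:
        k = nearest(centers, x)
        for j in range(dim):
            acc[k][j] += x[j] - centers[k][j]
    flat = [v for block in acc for v in block]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Usage: two 2-D visual words and three local descriptors.
centers = [[0.0, 0.0], [10.0, 10.0]]
descs = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
print(bow_histogram(descs, centers))  # [2, 1]
print(vlad(descs, centers))
```

BoW only records that two descriptors fell into word 0 and one into word 1, whereas VLAD keeps where they fell relative to each center: the unnormalized residual sums are (1, 1) and (-1, 0), so the result is these values divided by √3.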

Feature Extraction in CDVS

Image Pairwise Matching in CDVS

Image Retrieval in CDVS

Demo of CDVS-based Image Indexing and Retrieval

1. Zurich Buildings Image Database

2. U. of Kentucky Objects Image Database

Performance of CDVS-based Image Matching and Database Indexing & Retrieval

CDVS test dataset (32k images): graphics (books, CD/DVD covers, cards, prints), video frames, paintings, buildings, etc.

Distractor images: 1 million images from Flickr.

A. Without distractor images:

B. With distractor images:
