MPEG CDVS-based Image Indexing and Search
• Descriptor extraction process needed to ensure interoperability.
• Bitstream of compact descriptors.
• Considered system architectures:
• The server mainly performs indexing and query/retrieval; clients (mobile devices, desktops, etc.) can work in different modes:
• (a) The client only captures images and displays the retrieval results;
• (b) The client extracts image descriptors and displays the retrieval results;
• (c) The client takes on more tasks, such as local DB/cache-based image matching, and displays the retrieval results.
CDVS's Scope
• CE1: global descriptor: Scalable Compressed Fisher Vector (SCFV), Residual Enhanced Visual Vector (REVV), Robust Visual Descriptor (RVD);
• CE2: local descriptor compression: CHoG, transform + scalar quantizer, multi-stage vector quantizer;
• CE3: location coding: context-based coordinate coding;
• CE4: keypoint detection: ALP, block-based frequency-domain LoG;
• CE5: local descriptor: SIFT;
• CE6: retrieval pipeline: MBIT;
• CE7: feature selection: Bayesian-learning-based feature selection;
• CE8: pairwise matching pipeline: DISTRAT (local matching with geometric verification), weighted Hamming distance (global matching).
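CE8 pairs local matching with a weighted Hamming distance on binary global descriptors. The sketch below is a generic illustration only and does not reproduce the exact CDVS weighting. It packs a binary descriptor into a Python int and charges each mismatching bit a per-bit weight. The names `weights` and `rank_database` and the toy codes are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Plain Hamming distance: number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")

def weighted_hamming(a: int, b: int, weights: list[float]) -> float:
    """Weighted Hamming distance: a mismatch at bit k costs weights[k].

    Bit 0 is the least significant bit; bits beyond len(weights) are ignored.
    """
    diff = a ^ b
    return sum(w for k, w in enumerate(weights) if (diff >> k) & 1)

def rank_database(query: int, database: dict[str, int], weights: list[float]) -> list[str]:
    """Database image ids sorted by increasing weighted Hamming distance to the query code."""
    return sorted(database, key=lambda img: weighted_hamming(query, database[img], weights))

# Toy 4-bit codes; the per-bit weights are made up for illustration.
weights = [1.0, 0.5, 2.0, 0.25]
database = {"img_a": 0b1011, "img_b": 0b0011, "img_c": 0b0100}
print(rank_database(0b0011, database, weights))  # ['img_b', 'img_a', 'img_c']
```

XOR plus a bit count keeps the comparison cheap, which is the main reason binary global descriptors are attractive for large databases. The weighting lets more reliable bits count more than noisy ones.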
• SIFT patent: covers DoG filtering to construct the scale space;
• D. G. Lowe, "Distinctive image features from scale-invariant keypoints", IJCV, 2004; US Patent No. 6,711,293.
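Lowe's paper gives the standard relation between the patented DoG construction and LoG filtering. The Gaussian satisfies the heat-diffusion equation, so a difference of Gaussians at nearby scales approximates the scale-normalized Laplacian of Gaussian σ²∇²G. In the relation below, I is the image, * denotes convolution, and k is the constant scale ratio between adjacent levels.

```latex
\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma}
  \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}
\quad\Longrightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G,
\qquad
D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y).
```

Since (k − 1) is constant across scales, DoG extrema approximate extrema of the scale-normalized LoG response. Detectors built directly on LoG filtering, such as the CE4 candidates, therefore target essentially the same kind of response.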
• ALP (low-degree polynomial), proposed by Telecom Italia;
• Idea: model the scale-space response at each pixel as a polynomial function of scale, and estimate the polynomial coefficients from LoG filtering at several scales:
• 1. Scale space: approximate it by the polynomial, find local extrema of the polynomials, discard extrema that fall outside the boundaries, and output a list of candidates (x, y, σ);
• 2. Candidate cleaning: discard candidates lying on edges (large ratio of principal curvatures) or with low absolute response or low curvature;
• 3. Coordinate refinement: refine (x, y, σ) using a local polynomial approximation of the scale space;
• 4. Eliminate duplicates at octave boundaries;
• 5. Output the remaining candidates as keypoints.
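The sketch below is a minimal, per-pixel illustration of the polynomial idea behind steps 1 and 2. It assumes the scale-normalized LoG responses of one pixel have already been computed at a handful of scales. A cubic in σ is fitted to them by least squares, its stationary points inside the sampled scale range are kept, and weak extrema are dropped.

Everything here is a hypothetical simplification:
* The function names and the threshold are invented for illustration.
* Using σ rather than log σ as the polynomial variable is an assumption.
* The explicit least-squares fit stands in for the filter-based coefficient estimation.
* Spatial neighbour comparison, the edge test, refinement, and octave de-duplication are all omitted.

```python
import math

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients c[0..degree] in ascending powers."""
    n = degree + 1
    # Augmented normal equations [A^T A | A^T y], with A[i][j] = xs[i] ** j.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    # Back substitution.
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (m[r][n] - sum(m[r][k] * c[k] for k in range(r + 1, n))) / m[r][r]
    return c

def cubic_extrema(c, lo, hi):
    """Stationary points (sigma, value, kind) of p(s) = c0 + c1 s + c2 s^2 + c3 s^3 in [lo, hi]."""
    c0, c1, c2, c3 = c
    a, b, d = 3 * c3, 2 * c2, c1  # p'(s) = a s^2 + b s + d
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-d / b]
    else:
        disc = b * b - 4 * a * d
        if disc < 0:
            return []
        r = math.sqrt(disc)
        roots = sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])
    out = []
    for s in roots:
        if not lo <= s <= hi:  # step 1: discard extrema outside the sampled scale range
            continue
        curvature = 2 * c2 + 6 * c3 * s  # p''(s)
        if curvature == 0:
            continue  # inflection point, not an extremum
        value = c0 + c1 * s + c2 * s ** 2 + c3 * s ** 3
        out.append((s, value, "max" if curvature < 0 else "min"))
    return out

def alp_like_candidates(scales, responses, threshold):
    """Hypothetical per-pixel sketch: cubic fit over scale, keep extrema with |response| >= threshold."""
    c = fit_poly(scales, responses, 3)
    return [(s, v, kind) for s, v, kind in cubic_extrema(c, min(scales), max(scales))
            if abs(v) >= threshold]  # step 2 (partial): drop weak responses

# Toy LoG responses of one pixel, sampled from p(s) = s^3 - 6 s^2 + 9 s + 1.
scales = [0.5 * i for i in range(1, 9)]
responses = [s ** 3 - 6 * s ** 2 + 9 * s + 1 for s in scales]
print(alp_like_candidates(scales, responses, threshold=2.0))
```

For these toy responses the cubic has a maximum at σ = 1 (value 5) and a minimum at σ = 3 (value 1). With a threshold of 2, only the maximum survives as a candidate.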
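• Bag of Visual Words (BoW): zero-order moments (visual-word counts);
• Vocabulary Tree: hierarchical clustering;
• Vector of Locally Aggregated Descriptors (VLAD): first-order moments (sums of residuals to the assigned visual word);
• Residual Enhanced Visual Vector (REVV): similar to VLAD, aggregates residuals, followed by LDA-based dimensionality reduction;
• Hamming Embedding: BoW plus a binary signature per descriptor, compared by Hamming distance;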
• Fisher Vector: first- and second-order moments (gradients with respect to GMM means and variances);
• Compressed Fisher Vector: Fisher vector + product quantization;
• Product Quantization (PQ): the space is decomposed into a Cartesian product of subspaces, each quantized separately;
• Spectral Hashing: partitioning by a spectral method (eigenvectors of the graph Laplacian);
• Scalable Compressed Fisher Vector (SCFV): exploits the sparsity of FVs;
• Robust Visual Descriptor (RVD): similar to SCFV, with robust cluster assignment + bit selection.
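The toy sketch below makes the zero-order vs. first-order distinction concrete. It assumes a small, already-trained codebook (`centroids`) and hard assignment of each descriptor to its nearest visual word:
* BoW counts the descriptors assigned to each word.
* VLAD sums the residuals to the assigned word and L2-normalizes the concatenation.

The function names and data are illustrative, not CDVS reference code.

```python
import math

def nearest(centroids, x):
    """Index of the centroid (visual word) closest to x in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[k])))

def bow(descriptors, centroids):
    """Zero-order statistics: number of descriptors assigned to each visual word."""
    hist = [0] * len(centroids)
    for x in descriptors:
        hist[nearest(centroids, x)] += 1
    return hist

def vlad(descriptors, centroids):
    """First-order statistics: per-word sum of residuals x - c_k, concatenated and L2-normalized."""
    dim = len(centroids[0])
    acc = [[0.0] * dim for _ in centroids]
    for x in descriptors:
        k = nearest(centroids, x)
        for j in range(dim):
            acc[k][j] += x[j] - centroids[k][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy usage: two 2-D visual words and three local descriptors.
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0)]
print(bow(descriptors, centroids))  # [2, 1]
print(vlad(descriptors, centroids))
```

BoW only records that two descriptors fell into the first word. VLAD also records where they lie relative to it: the summed residual (1, 2), with (−1, 0) for the second word. Fisher vectors extend the same idea with soft GMM assignments and variance terms.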
Feature Extraction in CDVS
Image Pairwise Matching in CDVS
Image Retrieval in CDVS
Demo of CDVS-based Image Indexing and Retrieval
1. Zurich Buildings Image Database
2. U. of Kentucky Objects Image Database
Performance of CDVS-based Image Matching and Database Indexing & Retrieval
CDVS test dataset (32k images): graphics (books, CD/DVD covers, cards, prints), videos, paintings, buildings, etc.
Distractor images: 1 million images from Flickr.
A. Without distractor images:
B. With distractor images: