Special Topic on Image Retrieval

Measure Image Similarity by Local Feature Matching
• Matching criteria (MATLAB demo)
  – Distance criterion: given a test feature from one image, the normalized L2 distance to its nearest neighbor in a comparison image is less than ϵ.
  – Distance ratio criterion: given a test feature from one image, the ratio between the distances to its nearest and second-nearest neighbors in a comparison image is less than 0.80.

SIFT Matching by Threshold
[Figure: distribution of identified true matches and false matches as a function of the L2-distance threshold; coefficient distributions of the top 20 dimensions of SIFT after PCA.]

Direct matching: the complexity issue
• Assume each image is described by m = 1000 descriptors of dimension d = 128, over a database of N = 1 million images
  – N * m = 1 billion descriptors to index
• Database representation in RAM
  – 128 GB with 1 byte per dimension
• Search: m^2 * N * d elementary operations
  – i.e., 10^6 * 10^6 * 128 ≈ 1.3 * 10^14 -> computationally not tractable
  – The quadratic term m^2 severely impacts efficiency

Bag of Visual Words
• Text words in Information Retrieval (IR)
  – Compactness
  – Descriptiveness
• Bag-of-Words model: a document is reduced to the set of salient terms it contains, e.g., a passage on visual perception becomes {sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel}, and a news story on China's trade surplus becomes {China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, value}; documents are then retrieved by comparing these word histograms.

Bag of Visual Words
• Could images be represented as a bag of "visual words"?
  – Image -> bag of "visual words"?

CBIR based on BoVW
• On-line: Query -> Feature Extraction -> Vector Quantization -> Index Lookup -> Retrieval Results
• Off-line: Database Image -> Feature Extraction -> Vector Quantization -> Index, supported by Codebook Training

Bag-of-visual-words
• The BoV representation
  – First introduced for texture classification [Malik'99]
• "Video Google" paper
  – Sivic and Zisserman, ICCV 2003
  – Mimics a text retrieval system for image/video retrieval
  – High retrieval efficiency and excellent recognition performance
• Key idea: n local descriptors describing the image -> 1 vector
  – Sparse vectors -> efficient comparison
  – Inherits the invariance of the local descriptors
• Problem: how to generate the visual words?
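Before turning to quantization, here is a minimal sketch of the two local-feature matching criteria introduced at the beginning of this section. The function name, the use of NumPy, and the value ϵ = 0.6 are illustrative assumptions, not part of the slides:

```python
import numpy as np

def match_features(query_desc, db_desc, eps=0.6, ratio=0.80):
    """Match local descriptors of a query image against a comparison image.

    query_desc, db_desc: (m, d) arrays of L2-normalized descriptors (e.g. SIFT).
    Returns (i, j) index pairs that satisfy both the distance criterion
    (nearest neighbor closer than eps) and the distance-ratio criterion
    (nearest distance below 0.80 times the second-nearest distance).
    """
    matches = []
    for i, q in enumerate(query_desc):
        dist = np.linalg.norm(db_desc - q, axis=1)   # L2 distance to every comparison descriptor
        nn, nn2 = np.argsort(dist)[:2]               # nearest and second-nearest neighbors
        if dist[nn] < eps and dist[nn] < ratio * dist[nn2]:
            matches.append((i, nn))
    return matches
```

Either test can of course be used alone, as in the slides; combining them simply keeps only the most reliable correspondences.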
Bag-of-visual-words
• The goal: "put the images into words", namely visual words
  – Input local descriptors are continuous
  – Need to define what a "visual word" is
  – Done by a quantizer q
  – q is typically learned by k-means; the set of centroids C = {c_1, ..., c_k} is called a "visual dictionary" of size k
  – A local descriptor is assigned to its nearest centroid
  – Quantization is lossy: we cannot get back to the original descriptor
  – But it is much more compact: typically 2-4 bytes per descriptor

Popular Quantization Schemes
• K-means
• K-d tree
• LSH
• Product quantization
• Scalar quantization
• CSH

K-means Clustering
• Given a dataset x_1, ..., x_N
• Goal: partition the dataset into K clusters with centers μ_1, ..., μ_K
• Formulation:
    argmin over {μ_1, ..., μ_K} and {r_11, ..., r_NK} of  J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_nk ||x_n − μ_k||^2
  where r_nk is the assignment of data point x_n to cluster k
• Solution:
  – Fix μ_k and solve for r_nk:  r_nk = 1 if k = argmin_j ||x_n − μ_j||^2, and 0 otherwise
  – Fix r_nk and solve for μ_k:  μ_k = Σ_n r_nk x_n / Σ_n r_nk
  – Iterate the two steps until convergence

General Steps of the K-means Algorithm
• General steps:
  – 1. Decide on a value for K.
  – 2. Initialize the K cluster centers (randomly, if necessary).
  – 3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.
  – 4. Re-estimate the K cluster centers, assuming the memberships found above are correct.
  – 5. If none of the N objects changed membership in the last iteration, exit; otherwise go to step 3.

[Figure: illustration of the K-means algorithm on the re-scaled Old Faithful data set, with a plot of the cost function; the algorithm has converged after the third iteration.]

Large Vocabularies with a Learned Quantizer
• Feature matching by quantization
• Hierarchical k-means [Nister 06]
• Approximate k-means [Philbin 07]
  – Based on approximate nearest neighbor search
  – With parallel k-d trees

Bag of Visual Words with Vocabulary Tree
• Interest point detection and local feature extraction -> hierarchical k-means clustering of the feature space -> visual word tree -> get the final visual word

Bag of Visual Words
• Summarize the entire image by its distribution (histogram) of visual word occurrences over the codebook.

Bag of Visual Words with Vocabulary Tree
• TF-IDF weighting of the visual word histogram:
    w_id = (n_id / n_d) * log(N / n_i)
  – TF: term frequency; IDF: inverse document frequency
  – n_id: number of occurrences of word i in image d
  – n_d: total number of words in image d
  – n_i: number of occurrences of word i in the whole database
  – N: number of images in the whole database

Bag of Visual Words
• Interest point detection and local feature extraction.
• Features are clustered to quantize the space into a discrete number of visual words.
• Given a new image, the nearest visual word is identified for each of its features.
• A bag-of-visual-words histogram can be used to summarize the entire image.
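A minimal sketch of the codebook training and histogram construction just described, using scikit-learn's flat KMeans as a stand-in for the hierarchical or approximate variants in the slides; all function names are assumptions, and n_i is counted here as the number of images containing word i, a common simplification of the IDF term:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(descriptor_pool, k=1000):
    """Learn the visual dictionary by k-means over a pool of local descriptors."""
    return KMeans(n_clusters=k, n_init=10).fit(descriptor_pool)

def bovw_histogram(image_descriptors, codebook):
    """Assign each descriptor to its nearest visual word and build the
    term-frequency histogram n_id / n_d for one image."""
    words = codebook.predict(image_descriptors)                      # nearest-centroid assignment
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

def tfidf(tf_histograms):
    """Re-weight the term-frequency histograms with IDF = log(N / n_i)."""
    H = np.vstack(tf_histograms)
    N = H.shape[0]
    n_i = np.maximum(np.count_nonzero(H, axis=0), 1)                 # images containing word i
    return H * np.log(N / n_i)
```

Two images can then be compared by a standard vector distance (e.g., L1 or cosine) between their weighted histograms, which is what the indexing scheme on the next slide exploits.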
Image Indexing and Comparison Based on the Visual Word Histogram Representation
• Image indexing with visual words
  – Representation as a sparse image-by-visual-word matrix
  – Store only the non-empty entries
• Image distance measurement

Popular Quantization Schemes (recap): K-means, K-d tree, LSH, Product quantization, Scalar quantization, CSH

KD-Tree (k-dimensional tree)
• Divides the feature space with a binary tree
• Each non-leaf node corresponds to a cutting plane
• Feature points within each rectilinear cell are similar to each other

KD-Tree
• Binary tree with each node denoting a rectilinear region
• Each non-leaf node corresponds to a cutting plane
• The directions of the cutting planes alternate with depth
• An example [figure]

Popular Quantization Schemes (recap): K-means, K-d tree, LSH, Product quantization, Scalar quantization, CSH

Locality-Sensitive Hashing [Indyk and Motwani 1998] [Datar et al. 2004]
• Index by compact code
  – Hash function: h(x) = sgn(w^T x + b), with w drawn at random
• Hash-code collision probability is proportional to the original similarity
  – l: number of hash tables; K: number of hash bits per table
(Courtesy: Shih-Fu Chang, 2012)

Popular Quantization Schemes (recap): K-means, K-d tree, LSH, Product quantization, Scalar quantization, CSH

Product Quantization (TPAMI'12)
• Basic idea
  – Partition the feature vector into m sub-vectors
  – Quantize the sub-vectors with m distinct quantizers
• Advantages
  – Low complexity in quantization
  – Extremely fine quantization of the feature space

Distance table
• Vector distance approximation:
    ||x − y||^2 = Σ_{i=1}^{D} (x_i − y_i)^2
                ≈ Σ_{k=1}^{m} ||u_k(x) − u_k(y)||^2
                ≈ Σ_{k=1}^{m} ||c_{a_k} − c_{b_k}||^2
                = Σ_{k=1}^{m} d_{a_k b_k}
  where a_k = q_k(u_k(x)), b_k = q_k(u_k(y)), and d_{jl} = ||c_j − c_l||^2 is a precomputed K x K look-up table of squared distances between sub-codewords (entries d_11, d_12, ..., d_1K, ..., d_K1, ..., d_KK).

Product Quantization [figure]

Popular Quantization Schemes (recap): K-means, K-d tree, LSH, Product quantization, Scalar quantization, CSH

Scalar Quantization
• SIFT distance distributions of true matches and false matches from an experimental study [figure]

A Novel Approach: Scalar Quantization
• Basic idea
  – Scalar vs. vector quantization: simple, fast, data independent
  – Map a SIFT feature to a binary signature (a string of bits)
    • The map function is independent of the image collection
  – The binary signature keeps the discriminative power of the SIFT feature
  – L2 distance -> Hamming distance

Binary SIFT Signature
• General idea: a distance-preserving transformation f(x)
  – SIFT descriptor (0, 25, 8, 2, ..., 14, 5, 2)^T -> transformation f(x) -> binary signature (0, 1, 0, 0, ..., 1, 0, 0)^T
  – How to select f(x)?
• Compact to store in memory
• Efficient to compare

Preferred Properties
• Simple and efficient
• Unsupervised, to avoid overfitting to training data
• Well-preserved feature distances

Scalar Quantization (MM'12)
• Each dimension is encoded with one bit
  – Given a feature vector f = (f_1, f_2, ..., f_d)^T in R^d
  – Quantize it to a bit vector b = (b_1, b_2, ..., b_d)^T with
      b_i = 1 if f_i > f̂,  b_i = 0 if f_i <= f̂   (i = 1, 2, ..., d)
    where f̂ is the median value of the vector f

Experimental Observation
• Statistical study on 0.4 trillion SIFT feature pairs
[Figure: statistics over SIFT descriptor pairs as a function of the Hamming distance between their binary signatures — average L2 distance, average standard deviation, and descriptor-pair frequency.]
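A minimal sketch of the one-bit-per-dimension scalar quantization above and of matching by Hamming distance; the function names and the threshold value are illustrative assumptions (the slides' 24-bit threshold refers to the later 256-bit signature):

```python
import numpy as np

def binary_sift(f):
    """1-bit scalar quantization (MM'12 idea): b_i = 1 iff f_i is above the
    median value of the descriptor itself, so no training data is needed."""
    f = np.asarray(f, dtype=float)
    return (f > np.median(f)).astype(np.uint8)

def hamming(b1, b2):
    """Hamming distance between two binary signatures."""
    return int(np.count_nonzero(b1 != b2))

def is_match(f1, f2, threshold=24):
    """Declare a match when the signatures fall within a Hamming-distance
    threshold (the value here is illustrative)."""
    return hamming(binary_sift(f1), binary_sift(f2)) < threshold
```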
An Example of Matched Features
[Figure: a pair of matched SIFT descriptors with L2 distance 0.03 and a small Hamming distance between their binary signatures, shown together with the XOR of the two signatures.]

Observation
• Matched SIFT descriptors share common patterns in the magnitudes of the 128 bins, e.g., the pairwise differences between most bins are similar and stable.
• Implication: the differences between the bin magnitudes and a predefined threshold are stable for most bins.
[Figure: bin magnitudes of two matched descriptors over the 128 feature dimensions and the XOR result of their binary signatures.]

Distribution of the SIFT Median Value
• Distribution over 100 million SIFT features
[Figure: frequency of SIFT features vs. median bin value.]

Scalar Quantization
• Generalizing scalar quantization
  – Encode each dimension with multiple bits, e.g., 2 bits
  – Trade-off between memory and accuracy
[Figure: a typical SIFT descriptor with bin magnitudes sorted in descending order, with the (1,1), (1,0) and (0,0) quantization regions marked.]

Scalar Quantization
• In practice, each dimension is quantized with 2 bits (balancing memory and accuracy):
    b_i = (1, 1) if f_i > f̂_1
    b_i = (1, 0) if f̂_1 >= f_i > f̂_2     (i = 1, 2, ..., d)
    b_i = (0, 0) if f_i <= f̂_2
  where f̂_1 = (g_32 + g_33) / 2, f̂_2 = (g_64 + g_65) / 2, and (g_1, g_2, ..., g_128) is (f_1, f_2, ..., f_128) sorted in descending order.

Visual Matching by Scalar Quantization
• Given SIFT features f(1) from image I_q and f(2) from image I_d
• Perform scalar quantization: f(1) -> b(1), f(2) -> b(2)
• f(1) matches f(2) if the Hamming distance d(b(1), b(2)) < threshold
[Real example: 256-bit SIFT binary signatures, threshold = 24 bits.]

Binary SIFT Signature
• Given a SIFT descriptor f = (f_1, f_2, ..., f_d)^T in R^d
• Transform it to a bit vector b = (b_1, b_2, ..., b_kd)^T
  – Each dimension is encoded with k bits, k <= log2(d)
  – Example (d = 8): 1 1 1 1 0 0 0 0 | 0 0 0 0 1 0 0 1 | 1 0 0 0 0 0 0 0

Outline
• Motivation
• Our approach
  – Scalar quantization
  – Index structure
  – Code word expansion
• Experiments
• Conclusions
• Demo

Indexing with Scalar Quantization
• Use inverted file indexing
• Take the top t bits of the SIFT binary signature as the "code word"
• Store the remaining bits in the entry of the indexed feature
• A toy example: a code word (e.g., 10010101) keys an indexed-feature list for the image database; each entry stores an image ID and the remaining signature bits (e.g., 10101)

Unique "Code Word" Number
• A 32-bit code word gives 2^32 possible code words ideally
[Figure: number of unique code words vs. code-word bit number t, for t = 16 to 32.]

Stop Frequent Code Words
[Figure: frequency of code words among one million images (a) before and (b) after application of a stop list, with words ranked by frequency.]
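A minimal sketch of the inverted-file layout from the "Indexing with Scalar Quantization" slide above; the 256-bit signature length, t = 32, and all identifier names are assumptions for illustration:

```python
from collections import defaultdict

inverted_index = defaultdict(list)   # code word (int) -> list of (image_id, remaining_bits)

def index_feature(bits, image_id, t=32):
    """bits: sequence of 0/1 of length 256 (assumed). The top t bits form the
    code word used as the inverted-file key; the remaining 256 - t bits are
    stored in the posting for later Hamming-distance verification."""
    code_word = int("".join(str(b) for b in bits[:t]), 2)
    inverted_index[code_word].append((image_id, tuple(bits[t:])))

def lookup(bits, t=32):
    """Return the postings that share the query feature's code word."""
    code_word = int("".join(str(b) for b in bits[:t]), 2)
    return inverted_index[code_word]
```

As the next slide explains, looking up only the exact code word misses true matches whose code-word bits have flipped, which is what code word expansion addresses.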
Code Word Expansion
• Scalar quantization error
  – Flipped bits can occur in the code word (e.g., 10010101 -> 10110101)
  – If those flipped bits are ignored, many candidate features will be missed -> recall degrades
• Solution: expand the code word to include its flipped variants
  – Enumerate all possible nearest neighbors within a predefined Hamming distance

Code Word Expansion: Quantization Error Reduction
• Toy example with 3-bit visual words VW0 (000) through VW7 (111): a query code word is expanded to all words within a 2-bit flip, and the corresponding indexed-feature lists are all visited.

Analysis on Recall of Valid Features
• A 256-bit binary SIFT signature is split into a 32-bit code word and 224 bits stored in the index list
• Retrieved features as candidates: {HamDis_32 <= 2}
• All candidate features: S = {HamDis_256 <= 24}
• Recall = |{HamDis_32 <= 2} ∩ S| / |S| ≈ 39.8%

Popular Quantization Schemes (recap): K-means, K-d tree, LSH, Product quantization, Scalar quantization, Cascaded Scalable Hashing (CSH)

Cascaded Scalable Hashing (TMM'13)
• Input SIFT descriptor, e.g., (12, 0, 1, 3, 45, 76, ..., 9, 21, 3, 1, 1, 0)
• Pipeline: PCA -> cascaded hashing stages -> binary signature generation -> binary signature verification
  – Each hashing stage keeps high recall while filtering out more than 80% of the false positives
  – Binary signature verification keeps high precision

SCH: Problem Formulation
• Nearest neighbor (NN) by the distance criterion
• Relax NN to approximate nearest neighbor
• How to determine the threshold t_i in each dimension?
  – p_i(x): probability density function in dimension i
  – r_i: relative recall rate in dimension i

SCH: Problem Formulation (II)
• Relative recall rate in dimension i
• Total recall rate obtained by cascading c dimensions
• To ensure a high total recall, impose a constraint on the recall rate of each dimension
• The total recall then follows from the per-dimension rates

Hashing in a Single Dimension
• Feature matching criteria:
  – Distance criterion: given a test feature from one image, the normalized L2 distance to its nearest neighbor in a comparison image is less than ϵ.
  – Distance ratio criterion: given a test feature from one image, the ratio between the distances to its nearest and second-nearest neighbors in a comparison image is less than 0.80.
[Figure: probability density and accumulated probability of the absolute coefficient difference in a single dimension.]

Hashing in a Single Dimension
• Scalar quantization/hashing in each dimension
• Cascaded hashing across c dimensions
  – The SCH result can be further represented as a single scalar key
  – Each indexed feature is hashed to only one key
• Online query hashing
  – Each query feature is hashed to at most 2^c keys

Binary Signature Generation
• For the bins of PSIFT (the PCA-transformed SIFT) beyond the top c dimensions, generate a binary signature
• Feature matching verification
  – Check the Hamming distance between the binary signatures
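A rough sketch of the indexing/query asymmetry described above, i.e., each database feature is hashed to one key while a query probes up to 2^c keys. The per-dimension bucket rule floor(v_i / t_i), the choice of the single adjacent bucket, and all names are assumptions for illustration, not the exact SCH formulation:

```python
import itertools
import numpy as np

def index_key(v, thresholds):
    """Hash the top-c PCA dimensions of a database feature to a single key:
    dimension i is scalar-quantized to bucket floor(v[i] / t_i)."""
    return tuple(int(np.floor(v[i] / t)) for i, t in enumerate(thresholds))

def query_keys(v, thresholds):
    """For a query feature, probe per dimension its own bucket plus the
    adjacent bucket on the side the value lies closer to, which yields at
    most 2**c candidate keys overall."""
    choices = []
    for i, t in enumerate(thresholds):
        b = int(np.floor(v[i] / t))
        neighbor = b + 1 if (v[i] / t) - b >= 0.5 else b - 1
        choices.append((b, neighbor))
    return set(itertools.product(*choices))
```

Candidates retrieved through these keys would then be verified with the binary signature over the remaining PSIFT dimensions, as the last slide describes.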