Hashing with Graphs
Wei Liu (Columbia), Jun Wang (IBM), Sanjiv Kumar (Google), and Shih-Fu Chang (Columbia)
June 2011
Hashing with Graphs, ICML 2011

Outline
• Overview
• Graph Hashing
• Anchor Graph Hashing
• Experiments
• Conclusions

Fast NN Search
• In many ML/CV/IR problems, one needs a reliable content-based similarity search engine.
• The simplest method is nearest neighbor (NN) search using some similarity measure such as Lp, cosine, or kernel.
• The exhaustive linear search into n data items takes O(n) time. KD-tree is faster but seriously cursed by dimensionality (c<=dd, d<20 works).

Hashing
• Low memory usage and high search efficiency.
• Randomized hash functions (long bits / multiple tables):
  Locality-Sensitive Hashing (LSH), Indyk et al. (since 1998)
• Learned hash functions (short bits / single table):
  Semantic Hashing (RBMs), Salakhutdinov & Hinton (2006, 2007)
  Spectral Hashing (SH), Weiss et al. (2009)
  Binary Reconstruction Embedding (BRE), Kulis & Darrell (2010)
  Semi-Supervised Hashing (SSH), Wang et al. (2010)

Hashing Using Compact Codes
• Learning-based methods try to learn compact binary codes whose Hamming distance approaches or preserves the given similarity, e.g., L2, cosine, kernel (local measure).
• Achieve constant-time search (O(1)) by flipping several bits and looking up the hash table.
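The lookup described in the last bullet can be sketched in a few lines. This is only an illustration under stated assumptions: codes are stored as Python integers, and the names `build_table` and `probe` are hypothetical helpers, not part of any released code. The number of probes depends on the code length and the radius, not on the number of stored items, which is where the constant-time claim comes from.

```python
from collections import defaultdict
from itertools import combinations

def build_table(codes):
    """Map each integer code to the list of item ids stored under it."""
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[code].append(item_id)
    return table

def probe(table, query_code, n_bits, radius=2):
    """Collect items whose code differs from the query in at most `radius` bits."""
    hits = []
    for r in range(radius + 1):
        # Flip every possible set of r bit positions and look up the bucket.
        for positions in combinations(range(n_bits), r):
            mask = 0
            for p in positions:
                mask |= 1 << p
            hits.extend(table.get(query_code ^ mask, []))
    return hits

# Toy database of 4-bit codes (made-up values for illustration only).
codes = [0b0000, 0b0001, 0b0011, 0b0111, 0b1111]
table = build_table(codes)
neighbors = sorted(probe(table, 0b0000, n_bits=4, radius=2))
```

With these toy codes, `neighbors` holds the ids 0, 1, and 2, i.e. the items whose codes lie within Hamming distance 2 of the query.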
[Figure: a learned hash function maps data into hash-table buckets, which return the NN list.]

Our Mission
• Can load millions of data, e.g., images, into memory. Very compact codes!
• Beat the existing unsupervised hashing methods (LSH, PCAH, USPLH, SH, KLSH, SIKH).
• Exploit a global measure on manifolds to learn compact codes. Use graphs!
• Graphs tend to capture the semantic neighborhoods. Beat L2 linear scan!
[Figure: two 2D scatter plots of toy data, each partitioned into +1 and -1 regions.]

Graph Hashing (GH)
• A standard graph-based hashing framework [Weiss et al. 09].
[Equation: the GH optimization problem, with its constraints annotated "drop it by spectral relaxation", "balanced bits", and "uncorrelated bits".]
• A: the affinity matrix of the data graph.
• L: the graph Laplacian diag(A1) - A.
[Figure: data points x1, x2, ..., xn, with the edge weight A12 between x1 and x2.]

Steps
Training:
1) Building a sparse matrix A of an exact neighborhood graph (e.g., k-NN graph) on n d-dim data points needs [complexity missing].
2) Computing & binarizing r eigenvectors of the sparse graph Laplacian L needs [complexity missing].
Test (hashing):
1) Generalizing r eigenvectors to any unseen point and binarizing needs [complexity missing] (interpolations over n entries).
Training 1) is intractable for offline training; Test 1) is infeasible for online hashing!

Spectral Hashing [Weiss et al. 09]
• SH directly obtains the analytical eigenfunctions cos(akx+b) of 1D graph Laplacians based on the strong assumptions:
  i) data dimensions are independent of each other;
  ii) at each dimension data is uniformly distributed.
• The 1D graph Laplacian is derived from a complete graph. Manifold structure of raw data is almost ignored.
• Pseudo eigenfunctions & pseudo-linear hash functions.

Toy Data
• 12k 2D points: blue color denotes the positive eigenvector entries and red color denotes the negative entries.
[Figure: partitions given by the 1st and 2nd eigenvectors on the toy data for our approach, SH, and GH; annotated "smooth yet balanced".]

Idea
• In Training 1), we utilize our proposed large graph, the Anchor Graph, whose space/time complexities are both linear.
  W. Liu, J. He, and S.-F. Chang. Large Graph Construction for Scalable Semi-Supervised Learning. ICML 2010 (code released).

Anchor Graph [Liu et al. 10]
• Nonnegative, sparse, and low-rank affinity matrices.
affinity matrices • Approximate the exact neighborhood graph when the number of anchors is sufficiently large. number of anchors is sufficiently large. 100 anchor points 10‐NN graph Hashing with Graphs, ICML 2011 Anchor Graph 14 Anchors • Introduce Introduce a few anchor a few anchor points (K points (K‐means means clustering centers) clustering centers) and do the data‐to‐anchor similarity computation (nXm). • The data‐to‐anchor similarity matrix • Obtain the data‐to‐data similarity matrix  using Z. Â18>0 x1 anchor points Z12 Z16 Z11 u1 x8 u6 Â14=0 0 u5 Inner product? u2 u3 u4 x4 data points Hashing with Graphs, ICML 2011 15 Low Rank Affinity Matrix Low‐Rank Affinity Matrix • Kernel‐based truncated Z design ( ): g Note Z1=1 such that Z acts as a mapping anchorsdata pp g • The Anchor Graph affinity matrix The Anchor Graph affinity matrix Rank<= m, only computing and saving Z in memory; a doubly stochastic matrix, so the graph Laplacian is I‐  and has the same eigenvectors as those of Â.  Hashing with Graphs, ICML 2011 16 Anchor Graph Hashing (AGH) Anchor Graph Hashing (AGH) • Solve the relaxed GH problem using the Anchor Graph Solve the relaxed GH problem using the Anchor Graph • D Due to low‐rank of Â, its r l k f i eigenvectors Y = ZW (excluding i Y ZW ( l di the trivial 1) can be easily solved via a small eigenvalue decomposition on the mXm matrix . decomposition on the mXm matrix • looks like r “eigenvectors” on m anchors. • The r‐dim Hamming Embeddings are sgn(ZW). Hashing with Graphs, ICML 2011 17 Eigenvector to Eigenfunction Eigenvector to Eigenfunction • Suppose one eigenvector y Suppose one eigenvector y = Zw = Zw w Hashing with Graphs, ICML 2011 18 Theorem Given m anchor points U={u anchor points U={uj} and any sample x, define a } and any sample x define a Given m feature map as follows where =1 iff anchor uj is one of s nearest anchors of x in U according to the distance function D(∙). 
Then the Nyström eigenfunction extended from the Anchor Graph Laplacian eigenvector y_k = Zw_k is [Equation: the eigenfunction], an interpolation over m entries.

Steps
Training:
1) Building an Anchor Graph (Z) on n d-dim data points needs [complexity missing] (T is the iteration number of K-means).
2) Computing & binarizing r eigenvectors of the Anchor Graph Laplacian needs [complexity missing].
3) Building the hash table needs O(n).
Test (hashing):
1) Generalizing r eigenvectors to any unseen point via Nyström extension and binarizing needs [complexity missing].
2) Inverse lookup in the hash table needs O(1).
Linear training time & constant hashing time!

Hierarchical Hashing
• The higher eigenvector/eigenfunction corresponding to the higher graph Laplacian eigenvalue is of low quality for partitioning [Shi et al. 00].
• We propose two-layer hashing to revisit the lower (smoother) eigenfunctions to generate multiple hash bits.
[Figure: eigenvalues of the Anchor Graph Laplacian (y-axis 0 to 1, x-axis 0 to 300); the low end is annotated "reuse".]
[Figure: balanced partitioning.]

1-AGH vs. 2-AGH
• One-Layer AGH = binarizing r eigenfunctions into r hash bits (hash functions → r-bit code).
• Two-Layer AGH = hierarchically thresholding r/2 eigenfunctions into r hash bits (r-bit code).

MNIST
• 70k digit images from 10 classes; 1000 queries uniformly sampled from 10 classes.
• Compare 8 unsupervised hashing methods, L2 linear scan, and Spectral Embedding (SE) L2 linear scan (real-valued version of 1-AGH).
• Measure Hamming radius 2 precision (truly hashing) and Hamming ranking accuracy (brute-force search) in terms of true neighbors, which take the same class label.
[Figure (MNIST): (a) Precision within Hamming radius 2 using hash lookup vs. # bits (12 to 64; fix m=300 to run AGH) for LSH, PCAH, USPLH, SH, KLSH, SIKH, 1-AGH, and 2-AGH; annotated "much higher" and "drop". (b) Hamming ranking precision of top-5k ranked neighbors vs. # anchors (100 to 600) for L2 scan, SE L2 scan (48 dims), 1-AGH (48 bits), and 2-AGH (48 bits); annotated "2-AGH > SE L2 > L2".]
[Figure (MNIST): (c) Hamming ranking precision curves and (d) Hamming ranking recall curves vs. # samples (fix m=300) for L2 scan, 1-AGH (24 bits), 1-AGH (48 bits), and 2-AGH (48 bits); annotated "2-AGH (r) > 1-AGH (r)" and "2-AGH (r) > 1-AGH (r/2)".]

NUS-WIDE
• About 270k web images from 81 semantic tags; 2100 query images sampled from 21 frequent tags. True neighbors are those sharing at least one semantic tag.
• (a)(c)(d) fix m=300. (b) top-5k Hamming ranking precision.
[Figure (NUS-WIDE): (a) Precision within Hamming radius 2 vs. # bits (12 to 64) for LSH, PCAH, USPLH, SH, KLSH, SIKH, 1-AGH, and 2-AGH; annotated "much higher" and "drop". (b) Precision of top-5k neighbors vs. # anchors (100 to 600) for L2 scan, SE L2 scan (48 dims), 1-AGH (48 bits), and 2-AGH (48 bits); annotated "2-AGH > SE L2 > L2".]
[Figure (NUS-WIDE): (c) Precision curves and (d) Recall curves (1-AGH vs. 2-AGH) vs. # samples for L2 scan, 1-AGH (24 bits), 1-AGH (48 bits), and 2-AGH (48 bits); annotated "2-AGH (r) > 1-AGH (r)" and "2-AGH (r) > 1-AGH (r/2)".]

Hamming Ranking
Method            MAP on MNIST        MP/5k on NUS-WIDE
                  r=24     r=48       r=24     r=48
L2 Scan           0.4125              0.4523
SE L2 Scan        0.5269   0.3909     0.4866   0.4775
BRE (NIPS'10)     0.2638   0.3090     0.4100   0.4229
SH (NIPS'09)      0.2699   0.2453     0.3609   0.3420
KLSH (ICCV'09)    0.2555   0.3049     0.4232   0.4157
SIKH (NIPS'10)    0.1947   0.1972     0.3270   0.3094
USPLH (ICML'10)   0.4699   0.4930     0.4269   0.4322
1-AGH             0.4997   0.3971     0.4762   0.4761
2-AGH             0.6738   0.6410     0.4699   0.4779
[Annotations on the slide: "supervised", "linear".]

Conclusions
• Under the unsupervised scenario, almost all hashing methods are inferior to L2 linear scan in search accuracy.
• Our AGH method outperforms L2 scan in terms of searching semantic neighbors.
• We have applied Anchor Graphs to achieve scalability in SSL, clustering, and web image search.
• AGH will have wide applications, even if only its compact code generation scheme is used.
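
As a rough illustration of the compact code generation scheme, the sketch below hashes one unseen point by mapping it onto its s nearest anchors and binarizing r linear projections of that map, in the spirit of sgn(ZW). Several details are assumptions rather than content of the slides: the Gaussian kernel with bandwidth `t`, the squared Euclidean distance standing in for D(·), the function names, and above all the matrix `W`, which here holds made-up values; in AGH it would come from the small m×m eigenvalue decomposition.

```python
import math

def anchor_features(x, anchors, s=2, t=1.0):
    """Kernel weights of x on its s nearest anchors, normalized to sum to 1."""
    # Squared Euclidean distance to every anchor (assumed distance function).
    d2 = [sum((xi - ui) ** 2 for xi, ui in zip(x, u)) for u in anchors]
    nearest = sorted(range(len(anchors)), key=lambda j: d2[j])[:s]
    z = [0.0] * len(anchors)
    for j in nearest:
        z[j] = math.exp(-d2[j] / t)  # assumed Gaussian kernel
    total = sum(z)
    return [v / total for v in z]

def hash_code(x, anchors, W, s=2, t=1.0):
    """Binarize the r projections of the anchor features into bits (1 or 0)."""
    z = anchor_features(x, anchors, s, t)
    r = len(W[0])
    return [
        1 if sum(z[j] * W[j][k] for j in range(len(z))) > 0 else 0
        for k in range(r)
    ]

# Four hypothetical anchors in 2D and a hypothetical 4x2 projection matrix W.
anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
z = anchor_features((0.1, 0.0), anchors, s=2, t=1.0)
code = hash_code((0.1, 0.0), anchors, W, s=2, t=1.0)
```

Only s entries of the feature vector are nonzero, so each bit costs a sum over s anchors once the nearest anchors are known. For the toy inputs above, `code` is `[1, 0]`. For points very far from every anchor the kernel values can underflow to zero, so a practical version would need to guard the normalization.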