Embedding and Similarity Search for Point Sets under Translation
Minkyoung Cho and David M. Mount
University of Maryland
SoCG 2008

Point Pattern Matching
Given two point sets P and Q, find Q' ⊆ Q to minimize Dist(P, Q') = min_t dist(tP, Q'), where t is a geometric transformation (e.g., translation, rotation, …).

Point Pattern Similarity Search
A collection of point sets S = {P1, P2, …, PN} has been preprocessed. Given a query set Q, find the (approximate) nearest Pi with respect to a distance function and a transformation group.

Results
Method | Transformation | Space complexity | Index | Note
Geometric Hashing [Wolfson & Rigoutsos 97] | Translation, Rotation, Affine, … | O(N n^(k+1)) | YES | k: frame size
[Indyk & Thaper 03] | None | O(Nn) | YES | Embedding EMD to L1
[Cohen & Guibas 99] | Translation, Scaling | O(Nn) | NO | EMD under transformation sets; EMDM into Euclidean space; brute-force, heuristic
Ours | Translation | O(Nn log^2 n) | YES | Embedding SD to L1
EMD: Earth Mover's Distance. SD: Symmetric Difference Distance.

Problem Definition
Point Pattern Similarity Searching:
• Distance Measure: Symmetric Difference Distance
• Error Model: Outliers (but No Noise)
• Transformation: Translation
• Restriction: Coordinates are integers
Example: P = {p1,p2,p3,p4} and Q = {p1,p2,p5,p6} give P \ Q = {p3, p4} and Q \ P = {p5, p6}, so |PΔQ| = 4.
Example under translation: P = {0,12,14,23,35,54,59,64} and Q = {15,17,20,26,38,57,65,67}. With t = 3, Q shifted back by t is {12,14,17,23,35,54,62,64}, which shares {12,14,23,35,54,64} with P.

Motivation: Sources of Complexity
• Combination of Translation + Outliers
• Translation Only
  - Translate each point set so that its leftmost point lies at the origin
  - Matching is then trivial
• Outliers Only
  - Reduce to nearest neighbor search in the Hamming cube (by hashing or random sampling)

Intuition
[Figure: a map f sends the query Q and each of P1, P2, P3, P4, …, PN into a metric space.]

Embedding: Basic Definitions
Given metric spaces (X, d) and (X', d'), a map f: X → X' is called an embedding.
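As a concrete reading of the distance in the problem definition earlier in the talk, here is a minimal brute-force sketch for 1-dimensional integer point sets. The function names are illustrative and do not come from the talk, and this is the naive baseline rather than the embedding itself. It tries only translations t = q - p that align some point of P with some point of Q, since any other translation matches nothing.

```python
def sym_diff_size(P, Q):
    """|P Δ Q| for two finite point sets."""
    return len(set(P) ^ set(Q))


def translation_distance(P, Q):
    """min over t of |(P + t) Δ Q| for 1-D integer point sets (brute force)."""
    P, Q = set(P), set(Q)
    best = len(P) + len(Q)  # value when no point can be matched
    # Only translations aligning some p with some q can match anything.
    for t in {q - p for p in P for q in Q}:
        shifted = {p + t for p in P}
        best = min(best, len(shifted ^ Q))
    return best


# The two point sets from the problem-definition example.
P = [0, 12, 14, 23, 35, 54, 59, 64]
Q = [15, 17, 20, 26, 38, 57, 65, 67]
print(translation_distance(P, Q))  # 4, attained at t = 3
```

With up to n^2 candidate translations and O(n) work per candidate, this baseline costs O(n^3) per pair of sets, which is the kind of cost a precomputed embedding is meant to avoid at query time.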
The contraction of f is the maximum factor by which distances are shrunk, i.e., $\max_{x,y \in X} \frac{d(x,y)}{d'(f(x),f(y))}$.
The expansion or stretch of f is the maximum factor by which distances are stretched: $\max_{x,y \in X} \frac{d'(f(x),f(y))}{d(x,y)}$.
The distortion of f is the product of the contraction and the expansion.

Main Result: Preliminaries
• Main result: There exists a randomized embedding that maps point sets under symmetric difference with respect to translation into the metric space L1 with distortion O(log^2 n).
• Assumptions:
  – Each point set has at most n elements and lies in dimension d.
  – Coordinates are integers of magnitude polynomial in n.
• Distance Function: Symmetric difference with respect to translation, $\langle P \Delta Q \rangle = \min_t |(P + t) \Delta Q|$
• Target Metric: L1. For $x, y \in \mathbb{R}^d$, $\|x - y\|_1 = \sum_{i=1}^{d} |x_i - y_i|$

Outline of Algorithm
[Figure: the point set {3,6,10,14,22} is hashed to the bit vector 1 0 0 1 0 0 1 0 0 0 1 of length O(n log n), turned into a collection of bit patterns {101010, ..., 010100, …, 11101}, and hashed again to a count vector 3 0 0 2 0 0 1 0.]
1. Transform d-dimensional points into 1-dimensional points. (Distortion: 1)
2. Reduce the domain size using a linear hash function. (Distortion: O(1))
3. Make the representation invariant under translation. (Distortion: O(log^2 n))
4. Reduce the target domain size using a universal hash function. (Distortion: O(1))

Translation Invariant
[Figure: the bit vector P = 1 0 0 1 0 0 1 0 0 0 1 of length s, continued cyclically (1 0 0 1 0 0 …), is read with ρ = 4 probes at each translation, giving the patterns {1101, 0000, 0010, 1100, 0001, …, 1010}.]

Intuition
hP = 1 0 0 1 0 0 1 0 0 0 0
hQ = 1 0 0 0 1 0 1 0 0 1 0
Φ2P = {10,01,00,10,01,00,10,00,00,01,00}
Φ2Q = {10,00,01,00,11,00,10,01,00,11,00}
If one of the probes hits a mismatched position, the generated bit patterns may differ. The probability that some probe hits a mismatched position increases as the probe size increases.
Φ4P = {1101,0000,0010,1100,0000,0001,1000,0010,0101,0000,0010}
Φ4Q = {1011,0100,0010,0101,1000,0011,1100,0010,0100,1001,0000}

Relationship between ρ (probe size) and δ*
δ: estimated distance; δ*: original distance; ρ = s/(2δ)
[Figure: distance of invariants |ΦρP Δ ΦρQ| as a function of δ; curve labels: Expectation, Upper bound (> 2s - 2), Unknown (???); top mark 2s.]
[Figure labels, continued: s, s/2^i, 1, δ*/O(ln s); marks δ* and 2n on the δ axis; arrow "ρ increases".]

Embedding
δ: estimated distance; δ*: original distance
[Figure and equation: the distance of invariants |Φ_i P Δ Φ_i Q| is plotted for the guesses δ = 2^0, 2^1, 2^2, …, 2^L, where 2^(log 2n) = 2n, so L = log 2n; marks 2s, δ*/O(log s), δ*. The accompanying expression relates ΨP and ΨQ to the invariants Φ_i P and Φ_i Q for i = 0, …, L and bounds E[‖ΨP - ΨQ‖] in terms of δ*, with factors O(log n) and 1/O(log s).]

Build Time
The expensive operations are building the invariant and hashing over a large domain.
Building the invariant: (# of probes) × (# of translations)
  Trivial: O(s) × s = O(n log n) × O(n log n) = O(n^2 log^2 n)
Universal hash function: (# of elements) × (matrix operation) = (# of elements) × (input size) × (output size)
  Trivial: O(s) × O(s) × O(log s) = O(s^2 log s) = O(n^2 log^3 n)
We can improve this to O(n log^3 n) if we merge the two operations. Surprise!!!

Merge Two Operations
[Figure: the bit vector P = 1 0 0 0 1 0 1 0 0 1 0 of length s is combined with the probe vector f masked by each row r0, …, r_(log s) of H; the outputs y0, y1, y2, …, y_(s-1) are obtained as Conv((r0 f), P), and likewise for the other rows.]
Convolution can be computed in O(n log n) time, where n is the size of the array.

Main Result: Formal Statement
Given failure probability β, there exists a randomized embedding from a point set P into a vector ΨP of dimension O(n (log^2 n) log(1/β)) such that for any P, Q, with probability at least 1 - β,
(i) $\|\Psi P - \Psi Q\|_1 \le (2 \log n)\, \langle P \Delta Q \rangle$
(ii) $\|\Psi P - \Psi Q\|_1 \ge \frac{1}{17 \log n}\, \langle P \Delta Q \rangle$
This embedding can be computed in time O(n (log^4 n) log(1/β)).

Open Problems
• Q1. Can we improve the distortion bound? It is currently O(log^2 n). Cormode & Muthukrishnan show how to embed a string under edit distance with moves into L1 with O(log n log* n) distortion.
• Q2. Can we derandomize the algorithm? Cormode & Muthukrishnan's algorithm is deterministic.
• Q3. Can we improve the space and time complexities?

Other Extensions
• Q1. Can we support a distance measure that is robust to noisy data (e.g., Hausdorff distance)?
• Q2. Can we handle other transformation groups?
  - integer scaling?
  - integer scaling + translation?
  - affine transformations over finite vector spaces?
Point Pattern Similarity Searching:
• Distance Measure: Symmetric Difference Distance
• Error Model: Outliers (but No Noise)
• Transformation: Translation
• Restriction: Coordinates are integers

Thank You!

Translation Invariant
P = {3,6,10,14,22}
h(x) = x mod s (e.g., s = 11)
hP = 1 0 0 1 0 0 1 0 0 0 1, a cyclic bit vector of length s (continuing 1 0 0 1 0 0 …)
With ρ = 4 probes, the patterns are {1101, 0000, 0010, 1100, 0001, …, 1010}, i.e. ΦρP = {13, 0, 2, 12, 1, …, 10}
h'(x): a second hash (for simplicity, x mod 10)
[Figure: histogram of the hashed pattern values; extracted counts 2 0 0 1 0 2 0 1 0 0 0 0 0 0 over bucket labels 0 to 9.]

Trial 1: Geometric Hashing for Translation
• Naïve Version:
  - Space complexity is O(N n^2), since the frame size is 1.
  - With outliers in a query, the number of queries increases.
• Adaptive Version: To reduce space complexity, store only c transformed sets; the number of queries then increases.
• Outliers may lead to false matches, which increases the probability of false positives.

Geometric Hashing with Outliers (delete)
Depending on the number of outliers $r$ and the frame size $k$, more queries are needed to get a correct result.
Method 1. Pr[choose a valid frame set] = (1 - r/n)^k
Method 2. (r + 1) different trials (deterministic)
Method 3. Pigeonhole principle: Pr[choose a valid frame set] = 1 - r/(n/k)
[Grimson & Huttenlocher 90]: Outliers lead to false matches and increase the probability of false positives.

d-Dimension → 1-Dimension
Let u be the maximum coordinate value of any point. Then we can map a d-dimensional point set to a 1-dimensional point set with coordinates of size at most (3u)^d without changing the symmetric difference distance under translation.
[Figure: 2-dimensional points such as (5,3) and (1,1) on a grid of bits are laid out row by row into one bit vector; extracted labels [6,15], [21,30], 35.]

# of Primes & Collision Prob.
• Collision Probability
h(x) = x mod s, where s is a prime number in Θ(n log n) chosen uniformly at random.
For x != y,
Pr[h(x) = h(y)] = Pr[(x mod s) = (y mod s)] = Pr[(x - y) mod s = 0]
Since x, y ∈ Z_(n^c), |x - y| < n^c.
Pr[h(x) = h(y)] < c/(# of primes) = O(1/n), because fewer than c primes larger than n can divide a nonzero integer of magnitude below n^c.
• Prime Number Theorem
There are Θ(m/log m) prime numbers between 1 and m.

Distance Distortion by Hashing
We can achieve O(1) distortion with a hash function whose collision probability is O(1/n). Note that collisions can only contract the distance.

Linear Hash Function (X)
• h(x) = x mod s, where s is a prime number in Θ(n log n)
• Linearity: h(x + t) = h(x) + h(t) (mod s)
hP = 1 0 0 1 0 0 1 0 0 0 1 for P = {3,6,10,14,22}
• Invariance under translation: ΦρP = Φρ(P+t)

Universal Hash Function for large domain
Since the maximum probe size is O(n log n), the input domain of the hash function has size 2^O(n log n). However, it contains only Θ(n log n) elements.
• H: 2^s → 2^k, H(x) = R x + b (mod (2,2,…,2))
  R: a random k × s binary matrix
  b: a random k-bit vector
• Time Complexity:
  To compute one value: O(k s) = O((log n) n log n) = O(n log^2 n)
  For all s (= O(n log n)) values, the time is O(n^2 log^3 n).

Relationship between ρ and δ*
ρ = s/(2δ); δ is a guessed distance; δ* is the optimal distance
[Figure: |ΦρP Δ ΦρQ| as a function of δ; curve labels: Expectation, Upper bound (> 2s - 2), Unknown (???); marks 2s, s, s/2^i, 1, δ*/O(ln s); marks δ* and 2n on the δ axis.]

Effect of Hash Functions
ρ = s/(2δ)
[Figure: |ΦρP Δ ΦρQ| as a function of δ, annotated with the effects of the hash functions h and h'; marks 2s, s, δ*/O(log s), ???; marks δ* and 2n on the δ axis.]

Merge Two Operations using FFT & Convolution
П = random_probe(ρ, s)
For t = 1, …, s:
    x(t) = (hP + t)[П]    // make an invariant
For t = 1, …, s:
    x'(t) = H x(t) + b (mod (2,2,…,2))    // H: O(log s) × ρ matrix
    ΦρP[x'(t)]++
Time Complexity: O(s) × O(matrix multiplication) = O(s) × O(s log s)
------------------------------------------------------------------------
H = [r1, r2, …, r_O(log s)]'    // ri: a binary row vector
H x(t) = [r1 x(t), r2 x(t), r3 x(t), …, r_O(log s) x(t)]'
ri x(t) = ri (hP + t)[П] = (hP + t)[П ri]
[ri x(0), ri x(1), …, ri x(s)] = conv(fliplr(hP), [П ri])
Time Complexity: O(log s) × O(convolution) = O(log s) × O(s log s)

Build Time
Step | Trivial running time | Ours
d-dimension → 1-dimension | O(dn) | O(dn)
Linear Hashing | O(n) | O(n)
Invariant under Translation | O(n^2 log^2 n) | O(n log^3 n), merged with the row below
Universal Hashing (due to the domain size, we need to use matrix multiplication) | O(n^2 log^4 n) | merged with the row above
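The merged computation on the "Merge Two Operations using FFT & Convolution" slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes x(t)[j] = hP[(t + П_j) mod s] as the translation convention, omits the additive vector b (which only flips output bits), and uses a small hand-written radix-2 FFT with zero padding. All function names, the probe positions, and the row r are hypothetical. For one row ri of H, the values ri·x(t) for every translation t are obtained from a single cyclic correlation of hP with the mask "probes selected by ri", instead of building each x(t) explicitly.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (unnormalized)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def cyclic_correlation(h, m):
    """c[t] = sum_u m[u] * h[(t + u) % s] for every t, via zero-padded FFT."""
    s = len(h)
    size = 1
    while size < 2 * s:
        size *= 2
    fh = fft(list(h) + [0] * (size - s))
    fm = fft(list(reversed(m)) + [0] * (size - s))
    prod = fft([x * y for x, y in zip(fh, fm)], invert=True)
    lin = [round(v.real / size) for v in prod]  # linear convolution of h and reversed m
    # Fold the linear result back onto the cycle of length s.
    return [lin[t + s - 1] + (lin[t - 1] if t >= 1 else 0) for t in range(s)]


def direct_hash_bits(h, probes, r):
    """Direct method: build x(t) = (hP + t)[probes], then take r . x(t) mod 2."""
    s = len(h)
    out = []
    for t in range(s):
        x_t = [h[(t + p) % s] for p in probes]
        out.append(sum(ri * xi for ri, xi in zip(r, x_t)) % 2)
    return out


def merged_hash_bits(h, probes, r):
    """Merged method: one cyclic correlation with the mask of probes chosen by r."""
    mask = [0] * len(h)
    for p, ri in zip(probes, r):
        mask[p] += ri  # assumes every probe is a position in range(len(h))
    return [c % 2 for c in cyclic_correlation(h, mask)]


hP = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # hP from the "Intuition" slide, s = 11
probes = [0, 2, 5, 9]                    # hypothetical probe positions, rho = 4
r = [1, 0, 1, 1]                         # one hypothetical random row of H
assert merged_hash_bits(hP, probes, r) == direct_hash_bits(hP, probes, r)
```

Each row of H costs one FFT-based correlation, O(s log s), so the O(log s) rows together cost O(s log^2 s), in line with the "O(log s) × O(convolution)" count on the slide; the direct method costs O(s ρ) per row.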