Clustering methods: Part 5
Fast search methods

Pasi Fränti
5.5.2014
Speech and Image Processing Unit
School of Computing
University of Eastern Finland

Methods considered

Classical speed-up techniques
• Partial distortion search (PDS)
• Mean-distance-ordered partial search (MPS)
Speed-up of k-means
• Reduced search based on centroid activity
External search data structures
• Nearest neighbor graph
• K-d tree

Partial distortion search (PDS)
[Bei and Gray, 1985: IEEE Trans. Communications]

• The current best candidate gives an upper limit for the distance.
• The distance is calculated cumulatively, one dimension at a time.
• After each addition, check whether the partial distortion exceeds the smallest distance found so far.
• If it does, terminate the distance calculation and reject the candidate.

$$e_{a,j} = \sum_{k=1}^{K} (c_{ak} - c_{jk})^2$$

Mean-distance-ordered partial search (MPS)
[Ra and Kim, 1993: IEEE Trans. Circuits and Systems]

• Calculate the distance along the projection axis.
• If the distance is outside the bounding circle defined by the best candidate, drop the vector.

$$\left( \sum_{k=1}^{K} x_{ik} - \sum_{k=1}^{K} c_{bk} \right)^2 > K \cdot d(x_i, c_a)$$

Bounds of the MPS method

[Figure: bounds of the MPS method. Shows the input vector, the best candidate, the two bounds, the vectors A, B and C, and their projections A', B' and C'.]

Pseudo code of MPS search

SearchNearestNeighborUsingMPS(c_a, c_j, d_min) → nn_a, d_a;
  d_min ← ∞; up ← TRUE; down ← TRUE; j1 ← a; j2 ← a;
  WHILE (up OR down) DO
    IF up THEN
      j1 ← j1 + 1;
      IF j1 > N THEN up ← FALSE
      ELSE CheckCandidate(s_a, s_j1, n_a, d_min, nn, up);
    IF down THEN
      j2 ← j2 - 1;
      IF j2 < 1 THEN down ← FALSE
      ELSE CheckCandidate(s_a, s_j2, n_a, d_min, nn, down);
  END-WHILE;
  RETURN nn, d_min;

Note: this pseudo code should be updated according to what was said during the lectures.

Activity classification [Kaukoranta et al., 2000: IEEE Trans.
Image Processing]

[Figure: activity classification. Legend for code vectors: active, static, previous. Legend for training vectors: moved farther, moved closer, no change.]

Reduced search based on activity classification

The nearest-centroid search is determined as follows (T = full search, O = partial search):

  Search type                        T       O       O
  Number of distance calculations    100 %   4 %     4 %
  Search type                        -       -       O
  Number of distance calculations    0 %     0 %     4 %
  Share of the vectors               3.6 %   3.7 %   0.1 %
                                     0 %     0 %     92.6 %

Legend for centroids:
• Active, new location
• Active, old location
• Static
Legend for vectors:
• Centroid moved closer to the vector
• Centroid moved farther from the vector
• No change in the distance to the centroid

Classification due to iterations

[Figure: activity of vectors in Random Swap. Y-axis: amount of full searches (0 % to 30 %). X-axis: iteration, shown for iterations 0 to 200 and 0 to 10000.]

Effect on distance calculations

K-means
                                      Full      PDS      MPS+PDS
  Distance calculations / search      255.97    255.97   26.97
  Dimensions / distance calculation   16.00     2.34     8.07
  Dimensions / search                 4095.48   598.96   217.60

K-means with activity classification
                                      Full      PDS      MPS+PDS
  Distance calculations / search      61.44     61.44    5.35
  Dimensions / distance calculation   16.00     2.08     6.72
  Dimensions / search                 983.07    127.98   35.97

Effect on processing time
For improving the K-means algorithm

             Bridge                              Miss America
             Without grouping   With grouping    Without grouping   With grouping
  Full       127.6              46.1             1344.5             336.2
  PDS        33.4               13.0             311.1              75.8
  MPS+PDS    12.4               4.8              97.3               21.5

MPS+PDS with grouping takes 3.8 % (Bridge) and 1.6 % (Miss America) of the time of full search without grouping.

Comparison of speed-up methods

[Figure: running time (0 to 300) as a function of codebook size (16 to 1024) for Full, PDS, TIE and MPS.]

Improvement of reduced search

[Figure: reduction of running time (0 % to 100 %) as a function of codebook size (16 to 1024) for Full, PDS, TIE and MPS.]

Neighborhood graph

[Figure: two panels, "Full search" and "Graph structure".]

Full search: O(N) distance calculations.
Graph structure: O(k) distance calculations.
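
The slides give only the cost of the two approaches, not the search procedure itself. The following is a minimal sketch, assuming that the graph links each code vector to its k nearest code vectors and that the search for an input vector is restricted to a starting code vector (for example its current centroid) and that vector's graph neighbors. The names `build_knn_graph`, `graph_search` and `start` are hypothetical and do not come from the slides. The restricted search is approximate: it finds the true nearest code vector only if that vector is among the candidates.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(codebook, k):
    """For each code vector, list the indices of its k nearest other
    code vectors (brute force, done once before the searches)."""
    graph = []
    for i, c in enumerate(codebook):
        others = sorted((j for j in range(len(codebook)) if j != i),
                        key=lambda j: sq_dist(c, codebook[j]))
        graph.append(others[:k])
    return graph


def full_search(x, codebook):
    """Compare x with every code vector: N distance calculations."""
    dists = [sq_dist(x, c) for c in codebook]
    best = min(range(len(codebook)), key=dists.__getitem__)
    return best, len(dists)


def graph_search(x, codebook, graph, start):
    """Assumed strategy: compare x only with code vector `start` and its
    graph neighbors, which costs k + 1 distance calculations."""
    candidates = [start] + graph[start]
    dists = [sq_dist(x, codebook[j]) for j in candidates]
    best = candidates[min(range(len(candidates)), key=dists.__getitem__)]
    return best, len(dists)


# Eight code vectors on a line, each linked to its k = 2 nearest neighbors.
codebook = [(float(i), 0.0) for i in range(8)]
graph = build_knn_graph(codebook, k=2)
x = (3.8, 0.1)

print(full_search(x, codebook))             # (4, 8): N = 8 distance calculations
print(graph_search(x, codebook, graph, 3))  # (4, 3): k + 1 = 3 distance calculations
print(graph_search(x, codebook, graph, 0))  # (2, 3): a poor start misses code vector 4
```

With a good starting point the restricted search returns the same code vector as the full search at a fraction of the distance calculations. The last call shows the price of the assumption: a starting point far from the input vector yields a wrong answer.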
Sample graph structure

K-d tree
• See the course: Design of Spatial Information Systems

Literature

1. T. Kaukoranta, P. Fränti and O. Nevalainen, "A fast exact GLA based on code vector activity detection", IEEE Trans. on Image Processing, 9 (8), 1337-1342, August 2000.
2. C.-D. Bei and R.M. Gray, "An improvement of the minimum distortion encoding algorithm for vector quantization", IEEE Trans. on Communications, 33 (10), 1132-1133, October 1985.
3. S.-W. Ra and J.-K. Kim, "A fast mean-distance-ordered partial codebook search algorithm for image vector quantization", IEEE Trans. on Circuits and Systems, 40 (9), 576-579, September 1993.
4. J.Z.C. Lai, Y.-C. Liaw and J. Liu, "Fast k-nearest-neighbor search based on projection and triangular inequality", Pattern Recognition, 40, 351-359, 2007.
5. C. Elkan, "Using the triangle inequality to accelerate k-means", Int. Conf. on Machine Learning (ICML'03), 147-153.
6. J. McNames, "A fast nearest neighbor algorithm based on a principal axis search tree", IEEE Trans. on Pattern Analysis and Machine Intelligence, 23 (9), 964-976, September 2001.
7. J.H. Friedman, J.L. Bentley and R.A. Finkel, "An algorithm for finding best matches in logarithmic expected time", ACM Trans. on Mathematical Software, 3 (3), 209-226, September 1977.
8. R. Sproull, "Refinements to nearest-neighbor searching in K-d tree", Algorithmica, 6, 579-589, 1991.