Clustering-Part5

advertisement
Clustering methods: Part 5
Fast search methods
Pasi Fränti
5.5.2014
Speech and Image Processing Unit
School of Computing
University of Eastern Finland
Methods considered
Classical speed-up techniques
• Partial distortion search (PDS)
• Mean distance ordered partial search (MPS)
Speed-up of k-means
• Reduced-search based on centroid activity
External search data structures
• Nearest neighbor graph
• Kd-tree
Partial distortion search (PDS)
[Bei and Gray, 1985: IEEE Trans. Communications]
• Current best candidate gives upper limit.
• Distances calculated cumulatively.
• After each addition, check if the partial distortion
exceeds the smallest distance found so far.
• If it exceeds, then terminate the search.
K
ea , j 
 c
k 1
ak
 c jk

2
Mean-distance ordered
partial search (MPS)
[Ra and Kim, 1993: IEEE Trans. Circuits and Systems]
• Calculate distance along projection axis.
• If distance is outside bounding circle defined by
the best candidate, drop the vector.

k
  xi 
 k 1
K
2

 c   K d ( xi , ca )
k 1

K
k
b
Bounds of the MPS method
Bound
Input
vector
A
Bound
B'
C'
A'
B
Best candidate
C
Pseudo code of MPS search
S earchN earestN eig h bo rU sing M P S (c a , c j, d m in )  n n a , d a ;
d m in   ;
up  TRU E;
down  TRU E;
j1  a ;
j2  a ;
This should be updated
according to what was said
during lectures!!
W H IL E (u p O R do w n ) D O
IF up T H E N
j1  j1 + 1 ;
IF j 1 > N T H E N up  F A L S E
E L S E C h eckC an d id ate(s a , s j1 , n a , d m in , n n , up );
IF do w n T H E N
j2  j2 - 1 ;
IF j 2 < 1 T H E N d o w n  F A L S E
E L S E C h eckC an d id ate(s a , s j2 , n a , d m in , n n , do w n );
E N D -W H IL E ;
R E T U R N n n , d m in ;
Activity classification
[Kaukoranta et al., 2000: IEEE Trans. Image Processing]
C o d e v e c to r s:
T r a in in g v e c to r s :
A ctiv e
M o v e d fa r th e r
S ta tic
M o v e d clo s e r
P r e v io u s
N o ch a n g e
Reduced search based on activity
classification
L ä him m än se n troid in
h ak u m ää räy ty y
se u raa va sti:
E tä isy ys lask u jen
m ää rä:
T
O
O
1 00 %
4%
4%
-
-
O
0%
0%
4%
T = tä ysi h aku
O = o sittaine n hak u
O s uu s alk io ista:
Sen troid it:
A lkiot:
A ktiiv ine n, u usi si jain ti
Se ntro idi siirtyn yt läh em mäk si alk iota
A ktiiv ine n, v anh a sija inti
Se ntro idi siirtyn yt ka uem ma ksi alkios ta
St aattinen
Etäisy yde ssä s entroid iin ei m uuto sta
3 ,6 %
3 ,7 %
0 ,1 %
0%
0%
9 2, 6%
Classification due to iterations
Activity of vectors in Random Swap
30 %
30 %
25 %
Amount of full searches
25 %
20 %
20 %
15 %
10%
15 %
5%
10%
0%
0
20
40
60
80
100
120
140
160
180
200
5%
0%
0
2000
4000
6000
Iteration
8000
10000
Effect on distance calculations
K-means
D istance
calculations
/ search
Full
PDS
M PS+PD S
255.97
255.97
26.97
D im ensions
/ distance
calculation
16.00
2.34
8.07
D im ensions /
search
4095.48
598.96
217.60
K-means with activity classification
D istance
calculations
/ search
Full
PDS
M PS+PD S
61.44
61.44
5.35
D im ensions
/ distance
calculation
16.00
2.08
6.72
D im ensions /
search
983.07
127.98
35.97
Effect on processing time
For improving K-means algorithm
F ull
PDS
M PS+PD S
B ridge
W ithout
W ith
grouping
grouping
127.6
46.1
33.4
13.0
12.4
4.8
M iss A m erica
W ithout
W ith
grouping
grouping
1344.5
336.2
311.1
75.8
97.3
21.5
3.8 %
1.6 %
Comparison of speed-up methods
300
F ull
PDS
T IE
MP S
R unning tim e
250
200
150
100
50
0
16
32
64
128
256
C odebook size
512
1024
Improvement of reduced search
R e d u ctio n o f ru n n in g tim e
100 %
80 %
60 %
40 %
F ull
PDS
T IE
MP S
20 %
0 %
16
32
64
128
256
C o d e b o o k s ize
512
1024
Neighborhood graph
Full search:
Graph structure:
Full search:
O(N) distance
calculations.
Graph structure:
O(k) distance
calculations.
Sample graph structure
K-d tree
• See the course: Design of Spatial Information
Systems
Literature
1.
2.
T. Kaukoranta, P. Fränti and O. Nevalainen, "A fast exact GLA
based on code vector activity detection", IEEE Trans. on Image
Processing, 9 (8), 1337-1342, August 2000.
C.-D. Bei and R.M. Gray, "An improvement of the minimum
distortion encoding algorithm for vector quantization", IEEE
Transactions on Communications, 33 (10), 1132-1133, October
1985.
3.
S.-W. Ra and J.-K. Kim, "A Fast Mean-Distance-Ordered Partial
Codebook Search Algorithm for Image Vector Quantization",
IEEE Transactions on Circuits and Systems, 40 (9), 576-579,
Sebtember 1993.
4.
J.Z.C. Lai, Y.-C.Liaw, J.Liu, "Fast k-nearest-neighbor search based
on projection and triangular inequality", Pattern Recognition, 40,
351-359, 2007.
5.
C. Elkan. Using the Triangle Inequality to Accelerate k-Means.
Int. Conf. on Machine Learning, (ICML'03), pp. 147-153.
Literature
6.
James McNames, "A fast nearest neighbor algorithm based on a
principal axis search tree", IEEE Trans. on Pattern Analysis and
Machine Intelligence, 23(9):964-976, September 2001.
7.
J.H. Friedman, J.L. Bentley and R.A. Finkel, "An algorithm for
finding best matches in logarithmic expected time," ACM Trans. on
Mathematical Software, 3 (3), pp. 209-226, September 1977.
8.
R. Sproull, "Refinements to nearest-neighbor searching in K-d tree,"
Algorithmica, 6, pp. 579-589, 1991.
Download