An Investigation Into New Approaches for Clustering Matrices

The Fiedler Vector and Graph Partitioning
Barbara Ball
baljmb@aol.com
Clare Rodgers
clarerodgers@hotmail.com
College of Charleston
Graduate Math Department
Research under Dr. Amy Langville
Outline
• General Field of Data Clustering
  – Motivation
  – Importance
  – Previous Work
• Laplacian Method
  – Fiedler Vector
  – Limitations
  – Handling the Limitations
Outline
• Our Contributions
  – Experiments
    • Sorting eigenvectors
    • Testing non-symmetric matrices
  – Hypotheses
  – Implications
• Future Work
  – Non-square matrices
  – Proofs
• References
Understanding Graph Theory
[Figure: a 10-node graph, nodes 1-10, drawn with no visual grouping]
Given this graph, there are no apparent clusters.
Understanding Graph Theory
[Figure: the same 10-node graph redrawn so that two groups of nodes are visible]
Although the clusters are now apparent, we need a better method.
Finding the Laplacian Matrix
• A = adjacency matrix
• D = degree matrix
• Find the Laplacian matrix, L = D - A (a sketch of the construction follows)
• Each row of L sums to zero
[Figure: the 10-node example graph with its 10x10 Laplacian matrix L]
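As a concrete illustration, the construction above is a couple of lines of MATLAB. This is our own sketch, not from the original slides; the small matrix A is a hypothetical graph.

A = [0 1 1 0; 1 0 1 0; 1 1 0 1; 0 0 1 0];  % hypothetical undirected graph
D = diag(sum(A, 2));   % degree matrix: row sums of A on the diagonal
L = D - A;             % Laplacian matrix; each row of L sums to zero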
Behind the Scenes of the Laplacian Matrix
• Rayleigh Quotient Theorem:
  – seeks to minimize the off-diagonal elements of the matrix
  – or, equivalently, to minimize the cutset of the edges between the clusters
[Figure: two scatter plots of data points; in the first the points are not easily clustered, in the second the clusters are apparent]
Behind the Scenes of the Laplacian Matrix
• Rayleigh Quotient Theorem solution:
  – λ1 = 0 is the smallest eigenvalue of the symmetric matrix L
  – λ1 corresponds to the trivial eigenvector v1 = e = [1, 1, …, 1]
• Courant-Fischer Theorem:
  – also based on the symmetric matrix L; searches for the eigenvector, v2, that is furthest away from e
Using the Laplacian Matrix
• v2 gives relational information about the nodes.
• This relation is usually decided by separating the values across zero.
• A theoretical justification is given by Miroslav Fiedler; hence, v2 is called the Fiedler vector. (A sketch of computing v2 follows.)
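A minimal MATLAB sketch of the computation, under our own variable names (L as built above):

[V, E] = eig(L);             % L is symmetric, so the eigenvalues are real
[~, idx] = sort(diag(E));    % order the eigenvalues ascending
v2 = V(:, idx(2));           % eigenvector of the 2nd smallest eigenvalue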
Using the Fiedler Vector
• v2 is used to recursively partition the graph by separating the components into negative and positive values (one step is sketched below).
• Entire graph: sign(v2) = [-, -, -, +, +, +, -, -, -, +]
• Reds: sign(v2) = [-, +, +, +, -, -]
[Figure: the 10-node graph split by sign into two groups, then one group split again]
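One recursion step, sketched in MATLAB with the partition-at-zero rule from the slide (our illustration):

neg = find(v2 < 0);    % one cluster: nodes with negative entries
pos = find(v2 >= 0);   % the other: nodes with nonnegative entries
% to go deeper, repeat the whole procedure on A(neg, neg) and A(pos, pos)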
Problems With Laplacian Method
• The Laplacian method requires:
  – an undirected graph
  – a structurally symmetric matrix
  – a square matrix
• Zero may not always be the best choice for partitioning the eigenvector values of v2 (Gleich).
• Recursive algorithms are expensive.
Current Clustering Method
• Monika Henzinger, Director of Google Research in 2003, cited generalizing to directed graphs as one of the top six algorithmic challenges in web search engines.
How Are These Problems Currently Being Solved?
• Forcing symmetry for non-square matrices:
  – Suppose A is an (ad x term) non-square matrix.
  – B imposes symmetry on the information (see the sketch below):
    B = [0 A; A^T 0]
• Example: [a small 0-1 rectangular matrix A and its symmetric embedding B]
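In MATLAB the embedding is one line. This is our sketch; the rectangular A stands for any ad-by-term matrix:

[m, n] = size(A);                        % A is m-by-n, e.g. ads by terms
B = [zeros(m, m), A; A', zeros(n, n)];   % B is (m+n)-by-(m+n) and symmetric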
How Are These Problems Currently Being Solved?
• Forcing symmetry in square matrices:
  – Suppose C represents a directed graph.
  – D imposes bidirectional information by finding the nearest symmetric matrix:
    D = C + C^T
• Example (verified in the sketch below):
  C = [0 0 1; 1 0 0; 1 1 0],  C^T = [0 1 1; 0 0 1; 1 0 0],  D = [0 1 2; 1 0 1; 2 1 0]
How Are These Problems Currently Being Solved?
• Graphically adding data:
[Figure: a 3-node directed graph before and after symmetrization, with an edge added]
How Are These Problems Currently Being Solved?
• Graphically deleting data:
[Figure: a 3-node directed graph before and after symmetrization, with an edge deleted]
Our Wish:
• Use Markov chains and the subdominant right-hand eigenvector (the Ball-Rodgers vector) to cluster asymmetric matrices or directed graphs.
Where Did We Get the Idea?
• Stewart, in Introduction to the Numerical Solution of Markov Chains, suggests that the subdominant right-hand eigenvector (the Ball-Rodgers vector) may indicate clustering.
The Markov Method
• Start from the 0-1 adjacency matrix A of the graph.
• Divide each row of A by its row sum to obtain the stochastic transition matrix P, whose rows sum to 1 (see the sketch below).
[Matrices: the 10x10 adjacency matrix A and its row-normalized Markov matrix P, with entries such as .5, .25, and .3333]
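A sketch of the row normalization in MATLAB, assuming (as in the example) that every row of A has at least one nonzero:

P = diag(1 ./ sum(A, 2)) * A;   % divide each row of A by its row sum
% each row of P now sums to 1, giving entries such as .5, .25, .3333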
Different Matrices and Eigenvectors

Matrix                                            | Respective eigenvector
A: connectivity matrix                            | 2nd largest of A
L = D - A: Laplacian matrix (rows sum to 0)       | 2nd smallest of L (Fiedler vector)
P: probability (Markov) matrix (rows sum to 1)    | 2nd largest of P (Ball-Rodgers vector)
Q = I - P: transition rate matrix (rows sum to 0) | 2nd smallest of Q

(A sketch of computing the P- and Q-based eigenvectors follows.)
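For a directed graph the eigenvalues of P can be complex, so a practical sketch takes real parts; that simplification is ours, not part of the original method:

[V, E] = eig(P);
[~, idx] = sort(real(diag(E)), 'descend');   % the largest eigenvalue of P is 1
br = real(V(:, idx(2)));   % Ball-Rodgers vector: 2nd largest of P
% Q = I - P has the same eigenvectors as P (eigenvalues become 1 - lambda),
% so the 2nd smallest of Q selects the same direction as the 2nd largest of P.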
Graph 1: Eigenvector Value Plots
[Figure: the 10-node graph and four sorted eigenvector value plots: Second Largest of A, Fiedler Vector, Ball-Rodgers Vector, and Second Smallest of Q; the nodes are listed along each axis in sorted order]
Graph 1: Banding Using Eigenvector Values
• Reordering uses just the indices of the sorted eigenvector; no recursion is needed (see the sketch below).
[Figure: spy plots (nz = 30) of Banded A (Second Largest of A), Banded L (Fiedler Vector), Banded P (Second Largest of P), and Banded Q (Second Smallest of Q)]
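The reordering itself is two MATLAB lines (our sketch), which is why no recursive partitioning is needed to see the bands:

[~, order] = sort(v2);     % indices of the sorted eigenvector values
spy(A(order, order));      % permuted matrix; the slides report nz = 30 for Graph 1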
Graph 1: Reordering Using Laplacian Method
[Figure: the 10-node graph with the reordered A and the reordered L (spy plots, nz = 30); the clusters {4, 5, 6, 10} and {1, 2, 3, 7, 8, 9} appear as diagonal blocks]
Graph 1: Reordering Using Markov Method
[Figure: the 10-node graph with the reordered A, the reordered P, and the reordered Q (spy plots); the same two clusters appear as diagonal blocks]
Graph 2: Eigenvector Value Plots
[Figure: a 23-node graph and four sorted eigenvector value plots: Second Largest of A, Fiedler Vector, Ball-Rodgers Vector, and Second Smallest of Q]
Graph 2: Banding Using Eigenvector Values
• Nicely banded, but no apparent blocks.
[Figure: spy plots (nz = 71) of Banded A (Second Largest of A), Banded L (Fiedler Vector), Banded P (Second Largest of P), and Banded Q (Second Smallest of Q) for the 23-node graph]
Graph 2: Reordering Using Laplacian Method
[Figure: the 23-node graph with the reordered A and the reordered L (spy plots, nz = 71)]
Graph 2: Reordering Using Markov Method
[Figure: the 23-node graph with the reordered A, the reordered P (nz = 71), and the reordered Q (nz = 93), shown as spy plots]
Directed Graph 1
[Figure: a 10-node directed graph]
• Although it is directed, the Fiedler vector still works.
• v2 = [-0.5783, -0.2312, -0.0388, 0.1140, 0.1255, 0.1099, -0.1513, -0.5783, -0.4536, 0.0821]
Directed Graph 1: Reordering Using Laplacian Method
[Figure: the directed graph with the reordered A and the reordered L (spy plots)]
Directed Graph 1: Reordering Using Markov Method
[Figure: the directed graph with the reordered A and the reordered P (spy plots)]
Directed Graph 1-B
[Figure: Directed Graph 1 with an edge that was bi-directional made one-way]
• The Laplacian Method no longer works on this graph.
• Certain edges must be bi-directional in order to make the matrix irreducible.
• Currently, to deal with this problem, a small number (e.g., .01) is added to each element of the matrix (a sketch follows).
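A sketch of that fix in MATLAB; the value .01 is the one quoted on the slide, and renormalizing the rows afterward is our assumption about the intended procedure:

Apert = A + .01;                        % add a small number to every element
P = diag(1 ./ sum(Apert, 2)) * Apert;   % renormalize rows; P is now irreducible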
Directed Graph 1-B: Reordering Using Markov Method
[Figure: the reordered A and the reordered P for Directed Graph 1-B (spy plots)]
Directed Graph 2: Reordering Using Markov Method
[Figure: spy plots of A and the reordered P for a 47-node directed graph; the nodes are relabeled along the axes in sorted-eigenvector order]
Directed Graph 3: Reordering Using Markov Method
[Figure: spy plots of A and the reordered P for a directed graph with several hundred nodes]
Directed Graph 4: Reordering Using Markov Method
• 10% antiblock elements
[Figure: spy plots of A and the reordered P for a 140-node directed graph]
Directed Graph 5: Reordering Using Markov Method
• 30% antiblock elements
• Only the first partition is shown.
[Figure: spy plots of A and the reordered P for a 140-node directed graph]
Hypotheses and Implications
• Hypothesis: Plotting the eigenvector values gives better estimates of the number of clusters.
  Implication: A number other than zero may be used to partition the eigenvector values (see the sketch below).
• Hypothesis: Sometimes, sorting the eigenvector values clusters the matrix without any type of recursive process.
  Implication: Recursive methods are time-consuming; the eigenvector plot takes virtually no time at all and requires very little programming or storage!
• Hypothesis: The stochastic matrix P can be used to cluster asymmetric matrices or directed graphs.
  Implication: Non-symmetric matrices (or directed graphs) can be clustered without altering data!
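For instance, the split point could be read off the sorted plot as its largest gap rather than fixed at zero; a hypothetical MATLAB sketch:

s = sort(v2);                    % sorted eigenvector values, as in the plots
[~, k] = max(diff(s));           % position of the largest jump
cutoff = (s(k) + s(k+1)) / 2;    % partition the values at the gap, not at zero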
Future Work
• Experiments on large non-symmetric matrices
• Non-square matrices
• Clustering eigenvector values to avoid recursive programming
• Proofs
Questions
References
• Friedberg, S., Insel, A., and Spence, L. Linear Algebra, Fourth Edition. Prentice-Hall, Upper Saddle River, New Jersey, 2003.
• Gleich, David. Spectral Graph Partitioning and the Laplacian with Matlab. January 16, 2006. http://www.stanford.edu/~dgleich/demos/matlab/spectral/spectral.html
• Godsil, Chris and Royle, Gordon. Algebraic Graph Theory. Springer-Verlag New York, Inc., New York, 2001.
• Karypis, George. http://glaros.dtc.umn.edu/gkhome/node
• Langville, Amy. The Linear Algebra Behind Search Engines. The Mathematical Association of America – Online, http://www.joma.org. December 2005.
• Aldenderfer, Mark S. and Blashfield, Roger K. Cluster Analysis. Sage University Paper Series: Quantitative Applications in the Social Sciences, 1984.
• Moler, Cleve B. Numerical Computing with MATLAB. The Society for Industrial and Applied Mathematics, Philadelphia, 2004.
• Roiger, Richard J. and Geatz, Michael W. Data Mining: A Tutorial-Based Primer. Addison-Wesley, 2003.
• Vascellaro, Jessica E. "The Next Big Thing in Searching." Wall Street Journal, January 24, 2006.
References
• Zhukov, Leonid. Technical Report: Spectral Clustering of Large Advertiser Datasets Part I. April 10, 2003.
• Learning MATLAB 7. The MathWorks, 2005. www.mathworks.com
• www.Mathworld.com
• www.en.wikipedia.org/
• http://www.resample.com/xlminer/help/HClst/HClst_intro.htm
• http://comp9.psych.cornell.edu/Darlington/factor.htm
• www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Markov.html
• http://leto.cs.uiuc.edu/~spiros/publications/ACMSRC.pdf
• http://www.lifl.fr/~iri-bn/talks/SIG/higham.pdf
• http://www.epcc.ed.ac.uk/computing/training/document_archive/meshdecompslides/MeshDecomp-70.html
• http://www.cs.berkeley.edu/~demmel/cs267/lecture20.html
• http://www.maths.strath.ac.uk/~aas96106/rep02_2004.pdf
Eigenvector Example
[Figure: eigenvector example]
back
Structurally Symmetric
A = [0 .5 .5; 1 0 0; 1 0 0]
has the nonzero pattern
A = [0 * *; * 0 0; * 0 0]
which is symmetric, even though A itself is not (a pattern check is sketched below).
back
Theory Behind the Laplacian
• Minimize the edges between the clusters.
Theory Behind the Laplacian
• Minimizing edges between clusters is the same as minimizing the off-diagonal elements in the Laplacian matrix.
• min p^T L p, where p_i ∈ {-1, 1} for each node i.
• p represents the separation of the nodes into positives and negatives.
• p^T L p = p^T (D - A) p = p^T D p - p^T A p
• However, p^T D p is the sum across the diagonal, so it is a constant.
• Constants do not change the outcome of optimization problems.
Theory Behind the Laplacian
• min p^T L p
• This is an integer nonlinear program.
• It can be changed to a continuous program by using Lagrange relaxation and allowing p to take any value from -1 to 1. We rename this vector x, and let its magnitude be N, so x^T x = N.
• min x^T L x - λ(x^T x - N)
• This can be rewritten as the Rayleigh quotient: min x^T L x / x^T x = λ1
Theory Behind the Laplacian
• λ1 = 0 and corresponds to the trivial eigenvector v1 = e.
• The Courant-Fischer Theorem seeks to find the next best solution by adding an extra constraint: x ⊥ e.
• This is found to be the subdominant eigenvector v2, known as the Fiedler vector (a numerical check follows).
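A quick numerical check of the Courant-Fischer characterization (our illustration, with L and v2 as before):

x = randn(size(L, 1), 1);
x = x - mean(x);                  % enforce the constraint x ⊥ e
rq = (x' * L * x) / (x' * x);     % Rayleigh quotient of L
% rq is always >= lambda2, with equality when x is the Fiedler vector v2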
Theory Behind the Laplacian
• Our Questions:
  – The symmetry requirement is needed for the matrix diagonalization of D. Why is D important, since it is irrelevant for a minimization problem?
  – If diagonalization is important, could the SVD be used instead?
future