faloutsos_tools

CMU SCS
Graph Analytics Workshop:
Tools
Christos Faloutsos
CMU
Graph Analytics wkshp
C. Faloutsos (CMU)
1
CMU SCS
Welcome!
(columns: Tue | Wed | Thu)
9:00-10:30: Tools | Laplacians | Parallelism
11:00-12:30: NELL | Rich graphs | Communities
1:30-3:00: Exercises | Panel | Scalability
3:30-5:00: Graph. models Posters | Graph ‘Laws’ | Reception
CMU SCS
Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions
CMU SCS
Graphs - why should we care?
Food Web
[Martinez ’91]
>$10B revenue
>0.5B users
Internet Map
[lumeta.com]
Graph Analytics wkshp
C. Faloutsos (CMU)
4
CMU SCS
Graphs - why should we care?
• IR: bi-partite graphs (doc-terms): documents D1 ... DN on one side, terms T1 ... TM on the other
• ‘NELL’: ‘merkel’ ‘chancellor’
‘germany’ - <S><V><O> facts -> tensors
• web: hyper-text graph
• ... and more:
Graph Analytics wkshp
C. Faloutsos (CMU)
5
CMU SCS
Graphs - why should we care?
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly
detection
• ....
• Any M:N relationship -> Graph
• Any subject-verb-object construct: -> Graph/Tensor
Graph Analytics wkshp
C. Faloutsos (CMU)
6
CMU SCS
Graphs and matrices
• Closely related
• Powerful tools from matrix algebra, for
graph mining
Graph Analytics wkshp
C. Faloutsos (CMU)
7
CMU SCS
Examples of Matrices:
Graph - social network
         John  Peter  Mary  Nick  ...
John     0     11     22    55    ...
Peter    5     0      6     7     ...
Mary     ...   ...    ...   ...   ...
Nick     ...   ...    ...   ...   ...
...      ...   ...    ...   ...   ...
CMU SCS
Examples of Matrices:
Market basket
• market basket as in Association Rules
[Table: rows = customers (John, Peter, Mary, Nick, ...); columns = products (milk, bread, choc., wine, ...)]
CMU SCS
Examples of Matrices:
Documents and terms
          data  mining  classif.  tree  ...
Paper#1   13    11      22        55    ...
Paper#2   5     4       6         7     ...
Paper#3   ...   ...     ...       ...   ...
Paper#4   ...   ...     ...       ...   ...
...       ...   ...     ...       ...   ...
CMU SCS
Examples of Matrices:
Authors and terms
          data  mining  classif.  tree  ...
John      13    11      22        55    ...
Peter     5     4       6         7     ...
Mary      ...   ...     ...       ...   ...
Nick      ...   ...     ...       ...   ...
...       ...   ...     ...       ...   ...
CMU SCS
Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions
CMU SCS
Node importance - Motivation:
• Given a graph (eg., web pages containing
the desirable query word)
• Q: Which node is the most important?
• A1: HITS (SVD = Singular Value Decomposition)
• A2: eigenvector (PageRank)
‘I am important, if my friends are important’ -> Fixed point / eigenvector
CMU SCS
Node importance - motivation
• SVD and eigenvector analysis: very closely
related
Graph Analytics wkshp
C. Faloutsos (CMU)
15
CMU SCS
Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions
CMU SCS
Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
– HITS
– PageRank
CMU SCS
SVD - Motivation
• problem #1: text - LSI: find ‘concepts’
• problem #2: compression / dim. reduction
Graph Analytics wkshp
C. Faloutsos (CMU)
18
CMU SCS
SVD - Motivation
• problem #1: text - LSI: find ‘concepts’
Graph Analytics wkshp
C. Faloutsos (CMU)
19
CMU SCS
SVD - Motivation
• Customer-product, for recommendation system:
                 1 1 1 0 0
  vegetarians    2 2 2 0 0
                 1 1 1 0 0
                 5 5 5 0 0
                 0 0 0 2 2
  meat eaters    0 0 0 3 3
                 0 0 0 1 1
CMU SCS
SVD - Motivation
• problem #2: compress / reduce
dimensionality
Graph Analytics wkshp
C. Faloutsos (CMU)
21
CMU SCS
Problem - specs
• Visualize customers
Graph Analytics wkshp
C. Faloutsos (CMU)
22
CMU SCS
Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
– HITS
– PageRank
CMU SCS
SVD - Definition
• $A = U \Lambda V^T$ - example:
Graph Analytics wkshp
C. Faloutsos (CMU)
26
CMU SCS
SVD - Definition
$A_{[n \times m]} = U_{[n \times r]} \, \Lambda_{[r \times r]} \, (V_{[m \times r]})^T$
• A: n x m matrix (eg., n documents, m terms)
• U: n x r matrix (n documents, r concepts)
• $\Lambda$: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
• V: m x r matrix (m terms, r concepts)
Graph Analytics wkshp
C. Faloutsos (CMU)
27
CMU SCS
SVD - Properties
THEOREM [Press+92]: always possible to decompose matrix A into $A = U \Lambda V^T$, where
• U, $\Lambda$, V: unique (*)
• U, V: column orthonormal (ie., columns are unit vectors, orthogonal to each other)
– $U^T U = I$; $V^T V = I$ (I: identity matrix)
• $\Lambda$: singular values are positive, and sorted in decreasing order
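A minimal NumPy sketch of the theorem, using the 7 x 5 document-term matrix that serves as the running example in this deck. `np.linalg.svd` returns the singular values as a vector `s` (the diagonal of Λ) and the transposed factor `Vt`; the rank threshold `1e-10` is an assumption made for illustration, not part of the slides.

```python
import numpy as np

# Running example: 7 documents (4 CS, 3 MD) x 5 terms.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

# Thin SVD: U is 7 x 5, s has 5 entries, Vt is 5 x 5.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the r non-zero singular values (r = rank of A).
r = int(np.sum(s > 1e-10))  # threshold is an illustrative assumption
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

print("rank r =", r)
print("singular values:", np.round(s, 2))
print("reconstruction ok:", np.allclose(U @ np.diag(s) @ Vt, A))
print("U columns orthonormal:", np.allclose(U.T @ U, np.eye(r)))
print("V columns orthonormal:", np.allclose(Vt @ Vt.T, np.eye(r)))
```

The signs of the columns of `U` and rows of `Vt` may be flipped relative to the slides; the product is unaffected.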
Graph Analytics wkshp
C. Faloutsos (CMU)
28
CMU SCS
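SVD - Example
• $A = U \Lambda V^T$ - example (rows: 4 CS documents, then 3 MD documents; columns: data, inf., retrieval, brain, lung):
$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
0.18 & 0 \\
0.36 & 0 \\
0.18 & 0 \\
0.90 & 0 \\
0 & 0.53 \\
0 & 0.80 \\
0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix}
9.64 & 0 \\
0 & 5.29
\end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
$$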
CMU SCS
SVD - Example
• $A = U \Lambda V^T$ - example: the decomposition of the 7 x 5 document-term matrix has two concepts, the CS-concept and the MD-concept
CMU SCS
SVD - Example
• $A = U \Lambda V^T$ - example: $U$ is the doc-to-concept similarity matrix (columns: CS-concept, MD-concept)
CMU SCS
SVD - Example
• $A = U \Lambda V^T$ - example: the first diagonal entry of $\Lambda$, $\lambda_1 = 9.64$, is the ‘strength’ of the CS-concept
CMU SCS
SVD - Example
• $A = U \Lambda V^T$ - example: $V$ is the term-to-concept similarity matrix (first row of $V^T$: CS-concept)
CMU SCS
Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
CMU SCS
SVD - Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
• U: document-to-concept similarity matrix
• V: term-to-concept sim. matrix
• $\Lambda$: its diagonal elements: ‘strength’ of each concept
Graph Analytics wkshp
C. Faloutsos (CMU)
36
CMU SCS
SVD – Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
Q: if A is the document-to-term matrix, what is $A^T A$?
A: term-to-term ([m x m]) similarity matrix
Q: $A A^T$?
A: document-to-document ([n x n]) similarity matrix
ICDE’09. Copyright: Faloutsos, Tong (2009)
CMU SCS
SVD properties
• V are the eigenvectors of the covariance matrix $A^T A$ (a covariance matrix up to scaling, when the columns of A have zero mean)
• U are the eigenvectors of the Gram (inner-product) matrix $A A^T$
Further reading:
1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.
2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.
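The two properties can be checked numerically. The sketch below, which assumes NumPy and reuses the 7 x 5 document-term example of this deck, forms the term-to-term matrix $A^T A$ and the document-to-document matrix $A A^T$ and compares their eigen-structure with the SVD of A.

```python
import numpy as np

# Document-term example matrix (7 documents x 5 terms).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

term_sim = A.T @ A   # [m x m] term-to-term similarity
doc_sim = A @ A.T    # [n x n] document-to-document similarity

# eigh returns eigenvalues in ascending order; reverse to match the SVD order.
evals = np.linalg.eigh(term_sim)[0][::-1]

print("eigenvalues of A^T A:", np.round(evals[:2], 2))
print("squared singular values:", np.round(s[:2] ** 2, 2))
# v1 is an eigenvector of A^T A, u1 an eigenvector of A A^T, both with eigenvalue s1^2.
print(np.allclose(term_sim @ Vt[0], s[0] ** 2 * Vt[0]))
print(np.allclose(doc_sim @ U[:, 0], s[0] ** 2 * U[:, 0]))
```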
CMU SCS
SVD - Interpretation #2
• best axis to project on: (‘best’ = min sum of
squares of projection errors)
Graph Analytics wkshp
C. Faloutsos (CMU)
40
CMU SCS
SVD - interpretation #2
SVD: gives the best axis to project on: the first singular vector v1
• minimum RMS error
CMU SCS
SVD - Interpretation #2
• $A = U \Lambda V^T$ - example: the first row of $V^T$, $v_1 = (0.58, 0.58, 0.58, 0, 0)$, is the first singular vector
CMU SCS
SVD - Interpretation #2
• $A = U \Lambda V^T$ - example: $\lambda_1 = 9.64$ reflects the variance (‘spread’) on the $v_1$ axis
CMU SCS
SVD - Interpretation #2
• $A = U \Lambda V^T$ - example:
– $U \Lambda$ gives the coordinates of the points on the projection axes
CMU SCS
SVD - Interpretation #2
• More details
• Q: how exactly is dim. reduction done?
• A: set the smallest singular values to zero (in the example, where $\Lambda = \mathrm{diag}(9.64, 5.29)$, set 5.29 to zero)
CMU SCS
SVD - Interpretation #2
With $\lambda_2 = 5.29$ set to zero, the second column of $U$ and the second row of $V^T$ drop out:
$$
A \approx
\begin{bmatrix} 0.18 \\ 0.36 \\ 0.18 \\ 0.90 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\times
\begin{bmatrix} 9.64 \end{bmatrix}
\times
\begin{bmatrix} 0.58 & 0.58 & 0.58 & 0 & 0 \end{bmatrix}
\approx
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
CMU SCS
SVD - Interpretation #2
Exactly equivalent: ‘spectral decomposition’ of the [n x m] matrix:
$$
A = \begin{bmatrix} u_1 & u_2 & \cdots \end{bmatrix}
\times \mathrm{diag}(\lambda_1, \lambda_2, \ldots) \times
\begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \end{bmatrix}
= \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \cdots
$$
(r terms; each $u_i$ is [n x 1] and each $v_i^T$ is [1 x m])
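The spectral form can be sketched in a few lines, assuming NumPy and the 7 x 5 example matrix of this deck: each term is a rank-one outer product, the sum of all r terms reproduces A, and keeping only the first term zeroes out the weaker (MD) block.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2  # rank of this particular example

# One rank-one [n x m] matrix per concept: lambda_i * u_i * v_i^T.
terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(r)]

A_full = sum(terms)   # all r terms: exact reconstruction
A_rank1 = terms[0]    # first term only: rank-1 approximation

print(np.round(A_rank1, 2))
```

Sign flips in `U` and `Vt` cancel inside each outer product, so the individual terms are well defined.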
CMU SCS
SVD - Interpretation #2
approximation / dim. reduction: by keeping the first few terms (Q: how many?)
$$A_{[n \times m]} = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \cdots$$
assume: $\lambda_1 \ge \lambda_2 \ge \ldots$
CMU SCS
SVD - Interpretation #2
A (heuristic - [Fukunaga]): keep 80-90% of ‘energy’ (= sum of squares of $\lambda_i$’s)
$$A_{[n \times m]} = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \cdots$$
assume: $\lambda_1 \ge \lambda_2 \ge \ldots$
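A small sketch of the energy heuristic, assuming NumPy. The helper name `choose_k` and its `energy_fraction` parameter are hypothetical, introduced only for illustration; the singular values are those of the example in this deck.

```python
import numpy as np

def choose_k(singular_values, energy_fraction=0.9):
    """Smallest k whose top-k singular values hold the requested share of energy.

    Energy is the sum of squared singular values; the input is assumed to be
    sorted in decreasing order, as the SVD returns it.
    """
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    k = int(np.searchsorted(cumulative, energy_fraction) + 1)
    return min(k, len(energy))

s = np.array([9.64, 5.29])             # singular values from the example
share = s[0] ** 2 / np.sum(s ** 2)     # energy held by the first concept
print(round(share, 3))
print(choose_k(s, 0.9), choose_k(s, 0.75))
```

In the example the first concept alone holds roughly 77% of the energy, so both concepts are needed to reach the 80-90% range.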
CMU SCS
Pictorially: matrix form of SVD
$A_{[n \times m]} \approx U \, \Lambda \, V^T$
– Best rank-k approximation in L2
CMU SCS
Pictorially: Spectral form of SVD
$A_{[n \times m]} \approx \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \cdots$
– Best rank-k approximation in L2
CMU SCS
Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
– #1: documents/terms/concepts
– #2: dim. reduction
– #3: picking non-zero, rectangular ‘blobs’
• Complexity
• Case studies
• Additional properties
Graph Analytics wkshp
C. Faloutsos (CMU)
61
CMU SCS
SVD - Interpretation #3
• finds non-zero ‘blobs’ in a data matrix: in the example $A = U \Lambda V^T$, the two non-zero blocks of A (rows 1-4 x columns 1-3, and rows 5-7 x columns 4-5), one per concept
CMU SCS
SVD - Interpretation #3
• finds non-zero ‘blobs’ in a data matrix =
• ‘communities’ (bi-partite cores, here)
[Figure: bi-partite cores of the example matrix: Rows 1-4 with Cols 1-3, and Rows 5-7 with Col 4 onward]
CMU SCS
Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
– HITS
– PageRank
CMU SCS
SVD - Complexity
• O( n * m * m) or O( n * n * m) (whichever is less)
• less work, if we just want singular values
– or if we want first k singular vectors
– or if the matrix is sparse [Berry]
• Implemented: in any linear algebra package (LINPACK, matlab, Splus, mathematica ...)
Graph Analytics wkshp
C. Faloutsos (CMU)
66
CMU SCS
SVD - conclusions so far
• SVD: $A = U \Lambda V^T$: unique (*)
– U: document-to-concept similarities
– V: term-to-concept similarities
– $\Lambda$: strength of each concept
• dim. reduction: keep the first few strongest singular values (80-90% of ‘energy’)
– SVD: picks up linear correlations
• SVD: picks up non-zero ‘blobs’
Graph Analytics wkshp
C. Faloutsos (CMU)
67
CMU SCS
Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
– HITS
– PageRank
CMU SCS
Kleinberg’s algo (HITS)
Kleinberg, Jon (1998).
Authoritative sources in a
hyperlinked environment.
Proc. 9th ACM-SIAM
Symposium on Discrete
Algorithms.
Graph Analytics wkshp
C. Faloutsos (CMU)
69
CMU SCS
Recall: problem dfn
• Given a graph (eg., web pages containing
the desirable query word)
• Q: Which node is the most important?
Graph Analytics wkshp
C. Faloutsos (CMU)
70
CMU SCS
Kleinberg’s algorithm
• Problem dfn: given the web and a query
• find the most ‘authoritative’ web pages for
this query
Step 0: find all pages containing the query
terms
Step 1: expand by one move forward and
backward
Graph Analytics wkshp
C. Faloutsos (CMU)
71
CMU SCS
Kleinberg’s algorithm
• Step 1: expand by one move forward and
backward
Graph Analytics wkshp
C. Faloutsos (CMU)
72
CMU SCS
Kleinberg’s algorithm
• on the resulting graph, give high score (= ‘authorities’) to nodes that many important nodes point to
• give high importance score (‘hubs’) to nodes that point to good ‘authorities’
[Figure: hubs pointing to authorities]
CMU SCS
Kleinberg’s algorithm
observations
• recursive definition!
• each node (say, ‘i’-th node) has both an
authoritativeness score ai and a hubness
score hi
Graph Analytics wkshp
C. Faloutsos (CMU)
74
CMU SCS
Kleinberg’s algorithm
Let E be the set of edges and A be the adjacency matrix: entry (i,j) is 1 if the edge from i to j exists
Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores.
Then:
Graph Analytics wkshp
C. Faloutsos (CMU)
75
CMU SCS
Kleinberg’s algorithm
Then (nodes k, l, m point to node i):
$a_i = h_k + h_l + h_m$
that is
$a_i = \sum_{j : (j,i) \in E} h_j$
or
$a = A^T h$
CMU SCS
Kleinberg’s algorithm
symmetrically, for the ‘hubness’ (node i points to nodes n, p, q):
$h_i = a_n + a_p + a_q$
that is
$h_i = \sum_{j : (i,j) \in E} a_j$
or
$h = A a$
CMU SCS
Kleinberg’s algorithm
In conclusion, we want vectors h and a such that:
$h = A a$
$a = A^T h$
SVD properties:
$A_{[n \times m]} \, v_{1\,[m \times 1]} = \lambda_1 \, u_{1\,[n \times 1]}$
$u_1^T A = \lambda_1 v_1^T$
CMU SCS
Kleinberg’s algorithm
In short, the solutions to
$h = A a$
$a = A^T h$
are the left- and right- singular-vectors of the
adjacency matrix A.
Starting from random a’ and iterating, we’ll
eventually converge
(Q: to which of all the singular-vectors? why?)
Graph Analytics wkshp
C. Faloutsos (CMU)
79
CMU SCS
Kleinberg’s algorithm
(Q: to which of all the singular-vectors?
why?)
A: to the ones of the strongest singular-value:
$(A^T A)^k \, v' \approx (\text{constant}) \cdot v_1$
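The iteration can be sketched as follows, assuming NumPy. The function name `hits`, the all-ones starting vector (the slides start from a random vector), the per-step normalization, and the 4-node graph are all illustrative assumptions; only the update rules $a = A^T h$ and $h = A a$ come from the slides.

```python
import numpy as np

def hits(A, iters=100):
    """Alternate a = A^T h and h = A a, normalizing after each step."""
    n = A.shape[0]
    h = np.ones(n)
    a = np.ones(n)
    for _ in range(iters):
        a = A.T @ h
        a /= np.linalg.norm(a)
        h = A @ a
        h /= np.linalg.norm(h)
    return h, a

# Hypothetical directed graph: A[i, j] = 1 if the edge i -> j exists.
A = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
], dtype=float)

h, a = hits(A)

# Compare with the strongest singular vectors (signs are arbitrary in the SVD).
U, s, Vt = np.linalg.svd(A)
print(np.round(h, 3), np.round(np.abs(U[:, 0]), 3))
print(np.round(a, 3), np.round(np.abs(Vt[0]), 3))
```

Normalizing at every step only rescales the vectors, so the direction still converges to the singular vectors of the strongest singular value.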
Graph Analytics wkshp
C. Faloutsos (CMU)
80
CMU SCS
Kleinberg’s algorithm - results
Eg., for the query ‘java’:
0.328 www.gamelan.com
0.251 java.sun.com
0.190 www.digitalfocus.com (“the java
developer”)
Graph Analytics wkshp
C. Faloutsos (CMU)
81
CMU SCS
Kleinberg’s algorithm - discussion
• ‘authority’ score can be used to find ‘similar
pages’ (how?)
Graph Analytics wkshp
C. Faloutsos (CMU)
82
CMU SCS
Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
– HITS
– PageRank
CMU SCS
PageRank (google)
•Brin, Sergey and Lawrence
Page (1998). Anatomy of a
Large-Scale Hypertextual
Web Search Engine. 7th Intl
World Wide Web Conf.
[Photos: Larry Page, Sergey Brin]
Graph Analytics wkshp
C. Faloutsos (CMU)
84
CMU SCS
Problem: PageRank
Given a directed graph, find its most
interesting/central node
A node is important,
if it is connected
with important nodes
(recursive, but OK!)
Graph Analytics wkshp
C. Faloutsos (CMU)
85
CMU SCS
Problem: PageRank - solution
Given a directed graph, find its most
interesting/central node
Proposed solution: Random walk; spot most
‘popular’ node (-> steady state prob. (ssp))
A node has high ssp,
if it is connected
with high ssp nodes
(recursive, but OK!)
Graph Analytics wkshp
C. Faloutsos (CMU)
86
CMU SCS
(Simplified) PageRank algorithm
• Let A be the adjacency matrix;
• let B be the transition matrix: transpose, column-normalized - then
[Figure: example graph with nodes 1-5 and the matrix equation $B \, (p_1, p_2, p_3, p_4, p_5)^T = (p_1, p_2, p_3, p_4, p_5)^T$; B is indexed To (rows) by From (columns), with non-zero entries 1 and 1/2]
CMU SCS
(Simplified) PageRank algorithm
• $B p = p$
[Figure: example graph with nodes 1-5 and the matrix equation $B \, (p_1, p_2, p_3, p_4, p_5)^T = (p_1, p_2, p_3, p_4, p_5)^T$]
CMU SCS
Definitions
A: Adjacency matrix (from-to)
D: Degree matrix = diag(d1, d2, …, dn)
B: Transition matrix: to-from, column normalized
$B = A^T D^{-1}$
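A minimal sketch of these definitions, assuming NumPy. The 4-node graph below is hypothetical (it is not the 5-node graph drawn on the slides) and was chosen to be strongly connected and aperiodic, so that plain repeated multiplication by B converges.

```python
import numpy as np

# Hypothetical graph, from-to: edges 0->1, 0->2, 1->2, 2->0, 2->3, 3->0.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=float)

out_degree = A.sum(axis=1)              # d_i = number of out-links of node i
B = A.T @ np.diag(1.0 / out_degree)     # B = A^T D^{-1}, to-from

# Each column of B sums to 1 (column-normalized).
print(B.sum(axis=0))

# Simplified PageRank: iterate p <- B p until it settles (steady state).
p = np.full(4, 0.25)
for _ in range(200):
    p = B @ p
print(np.round(p, 4))
```

This simplified version assumes every node has at least one out-link; otherwise `out_degree` contains zeros and the division fails.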
Graph Analytics wkshp
C. Faloutsos (CMU)
89
CMU SCS
(Simplified) PageRank algorithm
• Bp=1*p
• thus, p is the eigenvector that corresponds
to the highest eigenvalue (=1, since the matrix is
column-normalized)
• Why does such a p exist?
– p exists if B is nxn, nonnegative, irreducible
[Perron–Frobenius theorem]
Graph Analytics wkshp
C. Faloutsos (CMU)
90
CMU SCS
(Simplified) PageRank algorithm
• In short: imagine a particle randomly
moving along the edges
• compute its steady-state probabilities (ssp)
Full version of algo: with occasional random
jumps
Why? To make the matrix irreducible
Graph Analytics wkshp
C. Faloutsos (CMU)
91
CMU SCS
Full Algorithm
• With probability 1-c, fly-out to a random
node
• Then, we have
$p = c \, B \, p + \frac{1-c}{n} \mathbf{1} \;\Rightarrow$
$p = \frac{1-c}{n} \, [I - c B]^{-1} \, \mathbf{1}$
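The closed form can be evaluated directly on a small graph. The sketch below assumes NumPy; the 4-node graph and the value c = 0.85 are illustrative assumptions, and a linear solve is used instead of forming the inverse explicitly.

```python
import numpy as np

# Hypothetical graph, from-to: edges 0->1, 0->2, 1->2, 2->0, 2->3, 3->0.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=float)
n = A.shape[0]
B = A.T @ np.diag(1.0 / A.sum(axis=1))   # column-normalized transition matrix

c = 0.85                                  # follow a link with prob. c, fly out with 1-c
ones = np.ones(n)

# p = (1-c)/n * [I - cB]^{-1} 1, computed as a linear solve.
p = (1.0 - c) / n * np.linalg.solve(np.eye(n) - c * B, ones)

# The same p is a fixed point of the modified matrix M = cB + (1-c)/n * 1 1^T.
M = c * B + (1.0 - c) / n * np.outer(ones, ones)
print(np.round(p, 4), np.allclose(M @ p, p))
```

For web-scale graphs the solve is impractical and the fixed point is usually found by iterating p <- M p instead.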
Graph Analytics wkshp
C. Faloutsos (CMU)
92
CMU SCS
Alternative notation
M: Modified transition matrix
$M = c \, B + \frac{1-c}{n} \mathbf{1} \mathbf{1}^T$
Then
$p = M p$
That is: the steady state probabilities = PageRank scores form the first eigenvector of the ‘modified transition matrix’
Graph Analytics wkshp
C. Faloutsos (CMU)
93
CMU SCS
Parenthesis: intuition behind
eigenvectors
Graph Analytics wkshp
C. Faloutsos (CMU)
94
CMU SCS
Formal definition
If A is a (n x n) square matrix, $(\lambda, x)$ is an eigenvalue/eigenvector pair of A if
$A x = \lambda x$
CLOSELY related to singular values:
Graph Analytics wkshp
C. Faloutsos (CMU)
95
CMU SCS
Property #1: Eigen- vs singular-values
if
$B_{[n \times m]} = U_{[n \times r]} \, \Lambda_{[r \times r]} \, (V_{[m \times r]})^T$
then $A = B^T B$ is symmetric and
C(4): $B^T B \, v_i = \lambda_i^2 \, v_i$
ie, $v_1, v_2, \ldots$: eigenvectors of $A = B^T B$
Graph Analytics wkshp
C. Faloutsos (CMU)
96
CMU SCS
Property #2
• If A[nxn] is a real, symmetric matrix
• Then it has n real eigenvalues
(if A is not symmetric, some eigenvalues may
be complex)
Graph Analytics wkshp
C. Faloutsos (CMU)
97
CMU SCS
Property #3
• If A[nxn] is a real, symmetric matrix
• Then it has n real eigenvalues
• And they agree with its n singular values,
except possibly for the sign
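A quick numerical check of this property, assuming NumPy; the 2 x 2 symmetric matrix is a hypothetical example chosen to have one negative eigenvalue.

```python
import numpy as np

# Hypothetical real, symmetric matrix with one negative eigenvalue.
A = np.array([[2.0, 1.0],
              [1.0, -3.0]])

eigenvalues = np.linalg.eigvalsh(A)                    # real, ascending order
singular_values = np.linalg.svd(A, compute_uv=False)   # non-negative, descending

print(np.round(eigenvalues, 3))
print(np.round(singular_values, 3))
# Same magnitudes; only the sign of the negative eigenvalue differs.
print(np.allclose(np.sort(np.abs(eigenvalues)), np.sort(singular_values)))
```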
Graph Analytics wkshp
C. Faloutsos (CMU)
98
CMU SCS
Intuition
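• A as vector transformation: $x' = A x$
$$\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
[Figure: x = (1, 0) and x' = (2, 1) plotted in the plane]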
CMU SCS
Intuition
• By defn., eigenvectors remain parallel to themselves (‘fixed points’): $\lambda_1 v_1 = A v_1$
$$3.62 \cdot \begin{bmatrix} 0.52 \\ 0.85 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 0.52 \\ 0.85 \end{bmatrix}$$
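The numbers on this slide can be reproduced with a short NumPy sketch. `np.linalg.eigh` is used because the matrix is symmetric; the sign normalization of `v1` is an illustrative choice, since an eigenvector is only defined up to scale.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# A as a vector transformation: x = (1, 0) is mapped to x' = (2, 1).
x = np.array([1.0, 0.0])
x_prime = A @ x

# eigh returns eigenvalues in ascending order, so the last one is the largest.
evals, evecs = np.linalg.eigh(A)
lam1 = evals[-1]
v1 = evecs[:, -1]
if v1[0] < 0:          # fix the arbitrary sign for readability
    v1 = -v1

print(x_prime)
print(round(lam1, 2), np.round(v1, 2))
print(np.allclose(A @ v1, lam1 * v1))   # v1 stays parallel to itself
```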
CMU SCS
Convergence
• Usually, fast:
• depends on ratio $\lambda_1 : \lambda_2$
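A short derivation of why the ratio matters, assuming A is diagonalizable with eigenpairs $(\lambda_i, v_i)$, $|\lambda_1| > |\lambda_2| \ge \ldots$, and a starting vector $x_0$ with a non-zero component $c_1$ along $v_1$ (these assumptions are not stated on the slide):

```latex
x_0 = c_1 v_1 + c_2 v_2 + \dots + c_n v_n
\quad\Longrightarrow\quad
A^k x_0 = \lambda_1^k \left( c_1 v_1
        + c_2 \left(\tfrac{\lambda_2}{\lambda_1}\right)^{k} v_2
        + \dots
        + c_n \left(\tfrac{\lambda_n}{\lambda_1}\right)^{k} v_n \right)
```

All terms except the first shrink at least as fast as $|\lambda_2 / \lambda_1|^k$, so the larger the gap between $\lambda_1$ and $\lambda_2$, the fewer iterations are needed.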
CMU SCS
Kleinberg/google - conclusions
SVD helps in graph analysis:
hub/authority scores: strongest left- and right- singular-vectors of the adjacency matrix
random walk on a graph: steady state
probabilities are given by the strongest
eigenvector of the (modified) transition
matrix
Graph Analytics wkshp
C. Faloutsos (CMU)
104
CMU SCS
Conclusions
• SVD: a valuable tool
• given a document-term matrix, it finds
‘concepts’ (LSI)
• ... and can find fixed-points or steady-state
probabilities (google/ Kleinberg/ Markov
Chains)
Graph Analytics wkshp
C. Faloutsos (CMU)
105
CMU SCS
Conclusions cont’d
(We didn’t discuss/elaborate, but, SVD
• ... can reduce dimensionality (KL)
• ... and can find rules (PCA; RatioRules)
• ... and can optimally solve over- and under-constrained linear systems (least squares / query feedbacks))
Graph Analytics wkshp
C. Faloutsos (CMU)
106
CMU SCS
References
• Berry, Michael: http://www.cs.utk.edu/~lsi/
• Brin, S. and L. Page (1998). Anatomy of a
Large-Scale Hypertextual Web Search
Engine. 7th Intl World Wide Web Conf.
Graph Analytics wkshp
C. Faloutsos (CMU)
107
CMU SCS
References
• Christos Faloutsos, Searching Multimedia
Databases by Content, Springer, 1996. (App.
D)
• Fukunaga, K. (1990). Introduction to
Statistical Pattern Recognition, Academic
Press.
• I.T. Jolliffe Principal Component Analysis
Springer, 2002 (2nd ed.)
Graph Analytics wkshp
C. Faloutsos (CMU)
108
CMU SCS
References cont’d
• Kleinberg, J. (1998). Authoritative sources
in a hyperlinked environment. Proc. 9th
ACM-SIAM Symposium on Discrete
Algorithms.
• Press, W. H., S. A. Teukolsky, et al. (1992).
Numerical Recipes in C, Cambridge
University Press. www.nr.com
Graph Analytics wkshp
C. Faloutsos (CMU)
109
CMU SCS
PART 2:
Communities
Graph Analytics wkshp
C. Faloutsos (CMU)
110
CMU SCS
Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions
CMU SCS
Task 2 – Communities - Detailed outline
• Motivation
• Hard clustering – k pieces
• Hard clustering – optimal # pieces
• Observations
CMU SCS
Problem
• Given a graph, and k
• Break it into k (disjoint) communities
Graph Analytics wkshp
C. Faloutsos (CMU)
113
CMU SCS
Problem
• Given a graph, and k
• Break it into k (disjoint) communities
k=2
Graph Analytics wkshp
C. Faloutsos (CMU)
114
CMU SCS
Solution #1: METIS
• Arguably, the best algorithm
• Open source, at
– http://www.cs.umn.edu/~metis
• and *many* related papers, at same url
• Main idea:
– coarsen the graph;
– partition;
– un-coarsen
Graph Analytics wkshp
C. Faloutsos (CMU)
115
CMU SCS
Solution #1: METIS
• G. Karypis and V. Kumar. METIS 4.0:
Unstructured graph partitioning and sparse
matrix ordering system. TR, Dept. of CS,
Univ. of Minnesota, 1998.
• <and many extensions>
Graph Analytics wkshp
C. Faloutsos (CMU)
116
CMU SCS
Solution #2
(problem: hard clustering, k pieces)
Spectral partitioning:
• Consider the eigenvector of the 2nd smallest eigenvalue of the (normalized) Laplacian
See details in ‘Task 4’ (intro to Laplacians), later
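A minimal sketch of spectral bisection (k = 2), assuming NumPy. For brevity it uses the unnormalized Laplacian L = D - A rather than the normalized one, and a hypothetical toy graph of two triangles joined by a single edge; nodes are split by the sign of the eigenvector of the 2nd smallest eigenvalue.

```python
import numpy as np

# Hypothetical undirected graph: triangles {0,1,2} and {3,4,5}, bridged by 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Unnormalized Laplacian (a simplification of the normalized variant).
L = np.diag(A.sum(axis=1)) - A

# eigh sorts eigenvalues in ascending order; index 1 is the 2nd smallest.
evals, evecs = np.linalg.eigh(L)
second_vector = evecs[:, 1]

# Split the nodes by the sign of their entry.
labels = (second_vector > 0).astype(int)
print(np.round(evals[:2], 3), labels)
```

On this toy graph the split separates the two triangles and cuts only the bridge edge.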
Graph Analytics wkshp
C. Faloutsos (CMU)
117
CMU SCS
Solutions #3, …
Many more ideas:
• Clustering on the $A^2$ (square of adjacency matrix) [Zhou, Woodruff, PODS’04]
• Minimum cut / maximum flow [Flake+,
KDD’00]
• …
Graph Analytics wkshp
C. Faloutsos (CMU)
118
CMU SCS
Task 2 – Communities - Detailed outline
• Motivation
• Hard clustering – k pieces
• Hard clustering – optimal # pieces
• Observations
CMU SCS
Cross-association
Desiderata:
• Simultaneously discover row and column groups
• Fully Automatic: No “magic numbers”
• Scalable to large matrices
Reference:
1. Chakrabarti et al. Fully Automatic Cross-Associations, KDD’04
CMU SCS
What makes a cross-association “good”?
[Figure: two alternative arrangements of the same matrix into row groups and column groups, shown side by side (“versus”)]
Why is this better?
simpler; easier to describe
easier to compress!
CMU SCS
What makes a cross-association
“good”?
Problem definition: given an encoding scheme
• decide on the # of col. and row groups k and l
• and reorder rows and columns,
• to achieve best compression
Graph Analytics wkshp
C. Faloutsos (CMU)
123
CMU SCS
Main Idea
Good Compression / Better Clustering
Total Encoding Cost = $\sum_i \mathrm{size}_i \cdot H(x_i)$ (Code Cost) + Cost of describing cross-associations (Description Cost)
Minimize the total cost (# bits) for lossless compression
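A sketch of the code-cost term only, assuming NumPy, a binary matrix, and $H(x_i)$ taken as the binary entropy of the density of ones in block i. The helper names `binary_entropy` and `code_cost` are hypothetical, and the description cost of the cross-association itself is deliberately left out, so this is not the full objective of the cited paper.

```python
import numpy as np

def binary_entropy(p):
    """Entropy in bits of a coin with bias p (0 for p = 0 or p = 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def code_cost(M, row_groups, col_groups):
    """Sum over blocks of size_i * H(density_i), in bits."""
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)
    cost = 0.0
    for rg in np.unique(row_groups):
        for cg in np.unique(col_groups):
            block = M[np.ix_(row_groups == rg, col_groups == cg)]
            if block.size:
                cost += block.size * binary_entropy(block.mean())
    return cost

# Toy matrix with two dense blocks on the diagonal.
M = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

good = [0, 0, 1, 1]      # groups aligned with the blocks
single = [0, 0, 0, 0]    # everything in one group

print(code_cost(M, good, good))      # homogeneous blocks
print(code_cost(M, single, single))  # one mixed block
```

Homogeneous blocks (all ones or all zeros) cost nothing to encode, which is why a grouping aligned with the blocks compresses better than a single mixed group.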
Graph Analytics wkshp
C. Faloutsos (CMU)
124
CMU SCS
Algorithm
[Figure: the search alternately increases the number of column groups l and row groups k: (k=1, l=2), (k=2, l=2), (k=2, l=3), (k=3, l=3), (k=3, l=4), (k=4, l=4), (k=4, l=5), ending at k = 5 row groups, l = 5 col groups]
CMU SCS
Experiments
“CLASSIC”
• 3,893 documents
• 4,303 words
• 176,347 “dots”
Combination of 3 sources:
• MEDLINE (medical)
• CISI (info. retrieval)
• CRANFIELD (aerodynamics)
[Figure: Documents x Words matrix]
CMU SCS
Experiments
“CLASSIC” graph of documents & words: k=15, l=19
[Figure: Documents x Words matrix after cross-association; document groups correspond to MEDLINE (medical), CISI (Information Retrieval) and CRANFIELD (aerodynamics)]
Word groups highlighted in the figure:
• insipidus, alveolar, aortic, death, prognosis, intravenous
• blood, disease, clinical, cell, tissue, patient
• providing, studying, records, development, students, rules
• abstract, notation, works, construct, bibliographies
• shape, nasa, leading, assumed, thin
• paint, examination, fall, raise, leave, based
CMU SCS
Algorithm
Code for cross-associations (matlab):
www.cs.cmu.edu/~deepay/mywww/software/CrossAssociations01-27-2005.tgz
Variations and extensions:
• ‘Autopart’ [Chakrabarti, PKDD’04]
• www.cs.cmu.edu/~deepay
Graph Analytics wkshp
C. Faloutsos (CMU)
132
CMU SCS
Algorithm
• Hadoop implementation [ICDM’08]
Spiros Papadimitriou, Jimeng Sun: DisCo: Distributed Co-clustering with MapReduce: A Case Study towards Petabyte-Scale End-to-End Mining. ICDM 2008: 512-521
CMU SCS
Task 2 – Communities - Detailed outline
• Motivation
• Hard clustering – k pieces
• Hard clustering – optimal # pieces
• Observations
CMU SCS
Observation #1
• Skewed degree distributions – there are
nodes with huge degree (>O(10^4), in
facebook/linkedIn popularity contests!)
Graph Analytics wkshp
C. Faloutsos (CMU)
135
CMU SCS
Observation #2
• Maybe there are no good cuts: ‘jellyfish’ shape [Tauro+,’01], [Siganos+,’06], strange behavior of cuts [Chakrabarti+,’04], [Leskovec+,’08]
CMU SCS
Jellyfish model [Tauro+]
…
A Simple Conceptual Model for the Internet Topology, L. Tauro, C. Palmer, G. Siganos, M. Faloutsos, Global Internet, November 25-29, 2001
Jellyfish: A Conceptual Model for the AS Internet Topology, G. Siganos, Sudhir L. Tauro, M. Faloutsos, J. of Communications and Networks, Vol. 8, No. 3, pp 339-350, Sept. 2006.
CMU SCS
Strange behavior of min cuts
• ‘negative dimensionality’ (!)
NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy
Statistical Properties of Community Structure in Large Social and Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. WWW 2008.
CMU SCS
“Min-cut” plot
• Do min-cuts recursively.
[Plot: log (mincut-size / #edges) versus log (# edges), for a grid of N nodes with mincut size = sqrt(N); each recursive cut adds a new point (“New min-cut”); Slope = -0.5]
For a d-dimensional grid, the slope is -1/d
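A back-of-the-envelope derivation of the -1/d slope, assuming a d-dimensional grid with n nodes per side and ignoring boundary effects (an idealization, not a statement from the slides):

```latex
N = n^d \ \text{nodes}, \qquad
E \approx d \, N \ \text{edges}, \qquad
\text{mincut} \approx n^{d-1} = N^{(d-1)/d}
\\[4pt]
\frac{\text{mincut}}{E} \approx \frac{N^{(d-1)/d}}{d \, N}
  = \frac{1}{d} \, N^{-1/d} \;\propto\; E^{-1/d}
\quad\Longrightarrow\quad
\log\frac{\text{mincut}}{E} = -\frac{1}{d} \log E + \text{const}
```

For d = 2 this gives mincut = sqrt(N) and a slope of -0.5, matching the plot; a measured slope s can be read as an effective dimensionality of -1/s.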
CMU SCS
“Min-cut” plot
[Plots: log (mincut-size / #edges) versus log (# edges), two panels]
For a d-dimensional grid, the slope is -1/d
For a random graph, the slope is 0
CMU SCS
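The -1/d slope can be checked with a little arithmetic. The sketch below assumes an idealized k x k grid (d = 2) in which a straight cut through the middle severs k edges; the function names are illustrative and are not taken from the NetMine paper.

```python
import math

def grid_point(k):
    """Return (log #edges, log(mincut / #edges)) for an idealized k x k grid."""
    edges = 2 * k * (k - 1)      # horizontal plus vertical edges
    mincut = k                   # a straight cut through the middle severs k edges
    return math.log(edges), math.log(mincut / edges)

def slope(p, q):
    """Slope of the line through two points of the log-log plot."""
    return (q[1] - p[1]) / (q[0] - p[0])

# Compare two grid sizes; for d = 2 the slope should be close to -1/2.
s = slope(grid_point(512), grid_point(1024))
print(round(s, 2))
```

The same argument with a d-dimensional grid of side k gives a cut of about k^(d-1) edges out of about d * k^d, hence the slope -1/d.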
“Min-cut” plot
• What does it look like for a real-world graph?
[Figure: empty plot of log(mincut-size / #edges) vs. log(# edges), marked ‘?’]
Experiments
• Datasets:
– Google Web Graph: 916,428 nodes and 5,105,039 edges
– Lucent Router Graph: Undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges
– User-Website Clickstream Graph: 222,704 nodes and 952,580 edges
NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy
Experiments
• Used the METIS algorithm [Karypis, Kumar, 1995]
• Google Web graph
• Values along the y-axis are averaged
• “lip” for large edges
• Slope of -0.4 corresponds to a 2.5-dimensional grid!
[Figure: log(mincut-size / #edges) vs. log(# edges), slope ~ -0.4]
Experiments
• Same results for other graphs too…
[Figure, left panel: Lucent Router graph, log(mincut-size / #edges) vs. log(# edges), slope ~ -0.57]
[Figure, right panel: Clickstream graph, log(mincut-size / #edges) vs. log(# edges), slope ~ -0.45]
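Since a d-dimensional grid has slope -1/d, each measured slope can be turned into an "equivalent grid dimension" d = -1/slope. A minimal sketch of that arithmetic, using the three slopes reported on the slides (the function name is hypothetical):

```python
def effective_dimension(slope):
    """Invert slope = -1/d to get the dimension d of the matching grid."""
    return -1.0 / slope

measured = [("Google Web graph", -0.4),
            ("Lucent Router graph", -0.57),
            ("Clickstream graph", -0.45)]

for name, s in measured:
    print(f"{name}: slope {s} -> d = {effective_dimension(s):.2f}")
```

For the Google Web graph this reproduces the 2.5 quoted on the slide; the other two come out near 1.75 and 2.22.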
Task 2 – Communities
Conclusions – Practitioner’s guide
• Hard clustering – k pieces: METIS
• Hard clustering – optimal # pieces: Cross-associations
• Observations: ‘jellyfish’: no good cuts
PART 3:
Tensors
Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions
Task 3 – Tensors - Detailed roadmap
• Motivation
• Definitions: PARAFAC
• Case study: web mining
Examples of Matrices:
Authors and terms
         data   mining   classif.   tree   ...
John      13      11        22       55    ...
Peter      5       4         6        7    ...
Mary     ...     ...       ...      ...    ...
Nick     ...     ...       ...      ...    ...
...      ...     ...       ...      ...    ...
But: if it changes over time??
• A: treat it as ‘tensor’
[Figure: one author-term matrix per year (KDD’07, KDD’08, KDD’09), stacked into a 3-mode tensor; rows John, Peter, Mary, Nick, ...; columns data, mining, classif., tree, ...]
Motivation: Why tensors?
• Q: what is a tensor?
Motivation: Why tensors?
• A: N-D generalization of matrix:
[Figure: the KDD’09 author-term matrix; rows John, Peter, Mary, Nick, ...; columns data, mining, classif., tree, ...]
Motivation: Why tensors?
• A: N-D generalization of matrix:
[Figure: author-term matrices for KDD’07, KDD’08 and KDD’09 stacked along a third axis]
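As a concrete illustration of "N-D generalization of matrix", the sketch below stores the author-term counts in a 3-mode array indexed by (author, term, year). The counts for John and Peter are the ones shown on the slides, placed in the KDD’09 slice; the other two slices are left at zero because the slides do not give their values. The helper name is hypothetical.

```python
authors = ["John", "Peter"]
terms = ["data", "mining", "classif.", "tree"]
years = ["KDD'07", "KDD'08", "KDD'09"]

# tensor[i][j][k] = count for author i, term j, year k (all zeros to start)
tensor = [[[0 for _ in years] for _ in terms] for _ in authors]

# Counts shown on the slide, stored in the KDD'09 slice.
kdd09 = {"John": [13, 11, 22, 55], "Peter": [5, 4, 6, 7]}
k09 = years.index("KDD'09")
for i, author in enumerate(authors):
    for j, count in enumerate(kdd09[author]):
        tensor[i][j][k09] = count

def slice_for_year(t, k):
    """Fix mode #3 (the year) to recover an ordinary author-by-term matrix."""
    return [[t[i][j][k] for j in range(len(t[i]))] for i in range(len(t))]

print(slice_for_year(tensor, k09))
```

Fixing one index of the third mode gives back a plain matrix, which is why a matrix can be seen as a tensor with a single slice.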
Tensors are useful for 3 or more
modes
Terminology: ‘mode’ (or ‘aspect’):
[Figure: the 3-mode tensor with its three axes labeled Mode (== aspect) #1, Mode#2 and Mode#3]
Notice
• 3rd mode does not need to be time
• we can have more than 3 modes
[Figure: 3-mode tensor with axes IP source, IP destination and Dest. port (80, ..., 125)]
Background: Tensors
• Tensors (=multi-dimensional arrays) are everywhere
– Sensor stream (time, location, type)
– Predicates (subject, verb, object) in knowledge base, e.g. “Eric Clapton plays guitar”, “Barack Obama is the president of U.S.”
[Figure: NELL (Never Ending Language Learner) data tensor; mode sizes (48M), (26M), (26M); nonzeros = 144M]
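With 144M nonzeros in a tensor of this size, only the nonzero cells can be stored. The sketch below shows one simple way to do that for (subject, verb, object) facts, using a coordinate dictionary; the layout and the function name are assumptions for illustration, not the storage format used for NELL, and the repeated fact is added only to show counting.

```python
from collections import defaultdict

triples = [
    ("Eric Clapton", "plays", "guitar"),
    ("Barack Obama", "is the president of", "U.S."),
    ("Eric Clapton", "plays", "guitar"),   # hypothetical repeat, to show counting
]

def build_sparse_tensor(triples):
    """Map each mode's labels to integer ids; keep nonzeros as {(i, j, k): count}."""
    ids = [dict(), dict(), dict()]          # one label -> id dictionary per mode
    nonzeros = defaultdict(int)
    for triple in triples:
        index = tuple(ids[m].setdefault(label, len(ids[m]))
                      for m, label in enumerate(triple))
        nonzeros[index] += 1
    return ids, dict(nonzeros)

ids, nonzeros = build_sparse_tensor(triples)
print(nonzeros)
```

Memory then grows with the number of distinct facts rather than with the product of the three mode sizes.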
Tensor basics
• Multi-mode extensions of SVD – recall that:
Reminder: SVD
$A_{m \times n} \approx U \Sigma V^T$
– Best rank-k approximation in L2
Reminder: SVD
$A_{m \times n} \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$
– Best rank-k approximation in L2
Extension to (>=)3 modes
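For reference, the usual way to write this extension is to replace the sum of rank-1 matrices in the SVD by a sum of rank-1 tensors, which is the PARAFAC form. The symbols $\mathcal{X}$, $\lambda_r$, $a_r$, $b_r$, $c_r$ and $R$ are standard notation introduced here, not taken from the slide:

```latex
A \approx \sum_{r=1}^{k} \sigma_r \, u_r v_r^{T}
\quad\longrightarrow\quad
\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \; a_r \circ b_r \circ c_r ,
\qquad
x_{ijk} \approx \sum_{r=1}^{R} \lambda_r \, a_{ir} \, b_{jr} \, c_{kr}
```

Here $\circ$ is the outer product, so each term is a rank-1 tensor with one vector per mode.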
Main points:
• 2 major types of tensor decompositions: PARAFAC and Tucker (not examined here)
• both can be solved with ‘alternating least squares’ (ALS)
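To make the ALS idea concrete, here is a minimal sketch for the simplest case only: a single (rank-1) PARAFAC component on a small dense tensor stored as nested lists. It fixes two of the vectors, solves for the third in closed form, and cycles. This is an illustration under those assumptions, not the GigaTensor algorithm or the Tensor Toolbox API; a full PARAFAC fit uses R components and solves a small least-squares system per mode.

```python
import math

def normalize(v):
    """Return the unit vector v / ||v|| together with the norm ||v||."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v], norm

def rank1_als(X, iters=20):
    """Fit X[i][j][k] ~ lam * a[i] * b[j] * c[k] by alternating least squares."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K
    lam = 0.0
    for _ in range(iters):
        # With two vectors fixed, the least-squares optimum for the third is
        # X contracted with the fixed ones (up to scale, absorbed into lam).
        a, _norm = normalize([sum(X[i][j][k] * b[j] * c[k]
                                  for j in range(J) for k in range(K)) for i in range(I)])
        b, _norm = normalize([sum(X[i][j][k] * a[i] * c[k]
                                  for i in range(I) for k in range(K)) for j in range(J)])
        c, lam = normalize([sum(X[i][j][k] * a[i] * b[j]
                                for i in range(I) for j in range(J)) for k in range(K)])
    return lam, a, b, c

# Toy input: an exactly rank-1 tensor built from made-up vectors.
a0, b0, c0 = [1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0]
X = [[[ai * bj * ck for ck in c0] for bj in b0] for ai in a0]

lam, a, b, c = rank1_als(X)
error = max(abs(X[i][j][k] - lam * a[i] * b[j] * c[k])
            for i in range(2) for j in range(3) for k in range(2))
print(error)   # reconstruction error, expected to be near machine precision
```

On an exactly rank-1 input the loop recovers the tensor after the first sweep; on real data more sweeps and a convergence check would be needed.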
Discoveries: Problem Definition
• Most important concepts and synonyms?
[Figure: NELL (Never Ending Language Learner) data tensor; mode sizes (48M), (26M), (26M); nonzeros = 144M]
A1: Concept Discovery
• Concept Discovery in Knowledge Base
A2.1: Concept Discovery
A2: Synonym Discovery
• Synonym Discovery in Knowledge Base
[Figure: factor matrix with columns a1, a2, …, aR; marked rows: (given) noun phrase, (discovered) synonym 1, (discovered) synonym 2]
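One plausible reading of the figure, stated here as an assumption: each noun phrase corresponds to a row of the factor matrix with columns a1 … aR, and candidate synonyms are the rows that point in nearly the same direction as the given row. The sketch below ranks rows by cosine similarity; the phrases and all numbers are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Hypothetical rows of a factor matrix with R = 3 columns (made-up numbers).
factor_rows = {
    "noun phrase A": [0.9, 0.1, 0.0],
    "noun phrase B": [0.8, 0.2, 0.1],
    "noun phrase C": [0.0, 0.1, 0.9],
}

def synonyms(given, rows, top=2):
    """Return the `top` other phrases whose rows are most similar to `given`."""
    scored = [(cosine(rows[given], vec), name)
              for name, vec in rows.items() if name != given]
    return [name for _score, name in sorted(scored, reverse=True)[:top]]

print(synonyms("noun phrase A", factor_rows, top=1))
```

Rows A and B load on the same component, so B is returned as the closest candidate, while C loads on a different one.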
GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries
U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos
KDD 2012
Experiments
• GigaTensor solves 100x larger problem
[Figure: scalability plot for a tensor with modes (I), (J), (K) and number of nonzeros = I / 50; labels: GigaTensor, 100x, Out of Memory]
Conclusions
• Real data may have multiple aspects (modes)
• Tensors provide elegant theory and algorithms
– PARAFAC (and Tucker): discover groups
• GigaTensor: scales up (hadoop/PEGASUS)
– www.cs.cmu.edu/~pegasus
References
• T. G. Kolda, B. W. Bader and J. P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. In: ICDM 2005, Pages 242-249, November 2005.
• Jimeng Sun, Spiros Papadimitriou, Philip Yu. Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams, Proc. of the Int. Conf. on Data Mining (ICDM), Hong Kong, China, Dec 2006
Resources
• See tutorial on tensors, KDD’07 (w/ Tamara Kolda and Jimeng Sun): www.cs.cmu.edu/~christos/TALKS/KDD-07-tutorial
Tensor tools - resources
• Toolbox: from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/TensorToolbox
• T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review, Volume 51, Number 3, September 2009 csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/TensorReview-preprint.pdf
• T. Kolda and J. Sun: Scalable Tensor Decomposition for Multi-Aspect Data Mining (ICDM 2008)
PART 4:
Theory
Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions
Task 4 – Theory - Detailed roadmap
• Adjacency matrix
• Laplacian
– Connected Components
– Intuition: 2nd smallest eigenvalue -> ‘good cut’
Adjacency matrix
[Figure: example graph on nodes 1, 2, 3, 4 and its adjacency matrix A]
Adjacency matrix
[Figure: example graph on nodes 1, 2, 3, 4 and its adjacency matrix A, annotated ‘1-step-away paths’]
Adjacency matrix
• Obvious extensions, for directed and/or weighted cases
[Figure: example graph on nodes 1, 2, 3, 4]
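The link between the adjacency matrix and path counts can be seen on a tiny example. The edge list below is hypothetical (the edges of the slide's 4-node graph are not recoverable from the text), and the helper names are illustrative. Entry (i, j) of A counts 1-step paths and entry (i, j) of A·A counts 2-step paths (walks, strictly speaking).

```python
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]   # hypothetical edge list

def adjacency(nodes, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    pos = {v: i for i, v in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[pos[u]][pos[v]] = 1
        A[pos[v]][pos[u]] = 1
    return A

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = adjacency(nodes, edges)
A2 = matmul(A, A)     # A2[i][j] = number of 2-step walks from node i to node j
print(A2)
```

For a directed graph one would drop the symmetric assignment, and for a weighted graph store the weight instead of 1.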
Main upcoming result
The eigenvector of the second smallest eigenvalue of the Laplacian (u2) gives a good cut:
Nodes with positive scores should go to one group
And the rest to the other
Laplacian
$L = D - A$, where $D$ is the diagonal matrix with $d_{ii} = d_i$ (the degree of node $i$)
[Figure: example graph on nodes 1, 2, 3, 4 and its Laplacian L]
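A minimal sketch of the definition L = D - A, on a hypothetical 4-node graph (the slide's own edges are not recoverable from the text; the function name is illustrative):

```python
def laplacian(A):
    """L = D - A, with D the diagonal matrix of degrees d_ii = d_i."""
    n = len(A)
    degrees = [sum(row) for row in A]
    return [[(degrees[i] if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical undirected graph with edges 1-2, 1-3, 2-3, 3-4.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

L = laplacian(A)
for row in L:
    print(row)
```

Every row of L sums to zero, which is why the all-ones vector is always an eigenvector with eigenvalue 0.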
Connected Components
• Lemma: Let G be a graph with n vertices and c connected components. If L is the Laplacian of G, then rank(L) = n-c.
• Proof: see p.279, Godsil-Royle
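The lemma can be checked numerically on a small case. The sketch below uses a hypothetical 5-node graph with two components (a triangle and a single edge), computes rank(L) exactly with fractions, and compares it with n - c, where c comes from a union-find pass. All names are illustrative.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Laplacian L = D - A of an undirected, unweighted graph on nodes 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def rank(M):
    """Rank by Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def count_components(n, edges):
    """Number of connected components, via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

n = 5
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]   # triangle plus one separate edge
print(rank(laplacian(n, edges)), n - count_components(n, edges))
```

Since L is symmetric, n - rank(L) is also the number of zero eigenvalues, which matches the "#zeros = #components" annotation on the following slides.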
Connected Components
[Figure: graph G(V,E) on nodes 1-7, its Laplacian L, and its eigenvalues eig(L)]
Connected Components
[Figure: graph G(V,E) on nodes 1-7, its Laplacian L, and its eigenvalues eig(L); annotation on eig(L): #zeros = #components]
Connected Components
[Figure: graph G(V,E) on nodes 1-7, now with an edge of weight 0.01; its Laplacian L and its eigenvalues eig(L)]
Connected Components
[Figure: graph G(V,E) on nodes 1-7 with an edge of weight 0.01; its Laplacian L and its eigenvalues eig(L); annotation on eig(L): #zeros = #components]
Connected Components
[Figure: graph G(V,E) on nodes 1-7 with an edge of weight 0.01; its Laplacian L and its eigenvalues eig(L); annotation on eig(L): Indicates a “good cut”]
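Why a weak edge produces a small eigenvalue can be seen from two standard facts: x^T L x equals the sum over edges of w_ij (x_i - x_j)^2, and the second smallest eigenvalue is the minimum of x^T L x / x^T x over vectors orthogonal to the all-ones vector. The sketch below evaluates that ratio for one such vector on a hypothetical 7-node graph (a 4-cycle and a triangle joined by one edge of weight 0.01); the slide's actual edges are not recoverable, so the graph is an assumption.

```python
def quadratic_form(weighted_edges, x):
    """x^T L x written edge by edge: the sum of w * (x_i - x_j)^2."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in weighted_edges)

# Hypothetical graph: 4-cycle {0,1,2,3}, triangle {4,5,6}, weak edge (3,4).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0),
         (4, 5, 1.0), (5, 6, 1.0), (6, 4, 1.0),
         (3, 4, 0.01)]

# Test vector: constant on each group, scaled so that the entries sum to zero.
x = [3.0] * 4 + [-4.0] * 3

bound = quadratic_form(edges, x) / sum(v * v for v in x)
print(bound)   # an upper bound on the second smallest eigenvalue
```

Only the weak edge contributes to the numerator, so the bound shrinks with its weight and reaches 0 when the edge is removed, giving a second zero eigenvalue for the second component.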
Example: Spectral Partitioning
• Dumbbell graph: two K500 cliques, the Montagues (with Romeo) and the Capulets (with Juliet)
[Image: http://en.wikipedia.org/wiki/File:Romeo_and_juliet_brown.jpg]
Example: Spectral Partitioning
• This is how the adjacency matrix B looks
spy(B)
Example: Spectral Partitioning
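• 2nd eigenvector u2 of the Laplacian L of B: L u2 = λ2 u2
L = diag(sum(B))-B;        % Laplacian: degree matrix minus adjacency matrix
[u v] = eigs(L,2,'SM');    % the 2 smallest-magnitude eigenpairs
plot(u(:,1),'x')           % column 1 is taken as u2 here; check the eigenvalue order in v
Not so much information yet…
[Figure: u2,i score vs. node-id ‘i’]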
Example: Spectral Partitioning
• 2nd eigenvector after sorting on u2,i score
[ign ind] = sort(u(:,1));  % ind = node order by increasing score
plot(u(ind,1),'x')
But now we see the two communities!
[Figure: sorted u2,i score vs. node-id ‘i’]
Example: Spectral Partitioning
• This is how the adjacency matrix B looks now
spy(B(ind,ind))
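The same experiment can be sketched without MATLAB on a much smaller dumbbell. The assumptions are stated up front: two K4 cliques instead of K500, joined by one edge, and a shifted power iteration with the all-ones direction projected out in place of eigs. This is a swapped-in technique that only needs a gap between the second and third eigenvalues; the names are illustrative.

```python
import math

def dumbbell(k):
    """Two k-cliques (nodes 0..k-1 and k..2k-1) joined by the single edge (k-1, k)."""
    edges = [(i, j) for base in (0, k)
             for i in range(base, base + k) for j in range(i + 1, base + k)]
    return 2 * k, edges + [(k - 1, k)]

def fiedler_vector(n, edges, iters=200):
    """Approximate the eigenvector of the second smallest Laplacian eigenvalue."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    shift = 2 * max(deg)                      # upper bound on the largest eigenvalue of L
    x = [math.sin(i + 1) for i in range(n)]   # arbitrary deterministic start
    for _ in range(iters):
        # y = (shift * I - L) x, using (L x)_i = deg_i * x_i - sum of neighbours' x_j
        y = [(shift - deg[i]) * x[i] for i in range(n)]
        for u, v in edges:
            y[u] += x[v]
            y[v] += x[u]
        mean = sum(y) / n                     # project out the all-ones eigenvector
        y = [t - mean for t in y]
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    return x

n, edges = dumbbell(4)
u2 = fiedler_vector(n, edges)
groups = [0 if score > 0 else 1 for score in u2]   # split on the sign of the score
print(groups)
```

Which clique gets label 0 depends on the arbitrary sign of the eigenvector, but the split itself separates the two cliques, mirroring the positive/negative rule of the main result.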
Why λ2?
Each ball 1 unit of mass
Matrix viewpoint: $Lx = \lambda x$ (Dfn of eigenvector)
[Figure: balls x1, ..., xn oscillating (‘OSCILLATE’)]
Why λ2?
Each ball 1 unit of mass
Physics viewpoint: $Lx = \lambda x$, with $Lx$ the force due to neighbors, $\lambda$ Hooke’s constant and $x$ the displacement
[Figure: balls x1, ..., xn oscillating (‘OSCILLATE’)]
Why λ2?
Each ball 1 unit of mass
$Lx = \lambda x$
For the first eigenvector:
All nodes: same displacement (= value)
[Figure: balls x1, ..., xn oscillating (‘OSCILLATE’); plot of eigenvector value vs. node id]
Why λ2?
Each ball 1 unit of mass
$Lx = \lambda x$
[Figure: balls x1, ..., xn oscillating (‘OSCILLATE’); plot of eigenvector value vs. node id]
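The two viewpoints can be connected in a few lines. The derivation below assumes unit spring constants between neighbouring balls (the slides only state unit masses) and writes $j \sim i$ for "j is a neighbour of i"; that notation is introduced here.

```latex
(Lx)_i \;=\; d_i x_i - \sum_{j \sim i} x_j \;=\; \sum_{j \sim i} (x_i - x_j)
\qquad\text{(net spring force on ball $i$, up to sign)}

\ddot{x}_i \;=\; -\sum_{j \sim i} (x_i - x_j) \;=\; -(Lx)_i
\qquad\text{(Newton's law with unit mass)}

Lx = \lambda x \;\;\Longrightarrow\;\; \ddot{x} = -\lambda x
\qquad\text{(every ball oscillates with frequency } \sqrt{\lambda}\text{)}
```

For $\lambda_1 = 0$ all balls share the same displacement and no spring is stretched; $\lambda_2$ is the slowest genuine oscillation, in which the two loosely connected halves move against each other.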
Conclusions
Spectrum tells us a lot about the graph:
• Adjacency: #Paths
• Laplacian: Sparse Cut
References
• Fan R. K. Chung: Spectral Graph Theory (AMS)
• Chris Godsil and Gordon Royle: Algebraic Graph Theory (Springer)
• Bojan Mohar and Svatopluk Poljak: Eigenvalues in Combinatorial Optimization, IMA Preprint Series #939
• Gilbert Strang: Introduction to Applied Mathematics (Wellesley-Cambridge Press)
PART 5:
Conclusions
Summary
• Task 1: Node importance -> SVD, PageRank, HITS
• Task 2: Community detection -> METIS; ‘no good cuts’
• Task 3: Mining graphs over time -> Tensors
• Task 4: Spectral graph theory -> Laplacians
Acknowledgements
Funding:
IIS-0705359, IIS-0534205, DBI-0640543, CNS-0721736
THANK YOU!
Christos Faloutsos
www.cs.cmu.edu/~christos
www.cs.cmu.edu/~pegasus
http://www.cs.cmu.edu/~epapalex/gmc/