ppt

advertisement
CMU SCS
Large Graph Mining:
Power Tools and a Practitioner’s guide
Task 3: Recommendations & proximity
Faloutsos, Miller & Tsourakakis
CMU
KDD'09
Faloutsos, Miller, Tsourakakis
P3-1
CMU SCS
Outline
•
•
•
•
•
•
•
•
•
•
•
Introduction – Motivation
Task 1: Node importance
Task 2: Community detection
Task 3: Recommendations
Task 4: Connection sub-graphs
Task 5: Mining graphs over time
Task 6: Virus/influence propagation
Task 7: Spectral graph theory
Task 8: Tera/peta graph mining: hadoop
Observations – patterns of real graphs
Conclusions
KDD'09
Faloutsos, Miller, Tsourakakis
P3-2
CMU SCS
Acknowledgement:
Most of the foils in ‘Task 3’ are by
Hanghang TONG
www.cs.cmu.edu/~htong
KDD'09
Faloutsos, Miller, Tsourakakis
P3-3
CMU SCS
Detailed outline
•
•
•
•
•
•
Problem dfn and motivation
Solution: Random walk with restarts
Efficient computation
Case study: image auto-captioning
Extensions: bi-partite graphs; tracking
Conclusions
KDD'09
Faloutsos, Miller, Tsourakakis
P3-4
CMU SCS
Motivation: Link Prediction
A
?
B
i
i
i
i
Should we introduce
Mr. A to Mr. B?
KDD'09
Faloutsos, Miller, Tsourakakis
P3-5
CMU SCS
Motivation - recommendations
customers
Products / movies
‘smith’
??
Terminator 2
KDD'09
Faloutsos, Miller, Tsourakakis
P3-6
CMU SCS
Answer: proximity
• ‘yes’, if ‘A’ and ‘B’ are ‘close’
• ‘yes’, if ‘smith’ and ‘terminator 2’ are
‘close’
QUESTIONS in this part:
- How to measure ‘closeness’/proximity?
- How to do it quickly?
- What else can we do, given proximity
scores?
KDD'09
Faloutsos, Miller, Tsourakakis
P3-7
CMU SCS
How close is ‘A’ to ‘B’?
I
1
J
1
A
1
1
1
H
1
B
1
D
1 1 1
E
G
F
Faloutsos, Miller, Tsourakakis
a.k.a Relevance, Closeness,
‘Similarity’…
KDD'09
P3-8
CMU SCS
Why is it useful?
• Recommendation
And many more
• Image captioning [Pan+]
• Conn. / CenterPiece subgraphs [Faloutsos+], [Tong+],
[Koren+]
and
•
•
•
•
•
•
•
Link prediction [Liben-Nowell+], [Tong+]
Ranking [Haveliwala], [Chakrabarti+]
Email Management [Minkov+]
Neighborhood Formulation [Sun+]
Pattern matching [Tong+]
Collaborative Filtering [Fouss+]
…
KDD'09
Faloutsos, Miller, Tsourakakis
P3-9
CMU SCS
Region
Automatic Image Captioning
Image
Test Image
Keyword
Sea
Sun
Sky
Wave
Cat
Forest
Tiger
Grass
Q: How to assign keywords to the test image?
KDD'09
Faloutsos, Miller,
A:Tsourakakis
Proximity! [Pan+ P3-10
2004]
CMU SCS
Center-Piece Subgraph(CePS)
Input
Output
B
B
CePS guy
C
A
Original Graph
KDD'09
C
A
CePS
Q: How to find hub for the black nodes?
Faloutsos, Miller,
A:Tsourakakis
Proximity! [Tong+ KDD
P3-112006]
CMU SCS
Detailed outline
•
•
•
•
•
•
Problem dfn and motivation
Solution: Random walk with restarts
Efficient computation
Case study: image auto-captioning
Extensions: bi-partite graphs; tracking
Conclusions
KDD'09
Faloutsos, Miller, Tsourakakis
P3-12
CMU SCS
How close is ‘A’ to ‘B’?
I
Should be close, if they have
- many,
- short
- ‘heavy’ paths
1
J
1
A
1
1
1
H
1
B
1
D
1 1 1
E
G
F
KDD'09
Faloutsos, Miller, Tsourakakis
P3-13
CMU SCS
Some ``bad’’ proximities
Why not shortest path?
A
1
D
1
B
A
1
D
1
B
1 1 1
E
G
F
A: ‘pizza delivery guy’ problem
KDD'09
Faloutsos, Miller, Tsourakakis
P3-14
Some ``bad’’ proximities
CMU SCS
Why not max. netflow?
A
A
1
1
D
D
1
1
E
B
1
B
A: No penalty for long paths
KDD'09
Faloutsos, Miller, Tsourakakis
P3-15
CMU SCS
What is a ``good’’ Proximity?
I
1
J
1
A
1
1
1
H
1
1
D
…
1 1 1
E
G
F
B
• Multiple Connections
• Quality of connection
•Direct & In-directed Conns
•Length, Degree, Weight…
KDD'09
Faloutsos, Miller, Tsourakakis
P3-16
CMU SCS
Random walk with restart
10
9
12
2
8
1
11
3
4
6
5
[Haveliwala’02]
7
KDD'09
Faloutsos, Miller, Tsourakakis
P3-17
CMU SCS
Random walk with restart
0.04
9
0.10
12
2
0.13
1
0.08
3
0.02
8
0.13
11
0.04
4
0.13
6
5
7
0.05
Node 1
Node 2
Node 3
Node 4
Node 5
Node 6
Node 7
Node 8
Node 9
Node 10
Node 11
Node 12
0.13
0.10
0.13
0.22
0.13
0.05
0.05
0.08
0.04
0.03
0.04
0.02
0.05
Nearby nodes, higher scores
More red, more relevant
KDD'09
Node 4
0.03
10
Faloutsos, Miller, Tsourakakis
Ranking vector
r4
P3-18
CMU SCS
Why RWR is a good score?
j
1
Q  ( I  cW ) 
Q(i, j )  ri , j
i
W : adjacency matrix.
c: damping factor
Qc W
2
c W
2
c
3
all paths from i
to j with length 1
all paths from i
to j with length 2
KDD'09
Faloutsos, Miller, Tsourakakis
W
3
...
all paths from i
to j with length 3
P3-19
CMU SCS
Detailed outline
• Problem dfn and motivation
• Solution: Random walk with restarts
– variants
•
•
•
•
Efficient computation
Case study: image auto-captioning
Extensions: bi-partite graphs; tracking
Conclusions
KDD'09
Faloutsos, Miller, Tsourakakis
P3-20
CMU SCS
Variant: escape probability
• Define Random Walk (RW) on the graph
• Esc_Prob(CMUParis)
– Prob (starting at CMU, reaches Paris before returning to CMU)
CMU
KDD'09
the remaining
graph
Faloutsos, Miller, Tsourakakis
Paris
Esc_Prob = Pr (smile before cry)
P3-21
CMU SCS
Other Variants
• Other measure by RWs
– Community Time/Hitting Time [Fouss+]
– SimRank [Jeh+]
• Equivalence of Random Walks
– Electric Networks:
• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+]
– Spring Systems
• Katz [Katz], [Huang+], [Scholkopf+]
• Matrix-Forest-based Alg [Chobotarev+]
KDD'09
Faloutsos, Miller, Tsourakakis
P3-22
CMU SCS
Other Variants
• Other measure by RWs
– Community Time/Hitting Time [Fouss+]
– SimRank [Jeh+]
• Equivalence of Random Walks
All– are
“related
Electric
Networks: to” or “similar to”
• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+]
random walk with restart!
– Spring Systems
• Katz [Katz], [Huang+], [Scholkopf+]
• Matrix-Forest-based Alg [Chobotarev+]
KDD'09
Faloutsos, Miller, Tsourakakis
P3-23
CMU SCS
Map of proximity measurements
RWR
Norma
lize
Katz
Regularized
Un-constrained
Quad Opt.
4 ssp decides 1 esc_prob
Esc_Prob
+ Sink
Hitting Time/
Commute Time
X out-degree
Effective Conductance
relax
Harmonic Func.
Constrained
Quad Opt.
“voltage = position”
String System
KDD'09
Faloutsos,
Miller,Models
Tsourakakis
Physical
P3-24
Mathematic
Tools
CMU SCS
Notice: Asymmetry (even in
undirected graphs)
C
B
C-> A : high
A-> C: low
A
E
KDD'09
D
Faloutsos, Miller, Tsourakakis
P3-25
CMU SCS
Summary of Proximity Definitions
• Goal: Summarize multiple relationships
• Solutions
– Basic: Random Walk with Restarts
• [Haweliwala’02] [Pan+ 2004][Sun+ 2006][Tong+ 2006]
– Properties: Asymmetry
• [Koren+ 2006][Tong+ 2007] [Tong+ 2008]
– Variants: Esc_Prob and many others.
• [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007]
KDD'09
Faloutsos, Miller, Tsourakakis
P3-26
CMU SCS
Detailed outline
•
•
•
•
•
•
Problem dfn and motivation
Solution: Random walk with restarts
Efficient computation
Case study: image auto-captioning
Extensions: bi-partite graphs; tracking
Conclusions
KDD'09
Faloutsos, Miller, Tsourakakis
P3-27
CMU SCS
Preliminary: Sherman–Morrison Lemma
If:
vT
x
A
A
KDD'09
=
=
A
+
A
+
Faloutsos, Miller, Tsourakakis
u
+3
P3-28
CMU SCS
Preliminary: Sherman–Morrison Lemma
If:
vT
x
A
=
A
+
u
Then:
1
1
A u v A
A  (A u v )  A 
T
1
1 v  A  u
1
KDD'09
T 1
1
Faloutsos, Miller, Tsourakakis
T
P3-29
CMU SCS
Sherman – Morrison Lemma –
intuition:
• Given a small perturbation on a matrix
A -> A’
• We can quickly update its inverse
KDD'09
Faloutsos, Miller, Tsourakakis
P3-30
CMU SCS
SM: The block-form
1
 A1  A1B( D  CA1B)1 CA1
A B 
C D   
1
1
1
 ( D  CA B) CA


A
B
C
D
 A1B( D  CA1B)1 

1
1
( D  CA B)

Or…
1
( A  BD 1C ) 1
A B 
C D    1
1
1
  D C ( A  BD C )


 A1B( D  CA1B ) 1 

1
1
( D  CA B)

And many other variants…
Also known as Woodbury Identity
KDD'09
Faloutsos, Miller, Tsourakakis
P3-31
CMU SCS
SM Lemma: Applications
• RLS (Recursive least squares)
– and almost any algorithm in time series!
•
•
•
•
Kalman filtering
Incremental matrix decomposition
…
… and all the fast solutions we will
introduce!
KDD'09
Faloutsos, Miller, Tsourakakis
P3-32
CMU SCS
Reminder: PageRank
• With probability 1-c, fly-out to a random
node
• Then, we have
p = c B p + (1-c)/n 1 =>
p = (1-c)/n [I - c B] -1 1
KDD'09
Faloutsos, Miller, Tsourakakis
P3-33
CMU SCS
p = c B p + (1-c)/n 1
ri  cWri  (1  c ) ei
Ranking vector
KDD'09
Adjacency matrix
Restart p
Faloutsos, Miller, Tsourakakis
The only
difference
Starting vector
P3-34
CMU SCS
p = c B p + (1-c)/n 1
Computing RWR
ri  cWri  (1  c ) ei
Adjacency matrix
Ranking vector
 0.13 
0



 0.10 
1/3
 0.13 
1/3



 0.22 
1/3
 0.13 
0



 0.05 
0
 0.05   0.9   0



 0.08 
0
 0.04 
0



 0.03 
0



 0.04 
0
 0.02 
0



nx1
KDD'09
1/3 1/3 1/3 0 0 0 0
0 1/3 0 0 0 0 1/4
1/3 0 1/3 0 0 0 0
0 1/3 0 1/4 0 0 0
0
0 1/3 0 1/2 1/2 1/4
0
0
0 1/4 0 1/2 0
0
0
0 1/4 1/2 0 0
1/3 0
0 1/4 0 0 0
0
0
0
0 0 0 1/4
0
0
0
0 0 0 0
0
0
0
0 0 0 1/4
0
0
0
0 0 0 0
Restart p
0   0.13 
 0


 
0 0 0 0  0.10 
 0
 0
0 0 0 0  0.13 


 
0 0 0 0  0.22 
1 
 0
0 0 0 0  0.13 


 
0 0 0 0  0.05 
 0
 0.1  


0 0 0 0
0.05
0


 
1/2 0 1/3 0  0.08 
 0


 0
0 1/3 0 0  0.04 
 
 0
1/2 0 1/3 1/2  0.03 


 
0 1/3 0 1/2  0.04 
 0


 0
0 1/3 1/3 0   0.02 
 
Starting vector
0 0 0
nxn
10
9
1
2
1
12
8
3
11
4
5
6
7
nx1
Faloutsos, Miller, Tsourakakis
P3-35
CMU SCS
Q: Given query i, how to solve it?
?
0

1/3
1/3

1/3

0
0
 0.9  
0
0

0

0
0

0

Ranking vector
KDD'09
0 

0 0 0 1/4 0 0 0 0 
0 0 0 0 0 0 0 0 

1/4 0 0 0 0 0 0 0 

0 1/2 1/2 1/4 0 0 0 0 
1/4 0 1/2 0 0 0 0 0 

1/4 1/2 0 0 0 0 0 0 
1/4 0 0 0 1/2 0 1/3 0 
0 0 0 1/4 0 1/3 0 0 

0 0 0 0 1/2 0 1/3 1/2 
0 0 0 1/4 0 1/3 0 1/2 

0 0 0 0 0 1/3 1/3 0 
1/3 1/3 1/3 0 0 0 0
0 1/3 0
1/3 0 1/3
0 1/3 0
0
0 1/3
0
0
0
0
0
0
1/3 0
0
0
0
0
0
0
0
0
0
0
0
0
0
Adjacency matrix
0 0 0
?
 0
 
 0
 0
 
1 
 
 0
 0
 0.1   
 0
 0
 
 0
 
 0
 0
 
 0
 
Query
Ranking vector Starting vector
Faloutsos, Miller, Tsourakakis
P3-36
CMU SCS
OntheFly:
0.13
0.12
0.3 
0.16
0.19
0.14
 0

0.18

0 
0.09
0.13
0.10
1/3
0.16


1/3
0.13
0.12
0.3 
0.19
0.14
 

0.22
0.35
0.1 
0.18
0.26
0.21
1/3

0.13

0.03
0.3 
0.18
0.10
0.15
 0

 0
0.05
0.07
0 
0.04
0.06
   0.9  
0.07
0 
0.04
0.06
 0
0.05
0.07


 0
0 
0.06
0.08
0.07




 0
0
0
0.04

0.01
0.02

 
00 
00.03
 0
0.01
0.04


 0
00 
0.01
0.02

 
 0
 0.01

0.02
0
0
0

 
ri


0 0 0 1/4 0 0 0 0 
0 0 0 0 0 0 0 0 

1/4 0 0 0 0 0 0 0 

0 1/2 1/2 1/4 0 0 0 0 
1/4 0 1/2 0 0 0 0 0 

1/4 1/2 0 0 0 0 0 0 
1/4 0 0 0 1/2 0 1/3 0 
0 0 0 1/4 0 1/3 0 0 

0 0 0 0 1/2 0 1/3 1/2 
0 0 0 1/4 0 1/3 0 1/2 

0 0 0 0 0 1/3 1/3 0 
1/3 1/3 1/3 0 0 0 0
0 1/3 0
1/3 0 1/3
0 1/3
0
0
0
0
0 1/3
0
0
0
0
1/3 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0 0 0
0
0 
0.13
 0
 0  
 
 
0.10
 0


 0.13
 0
0 
  
 
1 
0.22
1 
 0  
 
 
 0
0.13
 0
0.05
0 
    0.1   
0 
 0
0.05


 0
 0.08
0 
 
  
 0
0 
0.04
 
  
0 
 0
0.03


 0
 0.04
0 
 
  
 0
 0.02

0
 
  
0.04
0.10
10 0.03
9
10
9
12
12
0.08
0.02
88 11
11
22
1 1
3 30.13
44
5 5 660.05
0.13
77
0.13
0.04
0.05
ri
No pre-computation/ light storage
Slow on-line response
KDD'09
Faloutsos, Miller, Tsourakakis
O(mE)
P3-37
CMU SCS
PreCompute
r10.20r20.13r30.14r40.13r50.68 r60.56 r70.56 r80.63 r90.44 r10
r11 r12
0.35 0.39 0.34
R:


 0.28
 0.14

 0.13

 0.09
 0.03

 0.03
 0.08

 0.03

 0.04
 0.04

 0.02

0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53
0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39
0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38
0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66
0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27
0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27
0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33
0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76
0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54
0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28
0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14


0.45 
0.33 

0.32 

0.56 
0.22 

0.22 
1.13 
0.79 

1.80 
1.72 

2.05 
0.04
0.13
11
99 10
10 0.03
1212
0.08
88
0.02
11
11
0.10
22
3
0.13
0.04
44
5
0.13
66
77
0.05
0.05
cxQ
Q
KDD'09
Faloutsos, Miller, Tsourakakis
P3-38
CMU SCS
PreCompute:
 0.13
2.20
 
 0.10
1.28
 0.13 
 1.43
 0.22
1.29
 
0.91
 0.13
 0.05 
  0.37
 0.05
0.37
 0.08 
  0.84
 0.04

  0.29
 0.03
0.35
 0.04 
  0.39
 0.02 
  0.22
1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34 

2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45 
1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33 

0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32 

0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56 
0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22 

0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22 
1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13 
0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79 

0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80 
0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72 

0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05 
0.04
0.13
11
99 10
10 0.03
1212
0.08
88
0.02
11
11
0.10
22
3
0.13
0.04
44
5
0.13
66
77
0.05
0.05
Fast on-line response
KDD'09
Heavy pre-computation/storage cost
3 )Miller, Tsourakakis
Faloutsos,
O(n
O(n 2 )
P3-39
CMU SCS
Q: How to Balance?
On-line
Off-line
KDD'09
Faloutsos, Miller, Tsourakakis
P3-40
CMU SCS
How to balance?
Idea (‘B-Lin’)
• Break into communities
• Pre-compute all, within a community
• Adjust (with S.M.) for ‘bridge edges’
H. Tong,
C. Faloutsos, &
J.Y. Pan. Fast Random Walk with
KDD'09
Faloutsos, Miller, Tsourakakis
P3-41
Restart and Its Applications. ICDM, 613-622, 2006.
CMU SCS
B_Lin: Basic Idea
[Tong+ ICDM 2006]
9
Find Community
2
1
8
3
10
12
11
4
9
0.04
7
12
2
1
6
5
10
8
3
0.13
1
11
0.10
2
3
10
9
0.13
11
5
5
6
0.13
7
9
1
2
8
3
10
12
6
7
0.05
0.05
Combine
11
4
FixKDD'09
the remaining
5
6
Faloutsos,7Miller, Tsourakakis
0.02
0.04
4
4
12
0.08
8
0.03
P3-42
CMU SCS
B_Lin: details
~
W
~
~
+
~
W 1: within community
KDD'09
Faloutsos, Miller, Tsourakakis
Cross-community
P3-43
CMU SCS
B_Lin: details
-1
-1
~
~
~ I – c W1 – cUSV
I – cW ~
Easy to be inverted LRA difference
SM Lemma!
KDD'09
Faloutsos, Miller, Tsourakakis
P3-44
CMU SCS
B_Lin: summary
• Pre-Computational Stage
• Q: Efficiently compute and store Q
• A: A few small, instead of ONE BIG, matrices inversions
• On-Line Stage
• Q: Efficiently recover one column of Q
• A: A few, instead of MANY, matrix-vector multiplications
KDD'09
Faloutsos, Miller, Tsourakakis
P3-45
CMU SCS
Query Time vs. Pre-Compute Time
Log Query Time
•Quality: 90%+
•On-line:
•Up to 150x speedup
•Pre-computation:
•Two orders saving
Log Pre-compute Time
KDD'09
Faloutsos, Miller, Tsourakakis
P3-46
CMU SCS
Detailed outline
•
•
•
•
•
•
Problem dfn and motivation
Solution: Random walk with restarts
Efficient computation
Case study: image auto-captioning
Extensions: bi-partite graphs; tracking
Conclusions
KDD'09
Faloutsos, Miller, Tsourakakis
P3-47
CMU SCS
gCaP: Automatic Image Caption
• Q
…
{ Sea Sun Sky Wave}
{ Cat
Forest
Grass
Tiger }
A: Proximity!
[Pan+ KDD2004]
{?, ?, ?,}
KDD'09
Faloutsos, Miller, Tsourakakis
P3-48
CMU SCS
Region
Image
Test Image
Sea
Sun
KDD'09
Sky
Wave
Cat
Forest
Tiger
Grass
Keyword
Faloutsos, Miller, Tsourakakis
P3-49
CMU SCS
Region
Image
Test Image
{Grass, Forest,
Cat, Tiger}
Sea
Sun
KDD'09
Sky
Wave
Cat
Forest
Tiger
Keyword
Faloutsos, Miller, Tsourakakis
Grass
P3-50
CMU SCS
C-DEM
(Screen-shot)
KDD'09
Faloutsos, Miller, Tsourakakis
P3-51
CMU SCS
C-DEM: Multi-Modal Query System for Drosophila
Embryo Databases [Fan+ VLDB 2008]
KDD'09
Faloutsos, Miller, Tsourakakis
P3-52
CMU SCS
Detailed outline
•
•
•
•
•
•
Problem dfn and motivation
Solution: Random walk with restarts
Efficient computation
Case study: image auto-captioning
Extensions: bi-partite graphs; tracking
Conclusions
KDD'09
Faloutsos, Miller, Tsourakakis
P3-53
CMU SCS
authors
RWR on Bipartite Graph
Author-Conf. Matrix
Observation: n >> m!
n
Examples:
1. DBLP: 400k aus, 3.5k confs
2. NetFlix: 2.7M usrs,18k mvs
Conferences
KDD'09
m
Faloutsos, Miller, Tsourakakis
P3-54
CMU SCS
RWR on Skewed bipartite graphs
• Q: Given query i, how to solve it?
m confs
KDD'09


0 0 0 1/4 0 0 0… 0. 
0 0 0 0 0 0 0 . .0 

..
1/4 0 0 0 0 0 0 0 
… 
0 1/2 1/2 1/4 0 0 0 0 
..
1/4 0 1/2 0 0 0 0 r0 
.. 
1/4 1/2 0 0 0 0 0. …0 
1/4 0 0 0 1/2 0 1/3. . 0 
0 0 0 1/4 0 1/3 0 .. 0 

0 0 0 0 1/2 0 1/3 1/2 
0 0 0 1/4 0 1/3 0 1/2 

0 c
0 0 0 0 1/3 1/3 0 
1/3 1/3 1/3 0 0 0 0
0 1/3 0
1/3 0 1/3
0 1/3
0
0
0 1/3
0
0
0
0
0
0
1/3 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0 0 0
0
0
A
….
..
..
…
..
..
.…
..
..
?
 0

1/3
1/3

1/3

 0
 0
 0.9  
 0
 0

 0

 0
 0

 0

0
A
n
?
Faloutsos, Miller, Tsourakakis
m
 0
 
 0
 0
 
1 
 
 0
 0
 0.1   
 0
 0
 
 0
 
 0
 0
 
 0
 
n aus
P3-55
CMU SCS
Idea:
• Pre-compute the smallest, m x m matrix
• Use it to compute the rest proximities, on
the fly
H. Tong,
S. Papadimitriou, P.S.
YuMiller,
& C.Tsourakakis
Faloutsos. Proximity Tracking
on
KDD'09
Faloutsos,
P3-56
Time-Evolving Bipartite Graphs. SDM 2008.
CMU SCS
BB_Lin: Examples
Dataset
Off-Line Cost
On-Line Cost
DBLP
a few minutes
frac. of sec.
NetFlix
1.5 hours
<0.01 sec.
400k authors
x 3.5k conf.s
KDD'09
2.7m user
x 18k movies
Faloutsos, Miller, Tsourakakis
P3-57
CMU SCS
Detailed outline
•
•
•
•
•
•
Problem dfn and motivation
Solution: Random walk with restarts
Efficient computation
Case study: image auto-captioning
Extensions: bi-partite graphs; tracking
Conclusions
KDD'09
Faloutsos, Miller, Tsourakakis
P3-58
CMU SCS
Problem: update
E’ edges changed
Involves n’ authors, m’ confs.
KDD'09
Faloutsos, Miller, Tsourakakis
n authors
m Conferences
P3-59
CMU SCS
Solution:
• Use Sherman-Morrison to quickly update
the inverse matrix
KDD'09
Faloutsos, Miller, Tsourakakis
P3-60
CMU SCS
Fast-Single-Update
log(Time)
(Seconds)
176x speedup
40x speedup
Our method
Our method
KDD'09
Faloutsos, Miller, Tsourakakis
P3-61
Datasets
CMU SCS
pTrack: Philip S. Yu’s Top-5 conferences up to each year
ICDE
ICDCS
SIGMETRICS
PDIS
VLDB
CIKM
ICDCS
ICDE
SIGMETRICS
ICMCS
KDD
SIGMOD
ICDM
CIKM
ICDCS
ICDM
KDD
ICDE
SDM
VLDB
1992
1997
2002
2007
Databases
Performance
Distributed Sys.
KDD'09
DBLP: (Au. x Conf.)
- 400k aus,
- 3.5k confs
- 20 yrs
Faloutsos, Miller, Tsourakakis
Databases
Data Mining
P3-62
CMU SCS
pTrack: Philip S. Yu’s Top-5 conferences up to each year
ICDE
ICDCS
SIGMETRICS
PDIS
VLDB
CIKM
ICDCS
ICDE
SIGMETRICS
ICMCS
KDD
SIGMOD
ICDM
CIKM
ICDCS
ICDM
KDD
ICDE
SDM
VLDB
1992
1997
2002
2007
Databases
Performance
Distributed Sys.
KDD'09
DBLP: (Au. x Conf.)
- 400k aus,
- 3.5k confs
- 20 yrs
Faloutsos, Miller, Tsourakakis
Databases
Data Mining
P3-63
CMU SCS
KDD’s Rank wrt. VLDB over years
Prox.
Rank
Data Mining and Databases
are getting closer & closer
Year
KDD'09
Faloutsos, Miller, Tsourakakis
P3-64
CMU SCS
cTrack:10 most influential authors in
NIPS community up to each year
T. Sejnowski
M. Jordan
Author-paper bipartite graph from NIPS 1987-1999.
3k. 1740 papers, 2037 authors, spreading over 13 years
KDD'09
Faloutsos, Miller, Tsourakakis
P3-65
CMU SCS
Conclusions - Take-home messages
• Proximity Definitions
– RWR
– and a lot of variants
• Computation
– Sherman–Morrison Lemma
– Fast Incremental Computation
• Applications
– Recommendations; auto-captioning; tracking
– Center-piece Subgraphs (next)
– E-mail management; anomaly detection, …
KDD'09
Faloutsos, Miller, Tsourakakis
P3-66
CMU SCS
References
• L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The
PageRank Citation Ranking: Bringing Order to the Web,
Technical report, Stanford Library.
• T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In
WWW, 517-526, 2002
• J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004)
Automatic multimedia cross-modal correlation discovery.
In KDD, 653-658, 2004.
KDD'09
Faloutsos, Miller, Tsourakakis
P3-67
CMU SCS
References
• C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast
discovery of connection subgraphs. In KDD, 118-127, 2004.
• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005)
Neighborhood Formation and Anomaly Detection in
Bipartite Graphs. In ICDM, 418-425, 2005.
• W. Cohen. (2007) Graph Walks and Graphical Models.
Draft.
KDD'09
Faloutsos, Miller, Tsourakakis
P3-68
CMU SCS
References
• P. Doyle & J. Snell. (1984) Random walks and electric
networks, volume 22. Mathematical Association America,
New York.
• Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring
and extracting proximity in networks. In KDD, 245–255,
2006.
• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006)
Learning to rank networked entities. In KDD, 14-23, 2006.
KDD'09
Faloutsos, Miller, Tsourakakis
P3-69
CMU SCS
References
• S. Chakrabarti. (2007) Dynamic personalized pagerank in
entity-relation graphs. In WWW, 571-580, 2007.
• F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007)
Random-Walk Computation of Similarities between Nodes
of a Graph with Application to Collaborative
Recommendation. IEEE Trans. Knowl. Data Eng. 19(3),
355-369 2007.
KDD'09
Faloutsos, Miller, Tsourakakis
P3-70
CMU SCS
References
• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs:
problem definition and fast solutions. In KDD, 404-413,
2006.
• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random
Walk with Restart and Its Applications. In ICDM, 613622, 2006.
• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast
direction-aware proximity for graph mining. In KDD,
747-756, 2007.
KDD'09
Faloutsos, Miller, Tsourakakis
P3-71
CMU SCS
References
• H. Tong, B. Gallagher, C. Faloutsos, & T. EliassiRad. (2007) Fast best-effort pattern matching in large
attributed graphs. In KDD, 737-746, 2007.
• H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos.
(2008) Proximity Tracking on Time-Evolving
Bipartite Graphs. SDM 2008.
KDD'09
Faloutsos, Miller, Tsourakakis
P3-72
CMU SCS
References
• B. Gallagher, H. Tong, T. Eliassi-Rad, C. Faloutsos.
Using Ghost Edges for Classification in Sparsely
Labeled Networks. KDD 2008
• H. Tong, Y. Sakurai, T. Eliassi-Rad, and C.
Faloutsos. Fast Mining of Complex Time-Stamped
Events CIKM 08
• H. Tong, H. Qu, and H. Jamjoom. Measuring
Proximity on Graphs with Side Information. ICDM
2008
KDD'09
Faloutsos, Miller, Tsourakakis
P3-73
CMU SCS
Resources
• www.cs.cmu.edu/~htong/soft.htm
For software, papers, and ppt of presentations
• www.cs.cmu.edu/~htong/tut/cikm2008/cikm_tutor
ial.html
For the CIKM’08 tutorial on graphs and proximity
Again, thanks to Hanghang TONG
for permission to use his foils in this
part
KDD'09
Faloutsos, Miller, Tsourakakis
P3-74
Download