Slides - Hanghang Tong

SCS CMU
Proximity Tracking on Time-Evolving Bipartite Graphs
Speaker: Hanghang Tong
Joint Work with
Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos
Apr. 24-26, 2008, Atlanta
SIAM Conference on Data Mining
Graphs are everywhere!
Graph Mining: the big picture
Graph/Global Level
Subgraph/Community Level
Node Level
We are here!
Proximity on Graph: What?
[Figure: example graph on nodes A, B, D, E, F, G, H, I, J; every edge has weight 1]
a.k.a Relevance, Closeness, ‘Similarity’…
Link Prediction
[Figure: two proximity histograms, "Prox. Hist. for a set of deleted links" and "Prox. Hist. for a set of absent links". x-axis: Prox(i→j) + Prox(j→i), from 0 to 0.09; y-axis: density]
Prox. is effective at telling 'deleted' edges apart from absent edges!
Q: How to predict the existence of the link?
A: Proximity! [Liben-Nowell + 2003]
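A minimal sketch of proximity-based link scoring. It assumes proximity means the random walk with restart (RWR) score defined later in the talk; the slide does not say which measure produced the histograms. The names `rwr_matrix` and `link_scores` and the 5-node graph are hypothetical.

```python
import numpy as np

def rwr_matrix(A, c=0.9):
    """Return R where column q holds the RWR scores of all nodes for query q."""
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0          # leave isolated nodes as zero columns
    W = A / col_sums                       # column-normalized adjacency matrix
    n = W.shape[0]
    return (1.0 - c) * np.linalg.inv(np.eye(n) - c * W)

def link_scores(A, c=0.9):
    """Score every non-adjacent pair (i, j), i < j, by Prox(i->j) + Prox(j->i)."""
    A = np.asarray(A, dtype=float)
    R = rwr_matrix(A, c)
    S = R + R.T                            # S[i, j] = Prox(i->j) + Prox(j->i)
    n = A.shape[0]
    pairs = [(S[i, j], i, j)
             for i in range(n) for j in range(i + 1, n) if A[i, j] == 0]
    return sorted(pairs, reverse=True)     # highest score first

# Hypothetical graph: triangle 0-1-2 with a tail 2-3-4
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

ranked = link_scores(A)
for score, i, j in ranked:
    print(f"({i}, {j}): {score:.4f}")
```

Pairs near the top of the list are the candidates a proximity-based predictor would propose first; the dense inverse is only reasonable for a toy graph of this size.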
Neighborhood Search on graphs
[Figure: bipartite graph between conferences (IJCAI, KDD, ICDM, SDM, AAAI, NIPS, …) and authors (Philip S. Yu, Ning Zhong, R. Ramakrishnan, M. Jordan, …)]
Q: What is the most related conference to ICDM?
A: Proximity! [Sun+ ICDM2005]
Example
[Figure: ICDM and related conferences (PKDD, SDM, PAKDD, KDD, ICML, CIKM, ICDE, ECML, SIGMOD, DMKD), with proximity scores between 0.004 and 0.011 on the edges]
Automatic Image Caption
[Figure: graph with three node types, Region, Image and Keyword, plus a test image; keywords shown: Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass]
Q: How to assign keywords to the test image?
A: Proximity! [Pan+ 2004]
Center-Piece Subgraph (CePS)
[Figure: Input: the original graph with query nodes A, B, C. Output: the CePS connecting A, B, C, with the connecting node marked "CePS guy"]
Q: How to find the hub for the black nodes?
A: Proximity! [Tong+ KDD 2006]
Best-Effort Pattern Match
[Figure: Input: a query graph (nodes labeled CEO, SEC, Accountant, Manager) and a data graph. Output: the matching subgraph]
Q: How to find the matching subgraph?
A: Proximity! [Tong+ KDD 2007]
Challenge
• Graphs are evolving over time!
– New nodes/edges show up;
– Existing nodes/edges die out;
– Edge weights change…
Q: How to Generalize everything?
A: Track Proximity!
Trend analysis on graph level
[Figure: rank of influential-ness over the years for T. Sejnowski, C. Koch, G. Hinton and M. Jordan. x-axis: Year; y-axis: Rank of Influential-ness]
Roadmap
• Motivation
• Prox. On Static Graphs
• Prox. On Time-Evolving Graphs
• Experimental Results
• Conclusion
Random walk with restart
[Figure: 12-node example graph with query node 4; each node is labeled with its score and colored by relevance]
Nearby nodes, higher scores
More red, more relevant
Ranking vector r4:
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02
Computing RWR
$r_i = c W r_i + (1 - c) e_i$
• $r_i$: ranking vector (n x 1)
• $W$: adjacency matrix (n x n)
• $e_i$: starting vector (n x 1)
• $1 - c$: restart p
Example (12-node graph, query node 4, c = 0.9):
$r_4 = 0.9 \times W \times r_4 + 0.1 \times e_4$
with $r_4 = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)^T$ and $e_4 = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^T$
[The slide writes out the 12 x 12 matrix W in full; its nonzero entries are 1/2, 1/3 and 1/4.]
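A minimal sketch of the RWR equation as a fixed-point iteration. It assumes W is the column-normalized adjacency matrix; the 5-node graph and the function names are hypothetical and are not the 12-node example from the slide.

```python
import numpy as np

def column_normalize(A):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    return A / col_sums

def rwr_power_iteration(W, query, c=0.9, tol=1e-12, max_iter=10_000):
    """Iterate r <- c*W*r + (1-c)*e until the L1 change drops below tol."""
    n = W.shape[0]
    e = np.zeros(n)
    e[query] = 1.0                     # starting vector: all mass on the query node
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1.0 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Hypothetical undirected graph: triangle 0-1-2 with a tail 2-3-4
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
W = column_normalize(A)
r0 = rwr_power_iteration(W, query=0)
print(np.round(r0, 3))                 # ranking vector for query node 0
```

Each pass costs one sparse matrix-vector product in a real implementation; the dense `W @ r` here is only for readability.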
Q: Given query i, how to solve it?
$\underbrace{r_i}_{?} = 0.9 \times W \times \underbrace{r_i}_{?} + 0.1 \times e_i$
[The slide repeats the 12-node example: the adjacency matrix W and the starting vector (query) are known; the ranking vector, marked "?", is the unknown on both sides.]
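One direct way to answer the question is to solve the linear system in closed form. This assumes W is normalized so that its spectral radius is at most 1 and that 0 < c < 1, which makes I - cW invertible.

```latex
\begin{aligned}
r_i &= c\,W\,r_i + (1-c)\,e_i \\
(I - c\,W)\,r_i &= (1-c)\,e_i \\
r_i &= (1-c)\,(I - c\,W)^{-1}\,e_i
\end{aligned}
```

Inverting a dense n x n matrix costs O(n^3) in general, so this route is only practical when n is small.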
RWR on Bipartite Graph
[Figure: Author-Conf. Matrix with n authors and m conferences]
Observation: n >> m!
Examples:
1. DBLP: 400k authors, 3.5k conferences
2. NetFlix: 2.7M users, 18k movies
RWR on Skewed bipartite graphs
• Q: Given query i, how to solve it?
$\underbrace{r_i}_{?} = 0.9 \times W \times \underbrace{r_i}_{?} + 0.1 \times e_i$
[Figure: for a bipartite graph with n aus and m confs, the adjacency matrix has two zero diagonal blocks and two off-diagonal blocks labeled A; the ranking vector, marked "?", is the unknown.]
BB_Lin: Pre-Computation [Tong+ 06]
[Figure: author-conference matrix with n authors and m conferences]
• Step 1: 2-step RWR for Conferences: $M = A_c \times A_r$
• Step 2: All Conf-Conf Prox. Scores: $\Lambda = (I - 0.9 \times M)^{-1}$
• Cost: $O(m^3 + m \cdot E)$
• Examples
– NetFlix: 1.5hr for pre-computation;
– DBLP: a few minutes
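A minimal sketch of the two pre-computation steps, under stated assumptions. `P_ca` (m x n) and `P_ac` (n x m) are hypothetical names that play the roles of the slide's Ac and Ar; the slide does not show which normalization it uses, so column normalization is assumed here. In this sketch the factor in front of M is c squared, because eliminating the author scores from the RWR equation applies c twice; the slide writes the constant as 0.9, and how the two relate is not shown in the source.

```python
import numpy as np

def column_normalize(A):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    return A / col_sums

def precompute_core(A, c=0.9):
    """A is the n x m author-conference weight matrix. Returns (P_ac, P_ca, Lam)."""
    P_ac = column_normalize(A)        # n x m: step from a conference to its authors
    P_ca = column_normalize(A.T)      # m x n: step from an author to their conferences
    M = P_ca @ P_ac                   # Step 1: m x m two-step conference-to-conference walk
    m = M.shape[0]
    Lam = np.linalg.inv(np.eye(m) - (c * c) * M)   # Step 2: one m x m inverse
    return P_ac, P_ca, Lam

# Hypothetical toy data: 5 authors x 3 conferences, entries are paper counts
A = np.array([[2, 0, 0],
              [1, 1, 0],
              [0, 3, 1],
              [0, 0, 2],
              [1, 0, 1]], dtype=float)
P_ac, P_ca, Lam = precompute_core(A)

# (1 - c) * Lam: column j holds the conference scores for query conference j
print(np.round(0.1 * Lam, 3))
```

The only dense inverse is m x m, which is the point of exploiting n >> m.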
BB_Lin: On-Line Stage
[Figure: author-conference matrix (authors x conferences, Ac/Ar, E edges)]
• (Base) Case 1: Conf - Conf: Read out!
• Case 2: Au - Conf: 1 matrix-vec!
• Case 3: Au - Au: 2 matrix-vec!
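A minimal sketch of the three on-line cases, assuming a column-normalized bipartite walk and a core matrix of the form inv(I - c^2 M). The class `BipartiteRWR` and its method names are hypothetical; the formulas come from eliminating the author block of the RWR equation and are this sketch's derivation, not text from the slides.

```python
import numpy as np

def column_normalize(A):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    return A / col_sums

class BipartiteRWR:
    def __init__(self, A, c=0.9):
        self.c = c
        self.P_ac = column_normalize(A)       # n x m: conference -> author step
        self.P_ca = column_normalize(A.T)     # m x n: author -> conference step
        m = self.P_ca.shape[0]
        self.Lam = np.linalg.inv(np.eye(m) - (c * c) * (self.P_ca @ self.P_ac))

    def conf_query(self, j):
        """Case 1 (Conf - Conf): read out column j of the core matrix."""
        return (1 - self.c) * self.Lam[:, j]

    def author_query_confs(self, i):
        """Case 2 (Au - Conf): one matrix-vector product with the core matrix."""
        return (1 - self.c) * self.c * (self.Lam @ self.P_ca[:, i])

    def author_query_authors(self, i):
        """Case 3 (Au - Au): a second matrix-vector product maps back to authors."""
        scores = self.c * (self.P_ac @ self.author_query_confs(i))
        scores[i] += 1 - self.c               # restart mass on the query author
        return scores

# Hypothetical toy data: 5 authors x 3 conferences
A = np.array([[2, 0, 0],
              [1, 1, 0],
              [0, 3, 1],
              [0, 0, 2],
              [1, 0, 1]], dtype=float)
model = BipartiteRWR(A)
print(np.round(model.conf_query(2), 3))
print(np.round(model.author_query_confs(0), 3))
print(np.round(model.author_query_authors(0), 3))
```

With sparse storage for `P_ca` and `P_ac`, each on-line query touches the m x m core matrix and at most the E stored edges.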
BB_Lin: Examples
• NetFlix dataset (2.7M users x 18k movies)
– 1.5hr for pre-computation;
– <1 sec for on-line
• DBLP dataset (400k authors x 3.5k confs)
– A few minutes for pre-computation
– <0.01 sec for on-line
Roadmap
• Motivation
• Prox. On Static Graphs
• Prox. On Time-Evolving Graphs
• Experimental Results
• Conclusion
Challenges
• BB_Lin is good for skewed bipartite graphs
– for NetFlix (2.7M nodes and 100M edges)
– On-line cost for query: fraction of seconds
• w/ 1.5 hr pre-computation for m x m core matrix
• But…what if the graph is evolving over time
– New edges/nodes arrive; edge weights increase…
– On-line cost: 1.5hr itself becomes a part of this!
Q: How to update the core matrix?
t=0: $\Lambda = (I - 0.9 \times M)^{-1}$, cost $O(m^3 + m \cdot E)$
t=1: $\tilde{\Lambda} = (I - 0.9 \times \tilde{M})^{-1}$, cost $O(m^3 + m \cdot E)$
From $\Lambda$ to $\tilde{\Lambda}$: ?
Update the core matrix
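• Step 1: $\tilde{M} = A_c \times A_r = M + {}$ [rank 2 update, drawn on the slide as a matrix product]
• Step 2: $\tilde{\Lambda} = (I - 0.9 \times \tilde{M})^{-1} = \Lambda + {}$ [correction term, drawn on the slide as a matrix product]
• Cost: $O(m^2)$, compared with $O(m^3 + m \cdot E)$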
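A minimal sketch of a rank-2 update for a single changed edge, using the Sherman-Morrison-Woodbury identity. Assumptions: a column-normalized walk, a core matrix of the form inv(I - g M) with g = c^2, and hypothetical names throughout (`core_matrix`, `single_edge_update`, `X`, `Y`). Whether the talk builds its two factors in exactly this way is not shown in the source.

```python
import numpy as np

def column_normalize(A):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    return A / col_sums

def core_matrix(A, c=0.9):
    """Reference computation from scratch: inv(I - c^2 * P_ca @ P_ac)."""
    P_ac, P_ca = column_normalize(A), column_normalize(np.asarray(A).T)
    return np.linalg.inv(np.eye(P_ca.shape[0]) - (c * c) * (P_ca @ P_ac))

def single_edge_update(A, Lam, i, j, delta, c=0.9):
    """Add delta to edge (author i, conference j); return (A_new, Lam_new)."""
    A = np.asarray(A, dtype=float)
    A_new = A.copy()
    A_new[i, j] += delta
    # Normalizations are recomputed in full for brevity; only column i of
    # P_ca and column j of P_ac actually change.
    P_ac, P_ca = column_normalize(A), column_normalize(A.T)
    P_ac_new, P_ca_new = column_normalize(A_new), column_normalize(A_new.T)
    u = P_ca_new[:, i] - P_ca[:, i]           # change in author i's column (length m)
    v = P_ac_new[:, j] - P_ac[:, j]           # change in conference j's column (length n)
    m = A.shape[1]
    e_j = np.zeros(m)
    e_j[j] = 1.0
    X = np.column_stack([u, P_ca @ v])        # m x 2
    Y = np.vstack([P_ac_new[i, :], e_j])      # 2 x m, so M_new - M = X @ Y
    g = c * c
    small = np.linalg.inv(np.eye(2) - g * (Y @ Lam @ X))   # 2 x 2 inverse
    Lam_new = Lam + g * (Lam @ X) @ small @ (Y @ Lam)      # O(m^2) work
    return A_new, Lam_new

# Hypothetical toy data: 5 authors x 3 conferences
A = np.array([[2, 0, 0],
              [1, 1, 0],
              [0, 3, 1],
              [0, 0, 2],
              [1, 0, 1]], dtype=float)
Lam = core_matrix(A)
A2, Lam2 = single_edge_update(A, Lam, i=3, j=0, delta=1.0)
print(np.allclose(Lam2, core_matrix(A2)))     # compare with recomputing from scratch
```

The update needs only products of the m x m core matrix with two-column or two-row factors, which is where the quadratic cost comes from.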
Update: General Case
[Figure: author-conference matrix with n authors and m conferences; $\tilde{M} = A_c \times A_r$]
• E' edges changed
• Involves n' authors, m' confs.
• Observation: $\min(n', m') \ll E'$
– the rank of update is small!
– Real Example (DBLP Post)
• 1258 time steps
• E' up to ~20,000!
• min(n',m') <= 132
• Our Algorithm: $O(\min(n', m') \cdot m^2 + E')$, compared with $O(m^3 + m \cdot E)$
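One standard identity that gives a cost of this shape is Sherman-Morrison-Woodbury. The block below is a sketch under assumptions: the change in M is written as a product of an m x k matrix X and a k x m matrix Y (X, Y and k are hypothetical names), and gamma stands for the constant multiplying M (0.9 as written on the slides). Whether the talk uses exactly this factorization is not shown in the source.

```latex
\begin{aligned}
\Lambda &= (I - \gamma\,M)^{-1}, \qquad
\tilde{M} = M + X\,Y, \quad X \in \mathbb{R}^{m \times k},\; Y \in \mathbb{R}^{k \times m} \\
\tilde{\Lambda} &= (I - \gamma\,\tilde{M})^{-1}
 = \Lambda + \gamma\,\Lambda X\,\bigl(I_k - \gamma\,Y \Lambda X\bigr)^{-1}\,Y \Lambda
\end{aligned}
```

The products with the m x m matrix cost O(k m^2) and the small inverse costs O(k^3), so a small update rank k keeps the work far below a fresh m x m inversion.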
Roadmap
• Motivation
• Prox. On Static Graphs
• Prox. On Time-Evolving Graphs
• Experimental Results
• Conclusion
Philip S. Yu’s Top-5 conferences up to each year
[Figure: timeline of top-5 lists, annotated "Databases, Performance, Distributed Sys." and "Databases, Data Mining"]
1992: ICDE, ICDCS, SIGMETRICS, PDIS, VLDB
1997: CIKM, ICDCS, ICDE, SIGMETRICS, ICMCS
2002: KDD, SIGMOD, ICDM, CIKM, ICDCS
2007: ICDM, KDD, ICDE, SDM, VLDB
DBLP: (Au. x Conf.)
- 400k aus,
- 3.5k confs
- 20 yrs
KDD’s Rank wrt. VLDB over years
[Figure: KDD's proximity rank with respect to VLDB, plotted by year. x-axis: Year; y-axis: Prox. Rank]
Data Mining and Databases are more and more relevant!
10 most influential authors in NIPS community up to each year
[Figure: author rankings by year; T. Sejnowski and M. Jordan are labeled]
Author-paper bipartite graph from NIPS 1987-1999.
3k. 1740 papers, 2037 authors, spreading over 13 years
Fast-Single-Update
[Figure: log(Time) in seconds per dataset, with "Our method" marked; annotated speedups: 176x and 40x. x-axis: Datasets; y-axis: log(Time) (Seconds)]
Fast-Batch-Update
[Figure: two panels of Time (Seconds), one against E' and one against Min (n', m'), with "Our method" marked in each]
15x speed-up on average!
Conclusion
• Trends Analysis on Graph Level
– pTrack/cTrack
• Scalable for evolving graphs
Thank you!
www.cs.cmu.edu/~htong