SCS CMU
Proximity Tracking on Time-Evolving Bipartite Graphs
Speaker: Hanghang Tong
Joint work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos
SIAM Conference on Data Mining, Apr. 24-26, 2008, Atlanta

Graphs are everywhere!

Graph Mining: the big picture
• Graph/Global Level
• Subgraph/Community Level
• Node Level (we are here!)

Proximity on Graph: What?
(Figure: a small example graph with unit edge weights.)
• a.k.a. relevance, closeness, 'similarity'…

Link Prediction
(Figure: two histograms of Prox(i,j) + Prox(j,i), range 0 to 0.09, against density: one for a set of deleted links, one for a set of absent links.)
• Proximity is effective at distinguishing 'deleted' edges from absent ones!
• Q: How to predict the existence of a link?
• A: Proximity! [Liben-Nowell+ 2003]

Neighborhood Search on Graphs
(Figure: conference-author bipartite graph; conferences IJCAI, KDD, ICDM, SDM, AAAI, NIPS, …; authors Philip S. Yu, Ning Zhong, R. Ramakrishnan, M. Jordan, ….)
• Q: What is the most related conference to ICDM?
• A: Proximity! [Sun+ ICDM 2005]

Example
(Figure: ICDM and its most related conferences, PKDD, SDM, PAKDD, KDD, ICML, CIKM, ICDE, ECML, SIGMOD and DMKD, with proximity scores between 0.004 and 0.011.)

Automatic Image Caption
(Figure: a graph linking images, regions and keywords (Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass), plus a test image.)
• Q: How to assign keywords to the test image?
• A: Proximity! [Pan+ 2004]

Center-Piece Subgraph (CePS)
(Figure: the input is the original graph with black query nodes A, B, C; the output is the CePS, containing the "CePS guy" that connects them.)
• Q: How to find the hub for the black nodes?
• A: Proximity! [Tong+ KDD 2006]

Best-Effort Pattern Match
(Figure: the input is a query graph (CEO, SEC, Accountant, Manager) and a data graph; the output is the matching subgraph.)
• Q: How to find the matching subgraph?
• A: Proximity! [Tong+ KDD 2007]

Challenge
• Graphs are evolving over time!
– New nodes/edges show up;
– Existing nodes/edges die out;
– Edge weights change…
• Q: How to generalize all of these applications to evolving graphs?
• A: Track proximity!
Trend Analysis on Graph Level
(Figure: rank of influential-ness by year for T. Sejnowski, C. Koch, G. Hinton and M. Jordan.)

Roadmap: Motivation • Prox. on Static Graphs • Prox. on Time-Evolving Graphs • Experimental Results • Conclusion

Random Walk with Restart
(Figure: a 12-node graph with query node 4. Nearby nodes get higher scores; more red, more relevant.)
Ranking vector $r_4$:
Node   1     2     3     4     5     6     7     8     9     10    11    12
Score  0.13  0.10  0.13  0.22  0.13  0.05  0.05  0.08  0.04  0.03  0.04  0.02

Computing RWR
$r_i = c W r_i + (1 - c) e_i$
• $r_i$: n × 1 ranking vector
• $W$: n × n (normalized) adjacency matrix
• $e_i$: n × 1 starting vector, 1 at query node i and 0 elsewhere
• $c = 0.9$, so the restart weight is $1 - c = 0.1$
(Figure: the equation written out for the 12-node example with query node 4.)

Q: Given query i, how to solve it?
(Figure: the same equation with $r_i$ unknown; the adjacency matrix, c = 0.9 and the starting vector with a 1 at the query are given.)

RWR on Bipartite Graph
• Author-conference matrix: n authors × m conferences
• Observation: n >> m!
• Examples:
1. DBLP: 400k authors, 3.5k confs
2. NetFlix: 2.7M users, 18k movies

RWR on Skewed Bipartite Graphs
• Q: Given query i, how to solve it?
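One direct answer to this question is to iterate the RWR equation $r_i = c W r_i + (1 - c) e_i$ on the full (n + m) × (n + m) bipartite graph until it converges. The sketch below (Python with NumPy; every function name is hypothetical) does that, and also checks that the same scores can be recovered from a small m × m core matrix, which is the idea BB_Lin builds on. Assumptions that are ours rather than the slides': W is column-normalized, A_r (n × m) and A_c (m × n) are the row-normalized author-to-conference and conference-to-author matrices, and every node has at least one edge. The core matrix uses the factor c², because a conference-to-conference walk passes through one author; the slides' factor 0.9 would correspond to c ≈ 0.95 under this reading, while the sketch keeps c = 0.9.

```python
import numpy as np


def rwr_power_iteration(W, i, c=0.9, tol=1e-12, max_iter=10_000):
    """Iterate r <- c * W @ r + (1 - c) * e_i until the L1 change is below tol.

    W must be column-stochastic (each column sums to 1); the update is then
    a contraction with factor c, so it converges for any c < 1.
    """
    e = np.zeros(W.shape[0])
    e[i] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


def bipartite_W(B):
    """Column-normalized adjacency of the bipartite graph with n x m block B.

    Nodes 0..n-1 are authors, nodes n..n+m-1 are conferences.
    Assumes every author and every conference has at least one edge.
    """
    n, m = B.shape
    A = np.zeros((n + m, n + m))
    A[:n, n:] = B
    A[n:, :n] = B.T
    return A / A.sum(axis=0, keepdims=True)


def bb_lin_precompute(B, c=0.9):
    """Return A_r, A_c and the m x m core matrix Lam = (I - c**2 * A_c @ A_r)^-1."""
    A_r = B / B.sum(axis=1, keepdims=True)  # author -> conference, rows sum to 1
    A_c = B.T / B.sum(axis=0)[:, None]      # conference -> author, rows sum to 1
    m = B.shape[1]
    Lam = np.linalg.inv(np.eye(m) - c**2 * (A_c @ A_r))
    return A_r, A_c, Lam


def bb_lin_query_conf(A_c, Lam, j, c=0.9):
    """Query = conference j: conference scores are read out of Lam (case 1),
    author scores take one matrix-vector product (case 2)."""
    r_conf = (1 - c) * Lam[j, :]
    r_auth = c * (A_c.T @ r_conf)
    return np.concatenate([r_auth, r_conf])


def bb_lin_query_author(A_r, A_c, Lam, a, c=0.9):
    """Query = author a: two matrix-vector products (case 3)."""
    r_conf = c * (1 - c) * (Lam.T @ A_r[a, :])
    r_auth = c * (A_c.T @ r_conf)
    r_auth[a] += 1 - c  # restart mass stays on the query author
    return np.concatenate([r_auth, r_conf])


if __name__ == "__main__":
    # Toy author x conference weights (5 authors, 3 conferences).
    B = np.array([[1, 0, 2], [0, 1, 1], [1, 1, 0], [3, 0, 1], [0, 2, 0]], dtype=float)
    n = B.shape[0]
    A_r, A_c, Lam = bb_lin_precompute(B)
    print(rwr_power_iteration(bipartite_W(B), n + 0))  # full-graph iteration
    print(bb_lin_query_conf(A_c, Lam, 0))              # same scores via the core matrix
```

The design point this illustrates: once the m × m core matrix is available, no query needs to iterate over the full graph, which is what makes the approach attractive when n >> m.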
(Figure: with the n authors listed before the m conferences, the adjacency matrix is zero except for two off-diagonal blocks built from the n × m author-conference matrix A; as before, c = 0.9 and the restart weight 0.1 sits on the query.)

BB_Lin: Pre-Computation [Tong+ 06]
2-step RWR for conferences, giving all conf-conf proximity scores:
• Step 1: $M = A_c \times A_r$ (m × m)
• Step 2: $(I - 0.9 M)^{-1}$
• Cost: $O(m^3 + m|E|)$
• Examples
– NetFlix: 1.5 hr for pre-computation;
– DBLP: a few minutes
(Figure: n authors and m conferences; $A_c$ (m × n) and $A_r$ (n × m) are built from the |E| author-conference edges.)

BB_Lin: On-Line Stage
• Case 1 (base case): Conf-Conf: read it out of the pre-computed core matrix!
• Case 2: Au-Conf: 1 matrix-vector multiplication!
• Case 3: Au-Au: 2 matrix-vector multiplications!

BB_Lin: Examples
• NetFlix dataset (2.7M users × 18k movies)
– 1.5 hr for pre-computation;
– <1 sec for on-line
• DBLP dataset (400k authors × 3.5k confs)
– a few minutes for pre-computation;
– <0.01 sec for on-line

Roadmap: Motivation • Prox. on Static Graphs • Prox.
on Time-Evolving Graphs • Experimental Results • Conclusion

Challenges
• BB_Lin is good for skewed bipartite graphs
– e.g., NetFlix (2.7M nodes and 100M edges)
– On-line cost per query: a fraction of a second
• with 1.5 hr of pre-computation for the m × m core matrix
• But… what if the graph is evolving over time?
– New edges/nodes arrive; edge weights increase…
– On-line cost: the 1.5 hr pre-computation itself becomes part of it!

Q: How to Update the Core Matrix?
• t = 0: $(I - 0.9 M)^{-1}$, cost $O(m^3 + m|E|)$
• t = 1: $(I - 0.9 \tilde{M})^{-1}$: pay $O(m^3 + m|E|)$ again?

Update the Core Matrix
• Step 1: $\tilde{M} = \tilde{A}_c \times \tilde{A}_r = M + X$, where X is a rank-2 update
• Step 2: $(I - 0.9 \tilde{M})^{-1} = (I - 0.9 M)^{-1}$ + a rank-2 correction
• Cost: $O(m^2)$ instead of $O(m^3 + m|E|)$

Update: General Case
• $\tilde{M} = \tilde{A}_c \times \tilde{A}_r$ (n authors, m conferences)
• |E'| edges changed
• Involves n' authors, m' confs.
• Observation: min(n', m') is small compared with |E'|, i.e. the rank of the update is small!
– Real example (DBLP Post): 1258 time steps; |E'| up to ~20,000!; min(n', m') ≤ 132
• Our algorithm: $O(\min(n', m')\, m^2 + |E'|)$ vs. $O(m^3 + m|E|)$

Roadmap: Motivation • Prox. on Static Graphs • Prox. on Time-Evolving Graphs • Experimental Results • Conclusion

Philip S. Yu's Top-5 Conferences up to Each Year
DBLP (Au. × Conf.): 400k authors, 3.5k confs, 20 yrs
1992         1997         2002      2007
ICDE         CIKM         KDD       ICDM
ICDCS        ICDCS        SIGMOD    KDD
SIGMETRICS   ICDE         ICDM      ICDE
PDIS         SIGMETRICS   CIKM      SDM
VLDB         ICMCS        ICDCS     VLDB
(Areas annotated on the slide: Databases, Performance, Distributed Sys. in the early years; Databases, Data Mining in the later years.)

KDD's Rank wrt. VLDB over Years
(Figure: proximity rank of KDD with respect to VLDB, by year.)
• Data Mining and Databases are more and more relevant!

10 Most Influential Authors in NIPS Community up to Each Year
(Figure: influence rank by year, highlighting T. Sejnowski and M. Jordan.)
Author-paper bipartite graph from NIPS 1987-1999.
1740 papers, 2037 authors, spanning 13 years.

Fast-Single-Update
(Figure: log(time) in seconds across datasets for our method and the alternative; our method reaches 176x and 40x speedups.)

Fast-Batch-Update
(Figure: time in seconds against |E'| and against min(n', m'), our method vs. the alternative.)
• 15x speed-up on average!

Conclusion
• Trend analysis on graph level
– pTrack/cTrack: graph trends
• Scalable for evolving graphs

Thank you!
www.cs.cmu.edu/~htong