CMU SCS Large Graph Mining: Power Tools and a Practitioner’s guide Task 3: Recommendations & proximity Faloutsos, Miller & Tsourakakis CMU KDD'09 Faloutsos, Miller, Tsourakakis P3-1 CMU SCS Outline • • • • • • • • • • • Introduction – Motivation Task 1: Node importance Task 2: Community detection Task 3: Recommendations Task 4: Connection sub-graphs Task 5: Mining graphs over time Task 6: Virus/influence propagation Task 7: Spectral graph theory Task 8: Tera/peta graph mining: hadoop Observations – patterns of real graphs Conclusions KDD'09 Faloutsos, Miller, Tsourakakis P3-2 CMU SCS Acknowledgement: Most of the foils in ‘Task 3’ are by Hanghang TONG www.cs.cmu.edu/~htong KDD'09 Faloutsos, Miller, Tsourakakis P3-3 CMU SCS Detailed outline • • • • • • Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions KDD'09 Faloutsos, Miller, Tsourakakis P3-4 CMU SCS Motivation: Link Prediction A ? B i i i i Should we introduce Mr. A to Mr. B? KDD'09 Faloutsos, Miller, Tsourakakis P3-5 CMU SCS Motivation - recommendations customers Products / movies ‘smith’ ?? Terminator 2 KDD'09 Faloutsos, Miller, Tsourakakis P3-6 CMU SCS Answer: proximity • ‘yes’, if ‘A’ and ‘B’ are ‘close’ • ‘yes’, if ‘smith’ and ‘terminator 2’ are ‘close’ QUESTIONS in this part: - How to measure ‘closeness’/proximity? - How to do it quickly? - What else can we do, given proximity scores? KDD'09 Faloutsos, Miller, Tsourakakis P3-7 CMU SCS How close is ‘A’ to ‘B’? I 1 J 1 A 1 1 1 H 1 B 1 D 1 1 1 E G F Faloutsos, Miller, Tsourakakis a.k.a Relevance, Closeness, ‘Similarity’… KDD'09 P3-8 CMU SCS Why is it useful? • Recommendation And many more • Image captioning [Pan+] • Conn. / CenterPiece subgraphs [Faloutsos+], [Tong+], [Koren+] and • • • • • • • Link prediction [Liben-Nowell+], [Tong+] Ranking [Haveliwala], [Chakrabarti+] Email Management [Minkov+] Neighborhood Formulation [Sun+] Pattern matching [Tong+] Collaborative Filtering [Fouss+] … KDD'09 Faloutsos, Miller, Tsourakakis P3-9 CMU SCS Region Automatic Image Captioning Image Test Image Keyword Sea Sun Sky Wave Cat Forest Tiger Grass Q: How to assign keywords to the test image? KDD'09 Faloutsos, Miller, A:Tsourakakis Proximity! [Pan+ P3-10 2004] CMU SCS Center-Piece Subgraph(CePS) Input Output B B CePS guy C A Original Graph KDD'09 C A CePS Q: How to find hub for the black nodes? Faloutsos, Miller, A:Tsourakakis Proximity! [Tong+ KDD P3-112006] CMU SCS Detailed outline • • • • • • Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions KDD'09 Faloutsos, Miller, Tsourakakis P3-12 CMU SCS How close is ‘A’ to ‘B’? I Should be close, if they have - many, - short - ‘heavy’ paths 1 J 1 A 1 1 1 H 1 B 1 D 1 1 1 E G F KDD'09 Faloutsos, Miller, Tsourakakis P3-13 CMU SCS Some ``bad’’ proximities Why not shortest path? A 1 D 1 B A 1 D 1 B 1 1 1 E G F A: ‘pizza delivery guy’ problem KDD'09 Faloutsos, Miller, Tsourakakis P3-14 Some ``bad’’ proximities CMU SCS Why not max. netflow? A A 1 1 D D 1 1 E B 1 B A: No penalty for long paths KDD'09 Faloutsos, Miller, Tsourakakis P3-15 CMU SCS What is a ``good’’ Proximity? I 1 J 1 A 1 1 1 H 1 1 D … 1 1 1 E G F B • Multiple Connections • Quality of connection •Direct & In-directed Conns •Length, Degree, Weight… KDD'09 Faloutsos, Miller, Tsourakakis P3-16 CMU SCS Random walk with restart 10 9 12 2 8 1 11 3 4 6 5 [Haveliwala’02] 7 KDD'09 Faloutsos, Miller, Tsourakakis P3-17 CMU SCS Random walk with restart 0.04 9 0.10 12 2 0.13 1 0.08 3 0.02 8 0.13 11 0.04 4 0.13 6 5 7 0.05 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Node 8 Node 9 Node 10 Node 11 Node 12 0.13 0.10 0.13 0.22 0.13 0.05 0.05 0.08 0.04 0.03 0.04 0.02 0.05 Nearby nodes, higher scores More red, more relevant KDD'09 Node 4 0.03 10 Faloutsos, Miller, Tsourakakis Ranking vector r4 P3-18 CMU SCS Why RWR is a good score? j 1 Q ( I cW ) Q(i, j ) ri , j i W : adjacency matrix. c: damping factor Qc W 2 c W 2 c 3 all paths from i to j with length 1 all paths from i to j with length 2 KDD'09 Faloutsos, Miller, Tsourakakis W 3 ... all paths from i to j with length 3 P3-19 CMU SCS Detailed outline • Problem dfn and motivation • Solution: Random walk with restarts – variants • • • • Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions KDD'09 Faloutsos, Miller, Tsourakakis P3-20 CMU SCS Variant: escape probability • Define Random Walk (RW) on the graph • Esc_Prob(CMUParis) – Prob (starting at CMU, reaches Paris before returning to CMU) CMU KDD'09 the remaining graph Faloutsos, Miller, Tsourakakis Paris Esc_Prob = Pr (smile before cry) P3-21 CMU SCS Other Variants • Other measure by RWs – Community Time/Hitting Time [Fouss+] – SimRank [Jeh+] • Equivalence of Random Walks – Electric Networks: • EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] – Spring Systems • Katz [Katz], [Huang+], [Scholkopf+] • Matrix-Forest-based Alg [Chobotarev+] KDD'09 Faloutsos, Miller, Tsourakakis P3-22 CMU SCS Other Variants • Other measure by RWs – Community Time/Hitting Time [Fouss+] – SimRank [Jeh+] • Equivalence of Random Walks All– are “related Electric Networks: to” or “similar to” • EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] random walk with restart! – Spring Systems • Katz [Katz], [Huang+], [Scholkopf+] • Matrix-Forest-based Alg [Chobotarev+] KDD'09 Faloutsos, Miller, Tsourakakis P3-23 CMU SCS Map of proximity measurements RWR Norma lize Katz Regularized Un-constrained Quad Opt. 4 ssp decides 1 esc_prob Esc_Prob + Sink Hitting Time/ Commute Time X out-degree Effective Conductance relax Harmonic Func. Constrained Quad Opt. “voltage = position” String System KDD'09 Faloutsos, Miller,Models Tsourakakis Physical P3-24 Mathematic Tools CMU SCS Notice: Asymmetry (even in undirected graphs) C B C-> A : high A-> C: low A E KDD'09 D Faloutsos, Miller, Tsourakakis P3-25 CMU SCS Summary of Proximity Definitions • Goal: Summarize multiple relationships • Solutions – Basic: Random Walk with Restarts • [Haweliwala’02] [Pan+ 2004][Sun+ 2006][Tong+ 2006] – Properties: Asymmetry • [Koren+ 2006][Tong+ 2007] [Tong+ 2008] – Variants: Esc_Prob and many others. • [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007] KDD'09 Faloutsos, Miller, Tsourakakis P3-26 CMU SCS Detailed outline • • • • • • Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions KDD'09 Faloutsos, Miller, Tsourakakis P3-27 CMU SCS Preliminary: Sherman–Morrison Lemma If: vT x A A KDD'09 = = A + A + Faloutsos, Miller, Tsourakakis u +3 P3-28 CMU SCS Preliminary: Sherman–Morrison Lemma If: vT x A = A + u Then: 1 1 A u v A A (A u v ) A T 1 1 v A u 1 KDD'09 T 1 1 Faloutsos, Miller, Tsourakakis T P3-29 CMU SCS Sherman – Morrison Lemma – intuition: • Given a small perturbation on a matrix A -> A’ • We can quickly update its inverse KDD'09 Faloutsos, Miller, Tsourakakis P3-30 CMU SCS SM: The block-form 1 A1 A1B( D CA1B)1 CA1 A B C D 1 1 1 ( D CA B) CA A B C D A1B( D CA1B)1 1 1 ( D CA B) Or… 1 ( A BD 1C ) 1 A B C D 1 1 1 D C ( A BD C ) A1B( D CA1B ) 1 1 1 ( D CA B) And many other variants… Also known as Woodbury Identity KDD'09 Faloutsos, Miller, Tsourakakis P3-31 CMU SCS SM Lemma: Applications • RLS (Recursive least squares) – and almost any algorithm in time series! • • • • Kalman filtering Incremental matrix decomposition … … and all the fast solutions we will introduce! KDD'09 Faloutsos, Miller, Tsourakakis P3-32 CMU SCS Reminder: PageRank • With probability 1-c, fly-out to a random node • Then, we have p = c B p + (1-c)/n 1 => p = (1-c)/n [I - c B] -1 1 KDD'09 Faloutsos, Miller, Tsourakakis P3-33 CMU SCS p = c B p + (1-c)/n 1 ri cWri (1 c ) ei Ranking vector KDD'09 Adjacency matrix Restart p Faloutsos, Miller, Tsourakakis The only difference Starting vector P3-34 CMU SCS p = c B p + (1-c)/n 1 Computing RWR ri cWri (1 c ) ei Adjacency matrix Ranking vector 0.13 0 0.10 1/3 0.13 1/3 0.22 1/3 0.13 0 0.05 0 0.05 0.9 0 0.08 0 0.04 0 0.03 0 0.04 0 0.02 0 nx1 KDD'09 1/3 1/3 1/3 0 0 0 0 0 1/3 0 0 0 0 1/4 1/3 0 1/3 0 0 0 0 0 1/3 0 1/4 0 0 0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 1/4 0 1/2 0 0 0 0 1/4 1/2 0 0 1/3 0 0 1/4 0 0 0 0 0 0 0 0 0 1/4 0 0 0 0 0 0 0 0 0 0 0 0 0 1/4 0 0 0 0 0 0 0 Restart p 0 0.13 0 0 0 0 0 0.10 0 0 0 0 0 0 0.13 0 0 0 0 0.22 1 0 0 0 0 0 0.13 0 0 0 0 0.05 0 0.1 0 0 0 0 0.05 0 1/2 0 1/3 0 0.08 0 0 0 1/3 0 0 0.04 0 1/2 0 1/3 1/2 0.03 0 1/3 0 1/2 0.04 0 0 0 1/3 1/3 0 0.02 Starting vector 0 0 0 nxn 10 9 1 2 1 12 8 3 11 4 5 6 7 nx1 Faloutsos, Miller, Tsourakakis P3-35 CMU SCS Q: Given query i, how to solve it? ? 0 1/3 1/3 1/3 0 0 0.9 0 0 0 0 0 0 Ranking vector KDD'09 0 0 0 0 1/4 0 0 0 0 0 0 0 0 0 0 0 0 1/4 0 0 0 0 0 0 0 0 1/2 1/2 1/4 0 0 0 0 1/4 0 1/2 0 0 0 0 0 1/4 1/2 0 0 0 0 0 0 1/4 0 0 0 1/2 0 1/3 0 0 0 0 1/4 0 1/3 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 1/3 1/3 0 1/3 1/3 1/3 0 0 0 0 0 1/3 0 1/3 0 1/3 0 1/3 0 0 0 1/3 0 0 0 0 0 0 1/3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Adjacency matrix 0 0 0 ? 0 0 0 1 0 0 0.1 0 0 0 0 0 0 Query Ranking vector Starting vector Faloutsos, Miller, Tsourakakis P3-36 CMU SCS OntheFly: 0.13 0.12 0.3 0.16 0.19 0.14 0 0.18 0 0.09 0.13 0.10 1/3 0.16 1/3 0.13 0.12 0.3 0.19 0.14 0.22 0.35 0.1 0.18 0.26 0.21 1/3 0.13 0.03 0.3 0.18 0.10 0.15 0 0 0.05 0.07 0 0.04 0.06 0.9 0.07 0 0.04 0.06 0 0.05 0.07 0 0 0.06 0.08 0.07 0 0 0 0.04 0.01 0.02 00 00.03 0 0.01 0.04 0 00 0.01 0.02 0 0.01 0.02 0 0 0 ri 0 0 0 1/4 0 0 0 0 0 0 0 0 0 0 0 0 1/4 0 0 0 0 0 0 0 0 1/2 1/2 1/4 0 0 0 0 1/4 0 1/2 0 0 0 0 0 1/4 1/2 0 0 0 0 0 0 1/4 0 0 0 1/2 0 1/3 0 0 0 0 1/4 0 1/3 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 1/3 1/3 0 1/3 1/3 1/3 0 0 0 0 0 1/3 0 1/3 0 1/3 0 1/3 0 0 0 0 0 1/3 0 0 0 0 1/3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.13 0 0 0.10 0 0.13 0 0 1 0.22 1 0 0 0.13 0 0.05 0 0.1 0 0 0.05 0 0.08 0 0 0 0.04 0 0 0.03 0 0.04 0 0 0.02 0 0.04 0.10 10 0.03 9 10 9 12 12 0.08 0.02 88 11 11 22 1 1 3 30.13 44 5 5 660.05 0.13 77 0.13 0.04 0.05 ri No pre-computation/ light storage Slow on-line response KDD'09 Faloutsos, Miller, Tsourakakis O(mE) P3-37 CMU SCS PreCompute r10.20r20.13r30.14r40.13r50.68 r60.56 r70.56 r80.63 r90.44 r10 r11 r12 0.35 0.39 0.34 R: 0.28 0.14 0.13 0.09 0.03 0.03 0.08 0.03 0.04 0.04 0.02 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 0.45 0.33 0.32 0.56 0.22 0.22 1.13 0.79 1.80 1.72 2.05 0.04 0.13 11 99 10 10 0.03 1212 0.08 88 0.02 11 11 0.10 22 3 0.13 0.04 44 5 0.13 66 77 0.05 0.05 cxQ Q KDD'09 Faloutsos, Miller, Tsourakakis P3-38 CMU SCS PreCompute: 0.13 2.20 0.10 1.28 0.13 1.43 0.22 1.29 0.91 0.13 0.05 0.37 0.05 0.37 0.08 0.84 0.04 0.29 0.03 0.35 0.04 0.39 0.02 0.22 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05 0.04 0.13 11 99 10 10 0.03 1212 0.08 88 0.02 11 11 0.10 22 3 0.13 0.04 44 5 0.13 66 77 0.05 0.05 Fast on-line response KDD'09 Heavy pre-computation/storage cost 3 )Miller, Tsourakakis Faloutsos, O(n O(n 2 ) P3-39 CMU SCS Q: How to Balance? On-line Off-line KDD'09 Faloutsos, Miller, Tsourakakis P3-40 CMU SCS How to balance? Idea (‘B-Lin’) • Break into communities • Pre-compute all, within a community • Adjust (with S.M.) for ‘bridge edges’ H. Tong, C. Faloutsos, & J.Y. Pan. Fast Random Walk with KDD'09 Faloutsos, Miller, Tsourakakis P3-41 Restart and Its Applications. ICDM, 613-622, 2006. CMU SCS B_Lin: Basic Idea [Tong+ ICDM 2006] 9 Find Community 2 1 8 3 10 12 11 4 9 0.04 7 12 2 1 6 5 10 8 3 0.13 1 11 0.10 2 3 10 9 0.13 11 5 5 6 0.13 7 9 1 2 8 3 10 12 6 7 0.05 0.05 Combine 11 4 FixKDD'09 the remaining 5 6 Faloutsos,7Miller, Tsourakakis 0.02 0.04 4 4 12 0.08 8 0.03 P3-42 CMU SCS B_Lin: details ~ W ~ ~ + ~ W 1: within community KDD'09 Faloutsos, Miller, Tsourakakis Cross-community P3-43 CMU SCS B_Lin: details -1 -1 ~ ~ ~ I – c W1 – cUSV I – cW ~ Easy to be inverted LRA difference SM Lemma! KDD'09 Faloutsos, Miller, Tsourakakis P3-44 CMU SCS B_Lin: summary • Pre-Computational Stage • Q: Efficiently compute and store Q • A: A few small, instead of ONE BIG, matrices inversions • On-Line Stage • Q: Efficiently recover one column of Q • A: A few, instead of MANY, matrix-vector multiplications KDD'09 Faloutsos, Miller, Tsourakakis P3-45 CMU SCS Query Time vs. Pre-Compute Time Log Query Time •Quality: 90%+ •On-line: •Up to 150x speedup •Pre-computation: •Two orders saving Log Pre-compute Time KDD'09 Faloutsos, Miller, Tsourakakis P3-46 CMU SCS Detailed outline • • • • • • Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions KDD'09 Faloutsos, Miller, Tsourakakis P3-47 CMU SCS gCaP: Automatic Image Caption • Q … { Sea Sun Sky Wave} { Cat Forest Grass Tiger } A: Proximity! [Pan+ KDD2004] {?, ?, ?,} KDD'09 Faloutsos, Miller, Tsourakakis P3-48 CMU SCS Region Image Test Image Sea Sun KDD'09 Sky Wave Cat Forest Tiger Grass Keyword Faloutsos, Miller, Tsourakakis P3-49 CMU SCS Region Image Test Image {Grass, Forest, Cat, Tiger} Sea Sun KDD'09 Sky Wave Cat Forest Tiger Keyword Faloutsos, Miller, Tsourakakis Grass P3-50 CMU SCS C-DEM (Screen-shot) KDD'09 Faloutsos, Miller, Tsourakakis P3-51 CMU SCS C-DEM: Multi-Modal Query System for Drosophila Embryo Databases [Fan+ VLDB 2008] KDD'09 Faloutsos, Miller, Tsourakakis P3-52 CMU SCS Detailed outline • • • • • • Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions KDD'09 Faloutsos, Miller, Tsourakakis P3-53 CMU SCS authors RWR on Bipartite Graph Author-Conf. Matrix Observation: n >> m! n Examples: 1. DBLP: 400k aus, 3.5k confs 2. NetFlix: 2.7M usrs,18k mvs Conferences KDD'09 m Faloutsos, Miller, Tsourakakis P3-54 CMU SCS RWR on Skewed bipartite graphs • Q: Given query i, how to solve it? m confs KDD'09 0 0 0 1/4 0 0 0… 0. 0 0 0 0 0 0 0 . .0 .. 1/4 0 0 0 0 0 0 0 … 0 1/2 1/2 1/4 0 0 0 0 .. 1/4 0 1/2 0 0 0 0 r0 .. 1/4 1/2 0 0 0 0 0. …0 1/4 0 0 0 1/2 0 1/3. . 0 0 0 0 1/4 0 1/3 0 .. 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 1/4 0 1/3 0 1/2 0 c 0 0 0 0 1/3 1/3 0 1/3 1/3 1/3 0 0 0 0 0 1/3 0 1/3 0 1/3 0 1/3 0 0 0 1/3 0 0 0 0 0 0 1/3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A …. .. .. … .. .. .… .. .. ? 0 1/3 1/3 1/3 0 0 0.9 0 0 0 0 0 0 0 A n ? Faloutsos, Miller, Tsourakakis m 0 0 0 1 0 0 0.1 0 0 0 0 0 0 n aus P3-55 CMU SCS Idea: • Pre-compute the smallest, m x m matrix • Use it to compute the rest proximities, on the fly H. Tong, S. Papadimitriou, P.S. YuMiller, & C.Tsourakakis Faloutsos. Proximity Tracking on KDD'09 Faloutsos, P3-56 Time-Evolving Bipartite Graphs. SDM 2008. CMU SCS BB_Lin: Examples Dataset Off-Line Cost On-Line Cost DBLP a few minutes frac. of sec. NetFlix 1.5 hours <0.01 sec. 400k authors x 3.5k conf.s KDD'09 2.7m user x 18k movies Faloutsos, Miller, Tsourakakis P3-57 CMU SCS Detailed outline • • • • • • Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions KDD'09 Faloutsos, Miller, Tsourakakis P3-58 CMU SCS Problem: update E’ edges changed Involves n’ authors, m’ confs. KDD'09 Faloutsos, Miller, Tsourakakis n authors m Conferences P3-59 CMU SCS Solution: • Use Sherman-Morrison to quickly update the inverse matrix KDD'09 Faloutsos, Miller, Tsourakakis P3-60 CMU SCS Fast-Single-Update log(Time) (Seconds) 176x speedup 40x speedup Our method Our method KDD'09 Faloutsos, Miller, Tsourakakis P3-61 Datasets CMU SCS pTrack: Philip S. Yu’s Top-5 conferences up to each year ICDE ICDCS SIGMETRICS PDIS VLDB CIKM ICDCS ICDE SIGMETRICS ICMCS KDD SIGMOD ICDM CIKM ICDCS ICDM KDD ICDE SDM VLDB 1992 1997 2002 2007 Databases Performance Distributed Sys. KDD'09 DBLP: (Au. x Conf.) - 400k aus, - 3.5k confs - 20 yrs Faloutsos, Miller, Tsourakakis Databases Data Mining P3-62 CMU SCS pTrack: Philip S. Yu’s Top-5 conferences up to each year ICDE ICDCS SIGMETRICS PDIS VLDB CIKM ICDCS ICDE SIGMETRICS ICMCS KDD SIGMOD ICDM CIKM ICDCS ICDM KDD ICDE SDM VLDB 1992 1997 2002 2007 Databases Performance Distributed Sys. KDD'09 DBLP: (Au. x Conf.) - 400k aus, - 3.5k confs - 20 yrs Faloutsos, Miller, Tsourakakis Databases Data Mining P3-63 CMU SCS KDD’s Rank wrt. VLDB over years Prox. Rank Data Mining and Databases are getting closer & closer Year KDD'09 Faloutsos, Miller, Tsourakakis P3-64 CMU SCS cTrack:10 most influential authors in NIPS community up to each year T. Sejnowski M. Jordan Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years KDD'09 Faloutsos, Miller, Tsourakakis P3-65 CMU SCS Conclusions - Take-home messages • Proximity Definitions – RWR – and a lot of variants • Computation – Sherman–Morrison Lemma – Fast Incremental Computation • Applications – Recommendations; auto-captioning; tracking – Center-piece Subgraphs (next) – E-mail management; anomaly detection, … KDD'09 Faloutsos, Miller, Tsourakakis P3-66 CMU SCS References • L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library. • T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002 • J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004. KDD'09 Faloutsos, Miller, Tsourakakis P3-67 CMU SCS References • C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, 118-127, 2004. • J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005. • W. Cohen. (2007) Graph Walks and Graphical Models. Draft. KDD'09 Faloutsos, Miller, Tsourakakis P3-68 CMU SCS References • P. Doyle & J. Snell. (1984) Random walks and electric networks, volume 22. Mathematical Association America, New York. • Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, 2006. • A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006. KDD'09 Faloutsos, Miller, Tsourakakis P3-69 CMU SCS References • S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007. • F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369 2007. KDD'09 Faloutsos, Miller, Tsourakakis P3-70 CMU SCS References • H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD, 404-413, 2006. • H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM, 613622, 2006. • H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD, 747-756, 2007. KDD'09 Faloutsos, Miller, Tsourakakis P3-71 CMU SCS References • H. Tong, B. Gallagher, C. Faloutsos, & T. EliassiRad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007. • H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. SDM 2008. KDD'09 Faloutsos, Miller, Tsourakakis P3-72 CMU SCS References • B. Gallagher, H. Tong, T. Eliassi-Rad, C. Faloutsos. Using Ghost Edges for Classification in Sparsely Labeled Networks. KDD 2008 • H. Tong, Y. Sakurai, T. Eliassi-Rad, and C. Faloutsos. Fast Mining of Complex Time-Stamped Events CIKM 08 • H. Tong, H. Qu, and H. Jamjoom. Measuring Proximity on Graphs with Side Information. ICDM 2008 KDD'09 Faloutsos, Miller, Tsourakakis P3-73 CMU SCS Resources • www.cs.cmu.edu/~htong/soft.htm For software, papers, and ppt of presentations • www.cs.cmu.edu/~htong/tut/cikm2008/cikm_tutor ial.html For the CIKM’08 tutorial on graphs and proximity Again, thanks to Hanghang TONG for permission to use his foils in this part KDD'09 Faloutsos, Miller, Tsourakakis P3-74