CMU SCS

Graph Analytics Workshop: Tools
Christos Faloutsos
CMU

Graph Analytics wkshp, C. Faloutsos (CMU)

Welcome!

               Tue             Wed            Thu
9:00-10:30     Tools           Laplacians     Parallelism
11:00-12:30    NELL            Rich graphs    Communities
1:30-3:00      Exercises       Panel          Scalability
3:30-5:00      Graph. models   Posters        Graph ‘Laws’
Reception

Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions

Graphs - why should we care?
• Food Web [Martinez ’91]
• Internet Map [lumeta.com]
• >$10B revenue; >0.5B users
• IR: bi-partite graphs (doc-terms): documents D1 ... DN, terms T1 ... TM
• ‘NELL’: ‘merkel’ ‘chancellor’ ‘germany’ - <S><V><O> facts -> tensors
• web: hyper-text graph
• ... and more:
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ....
• Any M:N relationship -> Graph
• Any subject-verb-object construct -> Graph/Tensor

Graphs and matrices
• Closely related
• Powerful tools from matrix algebra, for graph mining

Examples of Matrices: Graph - social network

         John   Peter   Mary   Nick   ...
John     0      11      22     55     ...
Peter    5      0       6      7      ...
Mary     ...    ...     ...    ...    ...
Nick     ...    ...     ...    ...    ...
...      ...    ...     ...    ...    ...

Examples of Matrices: Market basket
• market basket as in Association Rules

         milk   bread   choc.   wine   ...
John
Peter
Mary
Nick
...

Examples of Matrices: Documents and terms

          data   mining   classif.   tree   ...
Paper#1   13     11       22         55     ...
Paper#2   5      4        6          7      ...
Paper#3   ...    ...      ...        ...    ...
Paper#4   ...    ...      ...        ...    ...
...       ...    ...      ...        ...    ...

Examples of Matrices: Authors and terms

         data   mining   classif.   tree   ...
John     13     11       22         55     ...
Peter    5      4        6          7      ...
Mary     ...    ...      ...        ...    ...
Nick     ...    ...      ...        ...    ...
...      ...    ...      ...        ...    ...

Node importance - Motivation
• Given a graph (e.g., web pages containing the desirable query word)
• Q: Which node is the most important?
• A1: HITS (SVD = Singular Value Decomposition)
• A2: eigenvector (PageRank)
• ‘I am important, if my friends are important’ -> Fixed point / eigenvector
• SVD and eigenvector analysis: very closely related

Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
  – HITS
  – PageRank

SVD - Motivation
• problem #1: text - LSI: find ‘concepts’
• problem #2: compression / dim. reduction
SVD - Motivation
• Customer-product, for recommendation system:

vegetarians     1  1  1  0  0
vegetarians     2  2  2  0  0
vegetarians     1  1  1  0  0
vegetarians     5  5  5  0  0
meat eaters     0  0  0  2  2
meat eaters     0  0  0  3  3
meat eaters     0  0  0  1  1

• problem #2: compress / reduce dimensionality

Problem - specs
• Visualize customers

SVD - Definition
A[n x m] = U[n x r] L[r x r] (V[m x r])^T
• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (n documents, r concepts)
• L: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
• V: m x r matrix (m terms, r concepts)

SVD - Properties
THEOREM [Press+92]: always possible to decompose matrix A into A = U L V^T, where
• U, L, V: unique (*)
• U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other)
  – U^T U = I; V^T V = I (I: identity matrix)
• L: singular values are positive, and sorted in decreasing order

SVD - Example
• A = U L V^T - example:

A =
        data  inf.  retrieval  brain  lung
CS      1     1     1          0      0
CS      2     2     2          0      0
CS      1     1     1          0      0
CS      5     5     5          0      0
MD      0     0     0          2      2
MD      0     0     0          3      3
MD      0     0     0          1      1

U =
        CS-concept  MD-concept
        0.18        0
        0.36        0
        0.18        0
        0.90        0
        0           0.53
        0           0.80
        0           0.27

L =
        9.64  0
        0     5.29

V^T =
                    data  inf.  retrieval  brain  lung
        CS-concept  0.58  0.58  0.58       0      0
        MD-concept  0     0     0          0.71   0.71

• U: doc-to-concept similarity matrix
• L: 9.64 is the ‘strength’ of the CS-concept
• V: term-to-concept similarity matrix

SVD - Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
• U: document-to-concept similarity matrix
• V: term-to-concept similarity matrix
• L: its diagonal elements: ‘strength’ of each concept
Q: if A is the document-to-term matrix, what is A^T A?
A: term-to-term ([m x m]) similarity matrix
Q: A A^T?
A: document-to-document ([n x n]) similarity matrix
(ICDE’09; Copyright: Faloutsos, Tong (2009))

SVD properties
• V are the eigenvectors of the covariance matrix A^T A
• U are the eigenvectors of the Gram (inner-product) matrix A A^T
Further reading:
1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.
2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.

SVD - Interpretation #2
• best axis to project on (‘best’ = min sum of squares of projection errors)
• SVD gives the best axis to project on: the first singular vector v1
• minimum RMS error
• A = U L V^T - example (same matrices as above):
  – v1 is the first row of V^T: (0.58, 0.58, 0.58, 0, 0)
  – 9.64: variance (‘spread’) on the v1 axis
  – U L gives the coordinates of the points in the projection axis
• More details
• Q: how exactly is dim. reduction done?
• A: set the smallest singular values to zero:

A ~ U x
        9.64  0
        0     0
    x V^T

that is, only the first column of U, the value 9.64, and the first row of V^T remain:

A ~ (0.18, 0.36, 0.18, 0.90, 0, 0, 0)^T x 9.64 x (0.58, 0.58, 0.58, 0, 0)

which gives

        1  1  1  0  0
        2  2  2  0  0
        1  1  1  0  0
        5  5  5  0  0
        0  0  0  0  0
        0  0  0  0  0
        0  0  0  0  0
SVD - Interpretation #2
Exactly equivalent: ‘spectral decomposition’ of the matrix:
• A = [u1 u2] x diag(l1, l2) x [v1 v2]^T
• A [n x m] = l1 u1 v1^T + l2 u2 v2^T + ...
  – r terms; each ui is n x 1, each vi^T is 1 x m
• approximation / dim. reduction: by keeping the first few terms (Q: how many?)
  – assume: l1 >= l2 >= ...
• A (heuristic - [Fukunaga]): keep 80-90% of ‘energy’ (= sum of squares of li’s)

Pictorially: matrix form of SVD
[figure: A (n x m) ~ U x L x V^T]
– Best rank-k approximation in L2

Pictorially: Spectral form of SVD
[figure: A (n x m) ~ l1 u1 v1^T + l2 u2 v2^T + ...]
– Best rank-k approximation in L2
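The truncation step and the 80-90% ‘energy’ heuristic can be tried on the 7 x 5 example matrix from the slides. What follows is a minimal Python/NumPy sketch under stated assumptions, not code from the workshop: the helper name `truncate_svd` and its `energy` parameter are made up for illustration, while `numpy.linalg.svd` is the standard NumPy routine.

```python
import numpy as np

# The 7 x 5 document-term example from the slides (4 CS rows, 3 MD rows).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)


def truncate_svd(matrix, energy=0.9):
    """Hypothetical helper: keep the fewest leading terms l_i u_i v_i^T
    whose squared singular values reach the requested share of 'energy'."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    share = np.cumsum(s ** 2) / np.sum(s ** 2)       # cumulative energy share
    k = min(int(np.searchsorted(share, energy)) + 1, len(s))
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]          # sum of the first k terms
    return approx, k, s


approx, k, s = truncate_svd(A, energy=0.75)
print("two largest singular values:", np.round(s[:2], 2))
print("terms kept:", k)
print(np.round(approx, 2))
```

With `energy=0.75` the first term alone suffices, since 9.64^2 / (9.64^2 + 5.29^2) = 93/121, about 77% of the energy. The resulting rank-1 matrix keeps the CS block and zeroes the MD rows, as on the slide. Raising the threshold to 0.9 keeps both terms and reproduces A.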
Task 1 - SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
  – #1: documents/terms/concepts
  – #2: dim. reduction
  – #3: picking non-zero, rectangular ‘blobs’
• Complexity
• Case studies
  – HITS
  – PageRank
• Additional properties

SVD - Interpretation #3
• finds non-zero ‘blobs’ in a data matrix (same example: A = U L V^T, with the 7 x 5 matrix A, singular values 9.64 and 5.29)
• = ‘communities’ (bi-partite cores, here):
  – Rows 1-4 with Columns 1-3
  – Rows 5-7 with Columns 4-5

SVD - Complexity
• O(n * m * m) or O(n * n * m) (whichever is less)
• less work, if we just want singular values
• or if we want first k singular vectors
• or if the matrix is sparse [Berry]
• Implemented: in any linear algebra package (LINPACK, matlab, Splus, mathematica ...)

SVD - conclusions so far
• SVD: A = U L V^T: unique (*)
• U: document-to-concept similarities
• V: term-to-concept similarities
• L: strength of each concept
• dim. reduction: keep the first few strongest singular values (80-90% of ‘energy’)
  – SVD: picks up linear correlations
• SVD: picks up non-zero ‘blobs’

Kleinberg’s algo (HITS)
Kleinberg, Jon (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Recall: problem dfn
• Given a graph (e.g., web pages containing the desirable query word)
• Q: Which node is the most important?

Kleinberg’s algorithm
• Problem dfn: given the web and a query
• find the most ‘authoritative’ web pages for this query
Step 0: find all pages containing the query terms
Step 1: expand by one move forward and backward
• on the resulting graph, give high score (= ‘authorities’) to nodes that many important nodes point to
• give high importance score (‘hubs’) to nodes that point to good ‘authorities’
[figure: hubs pointing to authorities]

Kleinberg’s algorithm observations
• recursive definition!
• each node (say, ‘i’-th node) has both an authoritativeness score ai and a hubness score hi

Kleinberg’s algorithm
Let E be the set of edges and A be the adjacency matrix: the (i,j) entry is 1 if the edge from i to j exists
Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores. Then:
Kleinberg’s algorithm
Then, if nodes k, l, m point to node i:
ai = hk + hl + hm
that is
ai = Sum (hj) over all j such that the (j,i) edge exists
or
a = A^T h

symmetrically, for the ‘hubness’, if node i points to nodes n, p, q:
hi = an + ap + aq
that is
hi = Sum (aj) over all j such that the (i,j) edge exists
or
h = A a

In conclusion, we want vectors h and a such that:
h = A a
a = A^T h
SVD properties:
A [n x m] v1 [m x 1] = l1 u1 [n x 1]
u1^T A = l1 v1^T

In short, the solutions to
h = A a
a = A^T h
are (up to scaling) the left- and right- singular-vectors of the adjacency matrix A.
Starting from random a’ and iterating, we’ll eventually converge
(Q: to which of all the singular-vectors? why?)
A: to the ones of the strongest singular-value:
(A^T A)^k v’ ~ (constant) v1

Kleinberg’s algorithm - results
E.g., for the query ‘java’:
0.328 www.gamelan.com
0.251 java.sun.com
0.190 www.digitalfocus.com (“the java developer”)

Kleinberg’s algorithm - discussion
• ‘authority’ score can be used to find ‘similar pages’ (how?)

PageRank (google)
• Brin, Sergey and Lawrence Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf.
[photos: Larry Page, Sergey Brin]
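Going back to Kleinberg’s iteration (a = A^T h, h = A a), the claim that it converges to the strongest singular vectors can be checked numerically. The sketch below is an illustration under assumptions, not Kleinberg’s reference implementation: the function name `hits`, the re-normalization to unit length at each step, and the toy graph are all hypothetical choices.

```python
import numpy as np


def hits(A, max_iter=1000, tol=1e-12):
    """Iterate a = A^T h, h = A a with re-normalization (power iteration)."""
    n = A.shape[0]
    h = np.ones(n) / np.sqrt(n)
    a = np.ones(n) / np.sqrt(n)
    for _ in range(max_iter):
        a_next = A.T @ h                  # authorities collect hub scores
        a_next /= np.linalg.norm(a_next)
        h_next = A @ a_next               # hubs collect authority scores
        h_next /= np.linalg.norm(h_next)
        done = np.linalg.norm(a_next - a) + np.linalg.norm(h_next - h) < tol
        a, h = a_next, h_next
        if done:
            break
    return h, a


# Hypothetical toy graph: nodes 0-2 act as hubs, nodes 3-4 as authorities.
A = np.zeros((5, 5))
for src, dst in [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3)]:
    A[src, dst] = 1.0

h, a = hits(A)
U, s, Vt = np.linalg.svd(A)
print("authority scores:", np.round(a, 3))
print("hub scores:      ", np.round(h, 3))
print("match v1 and u1: ", np.allclose(a, np.abs(Vt[0])), np.allclose(h, np.abs(U[:, 0])))
```

The comparison takes absolute values because an SVD routine may return u1 and v1 with either sign, whereas the iteration started from positive scores stays non-negative.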
Problem: PageRank
Given a directed graph, find its most interesting/central node
A node is important, if it is connected with important nodes (recursive, but OK!)

Problem: PageRank - solution
Given a directed graph, find its most interesting/central node
Proposed solution: Random walk; spot most ‘popular’ node (-> steady state prob. (ssp))
A node has high ssp, if it is connected with high ssp nodes (recursive, but OK!)

(Simplified) PageRank algorithm
• Let A be the adjacency matrix;
• let B be the transition matrix: transpose, column-normalized - then
• B p = p
[figure: example graph with nodes 1-5 and its 5 x 5 transition matrix B (rows: To, columns: From; non-zero entries 1 and 1/2), with B (p1, p2, p3, p4, p5)^T = (p1, p2, p3, p4, p5)^T]

Definitions
A: Adjacency matrix (from-to)
D: Degree matrix = diag(d1, d2, …, dn)
B: Transition matrix: to-from, column normalized
B = A^T D^-1

(Simplified) PageRank algorithm
• B p = 1 * p
• thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is column-normalized)
• Why does such a p exist?
  – p exists if B is n x n, nonnegative, irreducible [Perron–Frobenius theorem]
• In short: imagine a particle randomly moving along the edges
• compute its steady-state probabilities (ssp)
Full version of algo: with occasional random jumps
Why? To make the matrix irreducible

Full Algorithm
• With probability 1-c, fly-out to a random node
• Then, we have
p = c B p + (1-c)/n 1  =>
p = (1-c)/n [I - c B]^-1 1
(here 1 is the n x 1 all-ones vector)
Alternative notation
M: Modified transition matrix
M = c B + (1-c)/n 1 1^T
Then
p = M p
That is: the steady state probabilities = PageRank scores form the first eigenvector of the ‘modified transition matrix’

Parenthesis: intuition behind eigenvectors

Formal definition
If A is a (n x n) square matrix, (l, x) is an eigenvalue/eigenvector pair of A if
A x = l x
CLOSELY related to singular values:

Property #1: Eigen- vs singular-values
if
B[n x m] = U[n x r] L[r x r] (V[m x r])^T
then A = (B^T B) is symmetric and
C(4): B^T B vi = li^2 vi
i.e., v1, v2, ...: eigenvectors of A = (B^T B)

Property #2
• If A[n x n] is a real, symmetric matrix
• Then it has n real eigenvalues
(if A is not symmetric, some eigenvalues may be complex)

Property #3
• If A[n x n] is a real, symmetric matrix
• Then it has n real eigenvalues
• And they agree with its n singular values, except possibly for the sign

Intuition
• A as vector transformation: x’ = A x

  x’   =   A       x
  2        2  1    1
  1        1  3    0

• By defn., eigenvectors remain parallel to themselves (‘fixed points’): l1 v1 = A v1

  l1     v1     =   A       v1
  3.62 * 0.52       2  1    0.52
         0.85       1  3    0.85

Convergence
• Usually, fast
• depends on ratio l1 : l2
[figure: convergence of the iteration toward the first eigenvector]
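The three formulations of the full PageRank algorithm (the iteration p = c B p + (1-c)/n 1, the closed form with [I - c B]^-1, and the fixed point of the modified transition matrix M) can be compared on a small graph. The following is a minimal NumPy sketch, not Google’s implementation: the function name `pagerank`, the value c = 0.85, and the 4-node graph are assumptions for illustration, and the code further assumes every node has at least one out-edge.

```python
import numpy as np


def pagerank(A, c=0.85, max_iter=1000, tol=1e-12):
    """Power iteration for p = c B p + (1 - c)/n 1, with B = A^T D^-1.

    Assumes every node has at least one out-edge (no dangling nodes).
    """
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    B = A.T / out_degree              # divides column j by d_j: column-normalized
    p = np.full(n, 1.0 / n)           # start from the uniform distribution
    for _ in range(max_iter):
        p_next = c * (B @ p) + (1.0 - c) / n
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p, B


# Hypothetical 4-node directed graph (row = from, column = to).
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
], dtype=float)

c = 0.85
n = A.shape[0]
p, B = pagerank(A, c)

# Closed form from the slides: p = (1 - c)/n [I - c B]^-1 1
p_closed = (1.0 - c) / n * np.linalg.solve(np.eye(n) - c * B, np.ones(n))

# Alternative notation: p is a fixed point of M = c B + (1 - c)/n 1 1^T
M = c * B + (1.0 - c) / n * np.ones((n, n))

print(np.round(p, 4))
print(np.allclose(p, p_closed), np.allclose(M @ p, p))
```

Solving the linear system is practical only for small n; the iteration touches each edge once per step, which is why it is the usual choice for large sparse graphs.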
Kleinberg/google - conclusions
SVD helps in graph analysis:
• hub/authority scores: strongest left- and right- singular-vectors of the adjacency matrix
• random walk on a graph: steady state probabilities are given by the strongest eigenvector of the (modified) transition matrix

Conclusions
• SVD: a valuable tool
• given a document-term matrix, it finds ‘concepts’ (LSI)
• ... and can find fixed-points or steady-state probabilities (google / Kleinberg / Markov Chains)

Conclusions cont’d
(We didn’t discuss/elaborate, but SVD
• ... can reduce dimensionality (KL)
• ... and can find rules (PCA; RatioRules)
• ... and can optimally solve over- and under-constrained linear systems (least squares / query feedbacks))

References
• Berry, Michael: http://www.cs.utk.edu/~lsi/
• Brin, S. and L. Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf.
• Faloutsos, C. (1996). Searching Multimedia Databases by Content, Springer. (App. D)
• Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.
• Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.), Springer.
• Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.
• Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press. www.nr.com

PART 2: Communities
Task 2 – Communities - Detailed outline
• Motivation
• Hard clustering – k pieces
• Hard clustering – optimal # pieces
• Observations

Problem
• Given a graph, and k
• Break it into k (disjoint) communities
[figure: example graph split for k=2]

Solution #1: METIS
• Arguably, the best algorithm
• Open source, at
  – http://www.cs.umn.edu/~metis
• and *many* related papers, at same url
• Main idea:
  – coarsen the graph;
  – partition;
  – un-coarsen
• G. Karypis and V. Kumar. METIS 4.0: Unstructured graph partitioning and sparse matrix ordering system. TR, Dept. of CS, Univ. of Minnesota, 1998.
• <and many extensions>

Solution #2
(problem: hard clustering, k pieces)
Spectral partitioning:
• Consider the eigenvector of the 2nd smallest eigenvalue of the (normalized) Laplacian
See details in Task 4 (intro to Laplacians), later

Solutions #3, …
Many more ideas:
• Clustering on the A^2 (square of adjacency matrix) [Zhou, Woodruff, PODS’04]
• Minimum cut / maximum flow [Flake+, KDD’00]
• …
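As a small illustration of the spectral partitioning idea (Solution #2), the sketch below splits a toy graph in two by the sign of the eigenvector of the 2nd smallest Laplacian eigenvalue. It rests on assumptions: it uses the unnormalized Laplacian D - A rather than the normalized one mentioned on the slide, handles only k = 2, and the name `spectral_bisect` and the two-triangle graph are hypothetical.

```python
import numpy as np


def spectral_bisect(A):
    """Split an undirected graph in two using the sign of the Fiedler vector."""
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A              # unnormalized Laplacian D - A
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)  # ascending eigenvalues
    fiedler = eigenvectors[:, 1]                  # 2nd smallest eigenvalue's vector
    return fiedler >= 0, eigenvalues[1]


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels, lambda2 = spectral_bisect(A)
print(labels)
print(round(float(lambda2), 3))
```

The two triangles end up on opposite sides of the split, so the only cut edge is the bridge 2-3. Which triangle is labeled `True` depends on the sign the eigen-solver happens to return.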
Cross-association
Desiderata:
• Simultaneously discover row and column groups
• Fully Automatic: No “magic numbers”
• Scalable to large matrices
Reference:
1. Chakrabarti et al. Fully Automatic Cross-Associations, KDD’04

What makes a cross-association “good”?
[figure: two groupings of the same matrix into row groups and column groups, side by side (“versus”)]
Why is this better?
• simpler; easier to describe
• easier to compress!
Problem definition: given an encoding scheme
• decide on the # of col. and row groups k and l
• and reorder rows and columns,
• to achieve best compression

Main Idea
Good Compression -> Better Clustering
Total Encoding Cost = Σi size_i * H(x_i) + Cost of describing cross-associations
  – Σi size_i * H(x_i): Code Cost
  – Cost of describing cross-associations: Description Cost
Minimize the total cost (# bits) for lossless compression

Algorithm
[figure: search over group counts: k=1, l=2 -> k=2, l=2 -> k=2, l=3 -> k=3, l=3 -> k=3, l=4 -> k=4, l=4 -> k=4, l=5 -> k = 5 row groups, l = 5 col groups]

Experiments
“CLASSIC”
• 3,893 documents
• 4,303 words
• 176,347 “dots”
Combination of 3 sources:
• MEDLINE (medical)
• CISI (info. retrieval)
• CRANFIELD (aerodynamics)
[figure: “CLASSIC” graph of documents & words: k=15, l=19; axes: Documents, Words]
Word groups found:
• MEDLINE (medical): insipidus, alveolar, aortic, death, prognosis, intravenous; blood, disease, clinical, cell, tissue, patient
• CISI (Information Retrieval): providing, studying, records, development, students, rules; abstract, notation, works, construct, bibliographies
• CRANFIELD (aerodynamics): shape, nasa, leading, assumed, thin
• another word group: paint, examination, fall, raise, leave, based

Algorithm
Code for cross-associations (matlab):
www.cs.cmu.edu/~deepay/mywww/software/CrossAssociations01-27-2005.tgz
Variations and extensions:
• ‘Autopart’ [Chakrabarti, PKDD’04]
• www.cs.cmu.edu/~deepay
• Hadoop implementation [ICDM’08]
Spiros Papadimitriou, Jimeng Sun: DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining. ICDM 2008: 512-521

Observation #1
• Skewed degree distributions – there are nodes with huge degree (>O(10^4), in facebook/linkedIn popularity contests!)

Observation #2
• Maybe there are no good cuts: “jellyfish” shape [Tauro+’01], [Siganos+,’06], strange behavior of cuts [Chakrabarti+’04], [Leskovec+,’08]
Jellyfish model [Tauro+]
…
A Simple Conceptual Model for the Internet Topology, L. Tauro, C. Palmer, G. Siganos, M. Faloutsos, Global Internet, November 25-29, 2001
Jellyfish: A Conceptual Model for the AS Internet Topology, G. Siganos, Sudhir L. Tauro, M. Faloutsos, J. of Communications and Networks, Vol. 8, No. 3, pp 339-350, Sept. 2006.

Strange behavior of min cuts
• ‘negative dimensionality’ (!)
NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy
Statistical Properties of Community Structure in Large Social and Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. WWW 2008.

“Min-cut” plot
• Do min-cuts recursively.
• Each new min-cut adds a point to the plot
[plot: log (mincut-size / #edges) vs. log (# edges)]
• Grid example with N nodes: mincut size = sqrt(N); slope = -0.5
• For a d-dimensional grid, the slope is -1/d
• For a random graph, the slope is 0
• What does it look like for a real-world graph?
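The -1/d slope quoted for grids follows from a short counting argument. The sketch below assumes a d-dimensional grid whose N nodes form a hypercube of side N^{1/d}, bisected by a cut perpendicular to one axis; E stands for the number of edges (a symbol introduced here for brevity), and constant factors are ignored.

```latex
\begin{align*}
\text{mincut}(N) &\approx \left(N^{1/d}\right)^{d-1} = N^{(d-1)/d}, \\
E &\approx d\,N, \\
\frac{\text{mincut}}{E} &\approx \frac{N^{(d-1)/d}}{d\,N}
   \;\propto\; N^{-1/d} \;\propto\; E^{-1/d}, \\
\log\frac{\text{mincut}}{E} &\approx -\frac{1}{d}\,\log E + \text{const}.
\end{align*}
```

For d = 2 this gives a min-cut of about sqrt(N) and a slope of -0.5, matching the grid example on the slide.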
Experiments
• Datasets:
  – Google Web Graph: 916,428 nodes and 5,105,039 edges
  – Lucent Router Graph: undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges
  – User Website Clickstream Graph: 222,704 nodes and 952,580 edges
• NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy

Experiments
• Used the METIS algorithm [Karypis, Kumar, 1995]
• Google Web graph: slope ~ -0.4
• Values along the y-axis are averaged
• “lip” for large edges
• Slope of -0.4 corresponds to a 2.5-dimensional grid!
[Plot: log (mincut-size / #edges) vs. log (# edges)]

Experiments
• Same results for other graphs too…
  – Lucent Router graph: slope ~ -0.57
  – Clickstream graph: slope ~ -0.45
[Plots: log (mincut-size / #edges) vs. log (# edges)]

Task 2 – Communities: Conclusions – Practitioner’s guide
• Hard clustering – k pieces: METIS
• Hard clustering – optimal # pieces: Cross-associations
• Observations: ‘jellyfish’: no good cuts

PART 3: Tensors

Task 3 – Tensors - Detailed roadmap
• Motivation
• Definitions: PARAFAC
• Case study: web mining

Examples of Matrices: Authors and terms
[Table: authors (John, Peter, Mary, Nick, ...) by terms (data, mining, classif., tree, ...)]
But: if it changes over time??
• A: treat it as ‘tensor’
[Table: author-by-term counts, one slice per conference (KDD’07, KDD’08, KDD’09)]

          data    mining    classif.    tree    ...
John      13      11        22          55      ...
Peter     5       4         6           7       ...
Mary      ...     ...       ...         ...     ...
Nick      ...     ...       ...         ...     ...
...       ...     ...       ...         ...     ...

Motivation: Why tensors?
• Q: what is a tensor?
• A: N-D generalization of matrix: the author-by-term matrix above (e.g. for KDD’09), stacked over KDD’07, KDD’08, KDD’09

Tensors are useful for 3 or more modes
• Terminology: ‘mode’ (or ‘aspect’)
[Figure: the same array with its three axes labeled Mode (== aspect) #1, Mode #2, Mode #3]

Notice
• 3rd mode does not need to be time
• we can have more than 3 modes
[Figure: IP source × IP destination × Dest. port (125, ..., 80)]

Background: Tensors
• Tensors (= multi-dimensional arrays) are everywhere
  – Sensor stream (time, location, type)
  – Predicates (subject, verb, object) in knowledge base: “Eric Clapton plays guitar”, “Barack Obama is the president of U.S.”
• NELL (Never Ending Language Learner) data: mode sizes (26M), (48M), (26M); nonzeros = 144M
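To make the "N-D generalization of a matrix" idea concrete, here is a minimal Python sketch of a 3-mode (author, term, conference) tensor stored sparsely as a dictionary. The entries and the helper names `mode_sizes`, `slice_mode` and `collapse_mode` are hypothetical, introduced only for illustration; they are not from the slides or from any tensor library.

```python
from collections import defaultdict

# Hypothetical (author, term, conference) -> count entries; the values are
# illustrative and are not taken from the slides.
entries = {
    ("John", "data", "KDD'07"): 13,
    ("John", "mining", "KDD'07"): 11,
    ("Peter", "data", "KDD'07"): 5,
    ("John", "data", "KDD'08"): 2,
    ("Mary", "tree", "KDD'09"): 7,
}


def mode_sizes(tensor):
    """Number of distinct indices along each mode (aspect)."""
    n_modes = len(next(iter(tensor)))
    return [len({key[m] for key in tensor}) for m in range(n_modes)]


def slice_mode(tensor, mode, index):
    """Fix one mode at `index`; the result has one mode fewer."""
    return {
        key[:mode] + key[mode + 1:]: value
        for key, value in tensor.items()
        if key[mode] == index
    }


def collapse_mode(tensor, mode):
    """Sum over one mode, e.g. drop the conference to get one author-term matrix."""
    out = defaultdict(int)
    for key, value in tensor.items():
        out[key[:mode] + key[mode + 1:]] += value
    return dict(out)


sizes = mode_sizes(entries)               # [3, 3, 3]: authors, terms, conferences
kdd07 = slice_mode(entries, 2, "KDD'07")  # author-by-term matrix for one conference
totals = collapse_mode(entries, 2)        # author-by-term matrix summed over conferences
```

Only nonzero entries are stored, which is the natural layout when the tensor is as sparse as the NELL data described above. A fourth mode would simply be a fourth element in each key.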
Tensor basics
• Multi-mode extensions of SVD – recall that:

Reminder: SVD
• $A \approx U \Sigma V^T$, with $A$ of size $m \times n$
• $A \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$
  – Best rank-k approximation in L2

Extension to (>=)3 modes

Main points:
• 2 major types of tensor decompositions: PARAFAC and Tucker (not examined here)
• both can be solved with “alternating least squares” (ALS)

Discoveries: Problem Definition
• Most important concepts and synonyms?
• NELL (Never Ending Language Learner) data: mode sizes (26M), (48M), (26M); nonzeros = 144M

A1: Concept Discovery
• Concept Discovery in Knowledge Base

A2.1: Concept Discovery

A2: Synonym Discovery
• Synonym Discovery in Knowledge Base
[Figure: factor columns a1, a2, …, aR, with rows marked (Given) noun phrase, (Discovered) synonym 1, (Discovered) synonym 2]

GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries
U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos. KDD 2012

Experiments
• GigaTensor solves 100x larger problem
[Plot: axes (I), (J), (K); labels: GigaTensor, 100x, Out of Memory]
Number of nonzero = I / 50

Conclusions
• Real data may have multiple aspects (modes)
• Tensors provide elegant theory and algorithms
  – PARAFAC (and Tucker): discover groups
• GigaTensor: scales up (hadoop/PEGASUS)
  – www.cs.cmu.edu/~pegasus

References
• T. G. Kolda, B. W. Bader and J. P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. In: ICDM 2005, Pages 242-249, November 2005.
• Jimeng Sun, Spiros Papadimitriou, Philip Yu. Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams, Proc. of the Int. Conf. on Data Mining (ICDM), Hong Kong, China, Dec 2006

Resources
• See tutorial on tensors, KDD’07 (w/ Tamara Kolda and Jimeng Sun): www.cs.cmu.edu/~christos/TALKS/KDD-07-tutorial

Tensor tools - resources
• Toolbox: from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/TensorToolbox
• T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review, Volume 51, Number 3, September 2009. csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/TensorReview-preprint.pdf
• T. Kolda and J. Sun: Scalable Tensor Decomposition for Multi-Aspect Data Mining (ICDM 2008)

PART 4: Theory

Task 4 – Theory - Detailed roadmap
• Adjacency matrix
• Laplacian
  – Connected Components
  – Intuition: 2nd smallest eigenvalue -> ‘good cut’

Adjacency matrix
[Figure: graph on nodes 1-4 and its adjacency matrix A]
Adjacency matrix
• A: 1-step-away paths
• Obvious extensions, for directed and/or weighted cases

Main upcoming result
• The eigenvector of the second smallest eigenvalue of the Laplacian (u2) gives a good cut:
  – nodes with positive scores should go to one group
  – and the rest to the other

Laplacian
• L = D - A
• D: diagonal matrix, d_ii = d_i (the degree of node i)
[Figure: graph on nodes 1-4 and its Laplacian L]

Connected Components
• Lemma: Let G be a graph with n vertices and c connected components. If L is the Laplacian of G, then rank(L) = n - c.
• Proof: see p.279, Godsil-Royle

Connected Components
[Figure: graph G(V,E) on nodes 1-7, its Laplacian L, and eig(L)]
• #zeros in eig(L) = #components

Connected Components
[Figure: graph G(V,E) on nodes 1-7 with an edge labeled 0.01, its Laplacian L, and eig(L)]
• #zeros in eig(L) = #components
• Indicates a “good cut”
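The lemma rank(L) = n - c can be checked by hand on a small graph. The sketch below is an illustration under stated assumptions, not the graph from the slides (its edges are not recoverable from the text): it uses a hypothetical 7-node graph made of a 4-cycle and a triangle, builds L = D - A, computes the exact rank by Gaussian elimination over rationals, and counts components with union-find. It assumes a simple undirected graph (no self-loops, no repeated edges).

```python
from fractions import Fraction


def laplacian(n, edges):
    """L = D - A for a simple undirected, unweighted graph on nodes 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    return L


def rank(matrix):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def count_components(n, edges):
    """Number of connected components, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


# Hypothetical 7-node graph: a 4-cycle on nodes 0-3 and a triangle on nodes 4-6.
n = 7
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 4)]
L = laplacian(n, edges)
c = count_components(n, edges)  # 2
r = rank(L)                     # 5, i.e. n - c
```

Adding one edge between the two pieces, for example (3, 4), merges them into a single component and raises the rank to 6. Exact rational arithmetic is used so that the rank is not affected by floating-point round-off.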
Example: Spectral Partitioning
• Dumbbell graph: K500 and K500
[Figure: two K500 cliques labeled Montagues and Capulets, with Romeo and Juliet between them; image: http://en.wikipedia.org/wiki/File:Romeo_and_juliet_brown.jpg]

Example: Spectral Partitioning
• This is how adjacency matrix of B looks
    spy(B)

Example: Spectral Partitioning
• 2nd eigenvector u2 of the Laplacian L of B: $L u_2 = \lambda_2 u_2$
    L = diag(sum(B)) - B;        % Laplacian L = D - B
    [u, v] = eigs(L, 2, 'SM');   % two smallest-magnitude eigenpairs
    % column 1 is taken as u2 here; check diag(v), since the order
    % returned by eigs may differ between MATLAB versions
    plot(u(:,1), 'x')            % u2,i score vs. node-id i
• Not so much information yet…

Example: Spectral Partitioning
• 2nd eigenvector after sorting on u2,i score
    [ign, ind] = sort(u(:,1));
    plot(u(ind,1), 'x')          % sorted u2,i score vs. node-id i
• But now we see the two communities!

Example: Spectral Partitioning
• This is how adjacency matrix of B looks now
    spy(B(ind,ind))

Why λ2?
[Figure: balls x1 … xn, each ball 1 unit of mass, left to OSCILLATE]
• Matrix viewpoint: $L x = \lambda x$ (dfn of eigenvector)
• Physics viewpoint: $L x$ = force due to neighbors, $x$ = displacement, $\lambda$ = Hooke’s constant
• For the first eigenvector: all nodes have the same displacement (= value)
[Plot: eigenvector value vs. node id]

Conclusions
Spectrum tells us a lot about the graph:
• Adjacency: #Paths
• Laplacian: Sparse Cut
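The spectral partitioning example relies on MATLAB's `eigs`. As a rough, dependency-free illustration of the same idea, the Python sketch below approximates u2 by power iteration on (cI - L) while projecting out the constant first eigenvector, then splits nodes by sign. This swaps in plain power iteration for `eigs`, uses two 5-cliques instead of K500, and assumes a connected, unweighted graph; the function names `fiedler_vector` and `dumbbell` are hypothetical.

```python
def laplacian(n, edges):
    """L = D - A for a simple undirected, unweighted graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def fiedler_vector(L, iterations=500):
    """Approximate u2 by power iteration on (c*I - L), kept orthogonal to the all-ones vector."""
    n = len(L)
    # 2 * (max degree) is an upper bound on the largest eigenvalue of L,
    # so c*I - L has no negative eigenvalues.
    c = 2.0 * max(L[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]     # remove the first (constant) eigenvector
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def dumbbell(k):
    """Two k-cliques joined by a single edge between node k-1 and node k."""
    edges = [(i, j) for i in range(k) for j in range(i + 1, k)]
    edges += [(i + k, j + k) for i in range(k) for j in range(i + 1, k)]
    edges.append((k - 1, k))
    return 2 * k, edges


n, edges = dumbbell(5)
u2 = fiedler_vector(laplacian(n, edges))

# Nodes with positive scores go to one group, the rest to the other.
group_a = [i for i in range(n) if u2[i] > 0]
group_b = [i for i in range(n) if u2[i] <= 0]
```

On this small dumbbell the two groups coincide with the two cliques. Which clique gets the positive sign is arbitrary, since an eigenvector is only defined up to sign. For graphs of realistic size a sparse eigensolver would be the practical choice; the dense loop here is only meant to show the mechanics.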
References
• Fan R. K. Chung: Spectral Graph Theory (AMS)
• Chris Godsil and Gordon Royle: Algebraic Graph Theory (Springer)
• Bojan Mohar and Svatopluk Poljak: Eigenvalues in Combinatorial Optimization, IMA Preprint Series #939
• Gilbert Strang: Introduction to Applied Mathematics (Wellesley-Cambridge Press)

PART 5: Conclusions

Summary
• Task 1: Node importance -> SVD, PageRank, HITS
• Task 2: Community detection -> METIS; ‘no good cuts’
• Task 3: Mining graphs over time -> Tensors
• Task 4: Spectral graph theory -> Laplacians

Acknowledgements
Funding: IIS-0705359, IIS-0534205, DBI-0640543, CNS-0721736

THANK YOU!
Christos Faloutsos
www.cs.cmu.edu/~christos
www.cs.cmu.edu/~pegasus
http://www.cs.cmu.edu/~epapalex/gmc/