Analysis of Large Graphs: Community Detection
By: Kim Hyeongcheol, Waleed Abdulwahab Yahya Al-Gobi, Muhammad Burhan Hafez, Shang Xindi, He Ruidan

Overview
- Introduction & motivation
- Graph cut criteria: min-cut, normalized cut
- Non-overlapping community detection: spectral clustering, deep autoencoder
- Overlapping community detection: the BigCLAM algorithm

1. Introduction (Kim Hyeong Cheol)
Objective: an introduction to the analysis of large graphs.

What is a graph?
- Definition: an ordered pair G = (V, E), where V is a set of vertices and E is a set of edges. An edge is a 2-element subset of V, i.e., a line of connection between two vertices.
- Types: undirected graph, directed graph, mixed graph, multigraph, weighted graph, and so on.

Undirected graphs
- Edges have no orientation: edge (x, y) = edge (y, x).
- The maximum number of edges is n(n-1)/2, reached when every pair of vertices is connected.
- Example: the undirected graph G = (V, E) with V = {1, 2, 3, 4, 5, 6} and E = {(1,2), (2,3), (1,5), (2,5), (4,5), (3,4), (4,6)}.

Large undirected graphs
Examples of large social graphs:
- a sampled user email-connectivity graph (http://research.microsoft.com/en-us/projects/S-GPS/);
- a graph of Harry Potter fanfiction (adapted from http://colah.github.io/posts/2014-07-FFN-Graphs-Vis/).
Q: What do these large graphs present?

Motivation
[Figures: each example graph shown as a raw hairball versus partitioned into visually separated groups.]
A raw large graph is overwhelming to look at; a partitioned one is not. If we can partition a graph, we can use the partitions to analyze it: graph partitioning and community detection go hand in hand, with each partition corresponding to a community.
Q: How can we find the partitions?

2. Min-cut and Normalized-cut: Criteria for Graph Partitioning (Kim Hyeong Cheol)

Basic principle
For a partition of the graph into groups A and B:
- minimize the number of between-group connections;
- maximize the number of within-group connections.

Min-cut vs. N-cut
- Min-cut minimizes the between-group connections but ignores the within-group principle.
- N-cut (normalized cut) minimizes the between-group connections while also accounting for the within-group connections.

Mathematical expressions
- cut(A, B): the number of edges with one endpoint in A and the other in B; this captures the between-group connections.
- vol(A): the sum of the degrees of the nodes in A (in the example, vol(A) = 5 and vol(B) = 5); this captures the within-group connections.

Min-cut criterion
Minimize the number of between-group connections: min_{A,B} cut(A, B).
In the example, the partition with cut(A, B) = 1 attains the minimum, yet a different partition with cut(A, B) = 2 looks much more balanced. How can we favor it?

N-cut criterion
If we define
  ncut(A, B) = cut(A, B) / vol(A) + cut(A, B) / vol(B),
then minimizing ncut(A, B) produces more balanced partitions, because it considers both principles at once.

Methodology: example
- Unbalanced partition: cut(A, B) = 1, ncut(A, B) = 1/26 + 1/1 = 1.038...
- Balanced partition: cut(A, B) = 2, ncut(A, B) = 2/18 + 2/11 = 0.292...
The balanced partition wins under ncut even though its raw cut is larger.
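To make the two criteria concrete, here is a minimal sketch (not from the slides) that computes cut(A, B) and ncut(A, B) for an arbitrary partition. The edge list is the six-node example graph from the introduction; the partition shown is an illustrative choice, and the last line simply re-checks the slide's arithmetic.

```python
# Minimal sketch: cut(A,B) and ncut(A,B) for an undirected edge list.

def cut(edges, A):
    """Number of edges with exactly one endpoint in A."""
    return sum(1 for u, v in edges if (u in A) != (v in A))

def vol(edges, A):
    """Sum of the degrees of the nodes in A (edge endpoints falling in A)."""
    return sum((u in A) + (v in A) for u, v in edges)

def ncut(edges, A, B):
    c = cut(edges, A)
    return c / vol(edges, A) + c / vol(edges, B)

# The 6-node example graph from the introduction.
edges = [(1, 2), (2, 3), (1, 5), (2, 5), (4, 5), (3, 4), (4, 6)]
A, B = {1, 2, 3, 5}, {4, 6}          # an illustrative partition
print(cut(edges, A), round(ncut(edges, A, B), 3))

# Re-checking the slide's example values directly:
print(round(1/26 + 1/1, 3), round(2/18 + 2/11, 3))   # 1.038 vs 0.293
```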
Summary
- What is an undirected large graph?
- How can we get insight from an undirected large graph? Graph partitioning & community detection.
- What is the methodology for a good graph partition? Min-cut and normalized cut.

3. Spectral Clustering: Non-overlapping Community Detection (Waleed Abdulwahab Yahya Al-Gobi)

Finding clusters
[Figure: a network and its adjacency matrix, with the nodes ordered so that the community structure appears as blocks.]
- How do we identify such structure?
- How do we split the graph into two pieces?

Spectral clustering algorithm: three basic stages
1) Pre-processing: construct a matrix representation of the graph.
2) Decomposition: compute the eigenvalues (λ) and eigenvectors (x) of the matrix. The focus is on the second-smallest eigenvalue λ2 and its corresponding eigenvector x2.
3) Grouping: assign points to two or more clusters based on the new representation.

Matrix representations
Adjacency matrix (A): the n × n binary matrix A = [a_ij] with a_ij = 1 if there is an edge between nodes i and j.

      1  2  3  4  5  6
  1   0  1  1  0  1  0
  2   1  0  1  0  0  0
  3   1  1  0  1  0  0
  4   0  0  1  0  1  1
  5   1  0  0  1  0  1
  6   0  0  0  1  1  0

Degree matrix (D): the n × n diagonal matrix D = [d_ii] with d_ii = degree of node i; here D = diag(3, 2, 3, 3, 3, 2).

Laplacian matrix (L): L = D - A. How can we use L to find good partitions of our graph? What are the eigenvalues and eigenvectors of L? We know: L · x = λ · x.

Spectrum of the Laplacian matrix L
The Laplacian L has eigenvalues λ1 ≤ λ2 ≤ ... ≤ λn with corresponding eigenvectors x1, x2, ..., xn. Important properties:
- the eigenvalues are non-negative real numbers;
- the eigenvectors are real and orthogonal.
What is the trivial eigenpair? For x = (1, ..., 1) we get L · x = 0, and so λ = λ1 = 0.

Best eigenvector for partitioning
The second eigenvector x2 is the eigenvector that yields the best-quality graph partition. Let's understand the components of x2 through λ2.
Fact: for the symmetric matrix L,
  λ2 = min_x x^T L x,
where the minimum is taken under two constraints:
- x is a unit vector: Σ_i x_i^2 = 1;
- x is orthogonal to the first eigenvector (1, ..., 1), thus Σ_i x_i · 1 = Σ_i x_i = 0.

λ2 as an optimization problem
What is the meaning of min x^T L x on G? Remember that L = D - A, so
  x^T L x = x^T D x - x^T A x,
  x^T D x = Σ_{i=1}^n d_i x_i^2 = Σ_{(i,j)∈E} (x_i^2 + x_j^2),
  x^T A x = Σ_{(i,j)∈E} 2 x_i x_j.
Therefore
  x^T L x = Σ_{(i,j)∈E} (x_i^2 + x_j^2 - 2 x_i x_j) = Σ_{(i,j)∈E} (x_i - x_j)^2.

So λ2 = min over all labelings x of the nodes with Σ_i x_i = 0 (and unit length) of Σ_{(i,j)∈E} (x_i - x_j)^2. We want to assign values x_i to the nodes so that few edges cross 0: for each edge (i, j) we want x_i and x_j to subtract each other, which balances the nodes on either side of 0 while minimizing the penalty of crossing edges.

Spectral partitioning algorithm: example
1) Pre-processing: build the Laplacian matrix L of the graph:

      1  2  3  4  5  6
  1   3 -1 -1  0 -1  0
  2  -1  2 -1  0  0  0
  3  -1 -1  3 -1  0  0
  4   0  0 -1  3 -1 -1
  5  -1  0  0 -1  3 -1
  6   0  0  0 -1 -1  2

2) Decomposition: find the eigenvalues and eigenvectors of L. Here the eigenvalues are λ = 0.0, 1.0, 3.0, 3.0, 4.0, 5.0. Map each vertex to its component of the second eigenvector:
  x2 = (0.3, 0.6, 0.3, -0.3, -0.3, -0.6),
so node 1 → 0.3, node 2 → 0.6, node 3 → 0.3, node 4 → -0.3, node 5 → -0.3, node 6 → -0.6.
How do we now find the clusters?

3) Grouping: sort the components of the reduced 1-dimensional vector, then identify the clusters by splitting the sorted vector in two.
How to choose a splitting point? Naïve approaches: split at 0, or split at the median value.
Splitting at 0 puts the positive points in cluster A and the negative points in cluster B:
  A = {1: 0.3, 2: 0.6, 3: 0.3},  B = {4: -0.3, 5: -0.3, 6: -0.6}.
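The three stages fit in a few lines of numpy. This is a minimal sketch, not the lecture's reference code, run on the six-node example graph above; `numpy.linalg.eigh` returns eigenvalues in ascending order, so column 1 of the eigenvector matrix is x2 (determined only up to sign).

```python
import numpy as np

# The example graph from the adjacency-matrix slide.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6

# 1) Pre-processing: build the Laplacian L = D - A.
A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
L = np.diag(A.sum(axis=1)) - A

# 2) Decomposition: eigenvalues/eigenvectors of the symmetric matrix L.
eigvals, eigvecs = np.linalg.eigh(L)
x2 = eigvecs[:, 1]                       # eigenvector of the 2nd-smallest eigenvalue

# 3) Grouping: naive split at 0 on the components of x2.
cluster_A = [i + 1 for i in range(n) if x2[i] >= 0]
cluster_B = [i + 1 for i in range(n) if x2[i] < 0]
print(eigvals.round(1))                  # expect [0. 1. 3. 3. 4. 5.]
print(cluster_A, cluster_B)              # {1,2,3} vs {4,5,6}, up to sign
```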
Example: spectral partitioning
[Figures: the components of x2 plotted against their rank in x2; the values fall into two groups with a visible gap at the splitting point.]

k-way spectral clustering
How do we partition a graph into k clusters? Two basic approaches:
- Recursive bi-partitioning [Hagen et al., '92]: recursively apply the bi-partitioning algorithm in a hierarchical, divisive manner. Disadvantage: inefficient.
- Cluster multiple eigenvectors [Shi-Malik, '00]: build a reduced space from multiple eigenvectors. Commonly used in recent papers, and the preferable approach.

4. Deep GraphEncoder [Tian et al., 2014] (Muhammad Burhan Hafez)

Autoencoder
- Architecture: encoder layers E1, E2 followed by decoder layers D1, D2. [Figure]
- Trained by minimizing the reconstruction loss between the input and the decoder's output.

Autoencoder & spectral clustering
A simple theorem (the Eckart-Young-Mirsky theorem): let A be any matrix with singular value decomposition (SVD) A = U Σ V^T, and let A_k be the decomposition in which we keep only the k largest singular values. Then A_k is the best rank-k approximation of A.
Note: if A is symmetric, the singular values are its eigenvalues and U = V = the matrix of eigenvectors.
Result (1): spectral clustering ⇔ matrix reconstruction.

The autoencoder case: by the previous theorem, with X = U Σ V^T and K the hidden-layer size, the autoencoder learns the best rank-K reconstruction of X.
Result (2): autoencoder ⇔ matrix reconstruction.

Deep GraphEncoder | Algorithm
Clustering with GraphEncoder:
1. Learn a nonlinear embedding of the original graph with a deep autoencoder (the embedding plays the role of the eigenvectors corresponding to the K smallest eigenvalues of the graph Laplacian matrix).
2. Run the k-means algorithm on the embedding to obtain the clustering result.

Deep GraphEncoder | Efficiency
- Approximation guarantee: the cut found by spectral clustering and by the Deep GraphEncoder is at most 2 times away from the optimal one.
- Computational complexity: spectral clustering costs Θ(n^3) due to the eigenvalue decomposition (EVD); the GraphEncoder costs Θ(ncd), where c is the average degree of the graph and d is the maximum number of hidden-layer nodes.

Deep GraphEncoder | Flexibility
A sparsity constraint can easily be added to the original objective function (objective = reconstruction loss + sparsity constraint), which
- improves efficiency (storage & data processing);
- improves clustering accuracy.
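As a toy illustration of the GraphEncoder pipeline, the sketch below trains a single dense autoencoder layer with plain batch gradient descent on the row-normalized adjacency matrix and then runs k-means on the hidden embedding. This is only a hedged stand-in for Tian et al.'s stacked sparse autoencoders: the single layer, squared loss, learning rate, and iteration count are all illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Same 6-node example graph as before; 2 clusters, 3 hidden units.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n, k, hidden = 6, 2, 3

A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
X = A / A.sum(axis=1, keepdims=True)          # row-normalized similarity D^-1 A

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = rng.normal(0, 0.1, (n, hidden)), np.zeros(hidden)    # encoder
W2, b2 = rng.normal(0, 0.1, (hidden, n)), np.zeros(n)         # decoder

lr = 0.5
for _ in range(5000):                         # plain batch gradient descent
    H = sigmoid(X @ W1 + b1)                  # hidden embedding
    Y = sigmoid(H @ W2 + b2)                  # reconstruction
    dY = (Y - X) * Y * (1 - Y)                # grad of squared loss at output pre-activation
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

embedding = sigmoid(X @ W1 + b1)
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
print(labels)   # the triangles {1,2,3} and {4,5,6} should tend to separate
```

The sparsity constraint from the flexibility slide would enter here as an extra penalty term on the hidden activations; it is omitted to keep the sketch short.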
5. BigCLAM: Overlapping Community Detection (Shang Xindi)

Non-overlapping communities
[Figure: a network whose adjacency matrix, with nodes suitably ordered, shows clean diagonal blocks, one block per community.]

Non-overlapping vs. overlapping
[Figure: non-overlapping communities tile the node set; overlapping communities share nodes.]

Facebook network example
- Nodes: Facebook users; edges: friendships.
- One user's social communities overlap: high school, summer internship, Stanford (basketball), Stanford (squash).

Overlapping communities
[Figure: network and adjacency matrix with overlapping blocks.]
Edge density in the overlaps is higher!

Assumption
A community membership strength matrix F (entries ≥ 0), with one row per node and one column per community:
- F_uA is the strength of node u's membership to community A;
- F_u is the row vector of community membership strengths of node u.
P_C(u, v) is the probability that u and v are connected through community C:
  P_C(u, v) = 1 - exp(-F_uC · F_vC).
P(u, v) is the probability that at least one common community C links the nodes:
  P(u, v) = 1 - ∏_C (1 - P_C(u, v)) = 1 - exp(-F_u · F_v^T).

Detecting communities with MLE
Given a social graph G(V, E), and with P(u, v | F) the probability that u and v are connected, maximize the likelihood to find the best F.
[Example: a small graph whose model probabilities P(u, v | F) (around 0.9 for linked pairs, around 0.1 for unlinked ones) line up with the observed adjacency matrix of G(V, E).]

Maximum likelihood estimation
- Data X: here, the graph G(V, E).
- Assumption: the data is generated by some model f(Θ).
- Given: the model f (here, P(u, v)) and the model parameters Θ (here, F).
- Estimate the likelihood P_f(X | Θ): the probability that the model f (with parameters Θ) generated the data X.

The BigCLAM model
Given a network G(V, E), the likelihood is
  P(G | F) = ∏_{(u,v)∈E} P(u, v) · ∏_{(u,v)∉E} (1 - P(u, v)),
where P(u, v) = 1 - exp(-F_u · F_v^T).
Maximize P(G | F): argmax_F P(G | F).
[Yang, Jaewon, and Jure Leskovec. "Overlapping community detection at scale: a nonnegative matrix factorization approach." Proceedings of the sixth ACM international conference on Web Search and Data Mining (WSDM). ACM, 2013.]

Many times we take the logarithm of the likelihood and call it the log-likelihood:
  l(F) = log P(G | F).
Goal: find the F that maximizes l(F).

6. BigCLAM: How to Optimize the Parameter F (He Ruidan)

BigCLAM: how to find F
Model parameter: the community membership strength matrix F. Each row vector F_u of F is the community membership strength of node u in the graph.

BigCLAM v1.0: block coordinate gradient ascent
- Update F_u for each node u with all other rows F_v fixed, iterating over the rows of F.
- The gradient of the log-likelihood with respect to a single row is
    ∇l(F_u) = Σ_{v∈N(u)} F_v · exp(-F_u F_v^T) / (1 - exp(-F_u F_v^T)) - Σ_{v∉N(u)} F_v,
  and each row takes a projected gradient step F_u ← max(F_u + α ∇l(F_u), 0).
- This is slow! The non-neighbor sum Σ_{v∉N(u)} F_v takes linear time O(n) to compute, and as we solve this for each of the n nodes, the overall time complexity is O(n^2). It cannot be applied to large graphs with millions of nodes.

BigCLAM v2.0: how to find F
However, we notice that
  Σ_{v∉N(u)} F_v = Σ_v F_v - F_u - Σ_{v∈N(u)} F_v,
and Σ_v F_v can be computed once and cached. Usually the average degree of a node in the graph can be treated as a constant, so the neighbor sum Σ_{v∈N(u)} F_v takes constant time to compute. Therefore the time complexity of updating the whole matrix F is reduced to O(n).
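A minimal numpy sketch of the v2.0 update loop, assuming the gradient and caching identity above. The function name, step size, iteration count, random initialization, and the small epsilon guard are illustrative choices, not from the paper; the max(·, 0) projection keeps the memberships non-negative.

```python
import numpy as np

def bigclam(adj, n_communities, iters=100, lr=0.01, seed=0):
    """adj: list of neighbor index sets, one per node (undirected graph)."""
    rng = np.random.default_rng(seed)
    n = len(adj)
    F = rng.random((n, n_communities))        # nonnegative random init
    for _ in range(iters):
        S = F.sum(axis=0)                     # cached sum of all rows, Σ_v F_v
        for u in range(n):
            Fn = F[list(adj[u])]              # rows of u's neighbors
            dots = Fn @ F[u]                  # F_u · F_v for v in N(u)
            w = np.exp(-dots) / (1.0 - np.exp(-dots) + 1e-12)
            # ∇l(F_u) = Σ_{v∈N(u)} w_v F_v - (Σ_v F_v - F_u - Σ_{v∈N(u)} F_v)
            grad = (Fn * w[:, None]).sum(axis=0) - (S - F[u] - Fn.sum(axis=0))
            S -= F[u]                         # keep the cache consistent
            F[u] = np.maximum(F[u] + lr * grad, 0.0)   # projected ascent step
            S += F[u]
    return F

# Toy usage: two triangles sharing node 1 (0-indexed), i.e. one overlap node.
adj = [{1, 2}, {0, 2, 3, 4}, {0, 1}, {1, 4}, {1, 3}]
F = bigclam(adj, n_communities=2)
print(F.round(2))   # rows loading on the same column form one community
```

After fitting, overlapping memberships are read off F, e.g. by assigning node u to community c whenever F_uc exceeds some threshold; that post-processing rule is a detail the slides leave out.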
Additional Reading: State-of-the-Art Methods (He Ruidan)

Graph representation
- Representation learning on graph nodes: try to represent each node as a numerical vector. Given a graph, the vectors should be learned automatically.
- Learning objective: the representation vectors of nodes that share similar connections should be close to each other in the vector space.
- Once the representation of each node is learnt, community detection can be modeled as a clustering / classification problem.

Graph representation using neural networks / deep learning:
- B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In SIGKDD, pages 701-710. ACM, 2014.
- J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale information network embedding. In WWW. ACM, 2015.
- F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu. Learning deep representations for graph clustering. In AAAI, 2014.

Summary
- Introduction & motivation
- Graph cut criteria: min-cut, normalized cut
- Non-overlapping community detection: spectral clustering, deep autoencoder
- Overlapping community detection: the BigCLAM algorithm

Appendix

Facts about the Laplacian L
(a) All eigenvalues are ≥ 0.
(b) x^T L x = Σ_{ij} L_ij x_i x_j ≥ 0 for every x.
(c) L can be written as L = N^T · N.
That is, L is positive semi-definite.
Proof:
- (c) ⇒ (b): x^T L x = x^T N^T N x = (N x)^T (N x) ≥ 0, as it is just the squared length of the vector N x.
- (b) ⇒ (a): let λ be an eigenvalue of L with eigenvector x. By (b), 0 ≤ x^T L x = x^T λ x = λ x^T x, and since x^T x > 0 we get λ ≥ 0.
- (c) is also easy: do it yourself.

λ2 as an optimization problem: for a symmetric matrix M,
  λ2 = min_{x: Σ_i x_i^2 = 1, x ⊥ w_1} x^T M x.
Proof:
- Write x in the basis of eigenvectors w_1, w_2, ..., w_n of M: x = Σ_i α_i w_i.
- Then M x = Σ_i α_i M w_i = Σ_i α_i λ_i w_i.
- So what is x^T M x?
  x^T M x = (Σ_i α_i w_i)^T (Σ_j α_j λ_j w_j) = Σ_{ij} α_i λ_j α_j (w_i^T w_j) = Σ_i λ_i α_i^2,
  since w_i^T w_j = 0 if i ≠ j and 1 otherwise.
- To minimize this over all unit vectors x orthogonal to w_1, minimize over choices of (α_1, ..., α_n) such that Σ_i α_i^2 = 1 (unit length) and α_1 = 0 (orthogonal to w_1).
- The minimum is attained by setting α_2 = 1 and every other α_i = 0, which gives Σ_i λ_i α_i^2 = λ2.
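A quick numerical sanity check (not in the slides) of the appendix facts, run on the six-node example graph: the Laplacian's eigenvalues are all non-negative, and the quadratic form x^T L x matches the edge sum Σ_{(i,j)∈E} (x_i - x_j)^2 for a random x.

```python
import numpy as np

edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
L = np.diag(A.sum(axis=1)) - A

# (a) All eigenvalues of L are non-negative (up to floating-point noise).
print(np.linalg.eigvalsh(L).min() >= -1e-9)

# (b) x^T L x equals the sum over edges of (x_i - x_j)^2.
x = np.random.default_rng(1).normal(size=n)
print(np.isclose(x @ L @ x, sum((x[u-1] - x[v-1])**2 for u, v in edges)))
```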