Elektrotehnički fakultet Univerziteta u Beogradu

BSP Clustering Algorithm for Social Network Analysis
Branislav Petrović 3273/2012

Introduction
• Social networks – highly dynamic, evolving relationships among people or other entities.
• Social Network Analysis (SNA) – a new research field in data mining.
• Research on SNA includes clustering analysis, classification, and link prediction.

Introduction (continued)
• Traditional clustering algorithms group objects based on their similarity.
• Social network clustering analysis divides objects into classes based on their links as well as their attributes.

Social network in graph theory
• Social network – a directed graph composed of objects and their relationships.

Business System Planning (BSP)
• The BSP clustering algorithm uses objects and the links among them to perform clustering analysis.
• Steps of the BSP algorithm:
  – Generate the edge creation matrix and the edge pointed matrix
  – Calculate the one-step reachable matrix between objects
  – Calculate the multi-step reachable matrices between objects
  – Calculate the reachable matrix
  – Identify relationships among classes

Generate Lc and Lp
• $L_c$ – m × n edge creation matrix (m objects, n edges).
• $L_p$ – m × n edge pointed matrix.
• $L_c(i, j) = 1$ – object $O_i$ is connected to the tail of edge $E_j$.
• $L_p(i, j) = 1$ – object $O_i$ is connected to the head of edge $E_j$.

Calculate one-step reachable matrix
$$G = L_c * L_p^T$$
$$g(i, j) = \bigvee_{k=1}^{n} \left( l_c(i, k) \wedge l_p^T(k, j) \right)$$
• i = 1..m, j = 1..m (G is an m × m matrix over objects).
• $\wedge$ – Boolean product.
• $\vee$ – Boolean sum.
• $G(i, j) = 1$ – a one-step reachable relation exists from $O_i$ to $O_j$.

Calculate multi-step reachable matrix
$$G^2 = G * G$$
$$g^2(i, j) = \bigvee_{k=1}^{m} \left( g(i, k) \wedge g(k, j) \right)$$
• i = 1..m, j = 1..m.
$$G^3 = G^2 * G, \quad G^4 = G^3 * G, \quad \ldots, \quad G^{m-1} = G^{m-2} * G$$

Calculate reachable matrix
$$R = I \vee G \vee G^2 \vee \ldots \vee G^{m-1}$$
• I – unit (identity) matrix.
• $\vee$ – Boolean sum.
• $R(i, j) = 1$ – a reachable relation exists from $O_i$ to $O_j$.

Calculate mutual reachable matrix
$$Q = R \wedge R^T$$
• $\wedge$ – Boolean product, applied element-wise here: $Q(i, j) = R(i, j) \wedge R(j, i)$.
• $Q(i, j) = 1$ – a mutual reachable relation exists between $O_i$ and $O_j$.
• Strong sub-matrix – a sub-matrix of Q in which all elements are 1.

Identify relationships among classes
• If a one-step reachable relation exists between two objects in different classes, a directed link exists between those classes.

Social network clustering analysis algorithm
Input:
    $L_c$: edge creation matrix
    $L_p$: edge pointed matrix
Begin
    $G = L_c * L_p^T$
    for k = 3 to m do
        $G^{k-1} = G^{k-2} * G$
    $R = I \vee G \vee G^2 \vee \ldots \vee G^{m-1}$
    $Q = R \wedge R^T$
    $Q \rightarrow C_k$
    $(C_k, G) \rightarrow Relation(C_k)$
End
• $Q \rightarrow C_k$ – generating clusters from the mutual reachable matrix Q.
• $(C_k, G) \rightarrow Relation(C_k)$ – identifying relationships among clusters based on the clusters and the one-step reachable matrix G.

Improvement over BSP Clustering Algorithm
• Disadvantage of the BSP clustering algorithm – it uses matrices to store edges and reachable relations.
• Proposed modification – use a linked-list data structure.

    /* One stored matrix element: its position, its value, and the next node. */
    struct snode {
        int row, col, val;
        struct snode *next;
    };

Node layout: | row | col | val | *next |

Shortcomings
• All edges between objects have the same weight.
• The properties of each cluster have not been analyzed.

Thank you for listening.
Questions?
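As a supplementary illustration, the steps of the social network clustering analysis algorithm from the slides can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's implementation: the function names (`bool_product`, `bool_sum`, `transpose`, `bsp_cluster`) and the four-object example network are hypothetical, matrices are plain nested lists, and clusters are read off Q by collecting, for each unassigned object, all objects mutually reachable with it.

```python
def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def bool_product(a, b):
    """Boolean matrix product: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    return [[int(any(a[i][k] and b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def bool_sum(a, b):
    """Element-wise Boolean sum (OR) of two equally sized matrices."""
    return [[x | y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def bsp_cluster(lc, lp):
    """Sketch of the slide algorithm; returns (G, Q, clusters, relations)."""
    m = len(lc)                               # number of objects
    g = bool_product(lc, transpose(lp))       # one-step reachable matrix, m x m
    r = [[int(i == j) for j in range(m)] for i in range(m)]  # start from I
    power = g
    for _ in range(m - 1):                    # accumulate G, G^2, ..., G^(m-1)
        r = bool_sum(r, power)
        power = bool_product(power, g)
    # Mutual reachable matrix: Q(i, j) = R(i, j) AND R(j, i)
    q = [[r[i][j] & r[j][i] for j in range(m)] for i in range(m)]

    # Assumption: each cluster is a set of mutually reachable objects.
    clusters, assigned = [], set()
    for i in range(m):
        if i not in assigned:
            members = [j for j in range(m) if q[i][j]]
            clusters.append(members)
            assigned.update(members)

    # A directed link a -> b exists if some object in cluster a reaches
    # some object in cluster b in one step.
    relations = sorted(
        (a, b)
        for a, ca in enumerate(clusters)
        for b, cb in enumerate(clusters)
        if a != b and any(g[i][j] for i in ca for j in cb)
    )
    return g, q, clusters, relations


# Hypothetical example: 4 objects (indices 0..3) and 5 edges:
# 0->1, 1->0, 1->2, 2->3, 3->2
lc = [[1, 0, 0, 0, 0],   # object 0 is the tail of edge 0
      [0, 1, 1, 0, 0],   # object 1 is the tail of edges 1 and 2
      [0, 0, 0, 1, 0],   # object 2 is the tail of edge 3
      [0, 0, 0, 0, 1]]   # object 3 is the tail of edge 4
lp = [[0, 1, 0, 0, 0],   # object 0 is the head of edge 1
      [1, 0, 0, 0, 0],   # object 1 is the head of edge 0
      [0, 0, 1, 0, 1],   # object 2 is the head of edges 2 and 4
      [0, 0, 0, 1, 0]]   # object 3 is the head of edge 3

g, q, clusters, relations = bsp_cluster(lc, lp)
print(clusters)    # [[0, 1], [2, 3]]
print(relations)   # [(0, 1)]
```

In this example, objects 0 and 1 reach each other, as do objects 2 and 3, so Q contains two strong sub-matrices. The single edge 1->2 produces the directed link from the first cluster to the second. The dense nested lists mirror the matrix formulation of the slides; the linked-list storage proposed as an improvement is not used here.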