Seminar on Social Networks, Big Data, Influence, and Decision-Making
University of Toronto

Dimension matching in Facebook and LinkedIn networks
Anthony Bonato
Ryerson University

Friendship networks
• networks of on- and off-line friends form a large web of interconnected links

Dimension matching in OSNs

6 degrees of separation
• (Stanley Milgram, 67): famous chain letter experiment

6 Degrees in Facebook?
• 1.31 billion users
• (Backstrom et al., 2012)
  – 4 degrees of separation in Facebook
  – when considering another person in the world, a friend of your friend knows a friend of their friend, on average
• similar results for Twitter and other OSNs

Complex networks in the era of Big Data
• web graph, social networks, biological networks, internet networks, …

What is a complex network?
• no precise definition
• however, there is general consensus on the following observed properties
  1. large scale
  2. evolving over time
  3. power law degree distributions
  4. small world properties

Examples of complex networks
• technological/informational: web graph, router graph, AS graph, call graph, e-mail graph
• social: on-line social networks (Facebook, Twitter, LinkedIn, …), collaboration graphs, coactor graph
• biological networks: protein interaction networks, gene regulatory networks, food networks

Properties of complex networks
1. Large scale: relative to order and size
• web graph: order > trillion
  – in some sense infinite: number of strings entered into Google
• Facebook: > 1 billion nodes; Twitter: > 270 million nodes
  – much denser (i.e., higher average degree) than the web graph
• protein interaction networks: order in thousands

Properties of complex networks
2.
Evolving: networks change over time
• web graph: billions of nodes and links appear and disappear each day
• Facebook: grew to 1 billion users
  – denser than the web graph
• protein interaction networks: order in the thousands
  – evolves much more slowly

Properties of Complex Networks
3. Power law degree distribution
• for a graph G of order n and i a positive integer, let $N_{i,n}$ denote the number of nodes of degree i in G
• we say that G follows a power law degree distribution if for some range of i and some b > 2,
  $N_{i,n} \approx i^{-b} n$
• b is called the exponent of the power law

Power laws in OSNs [figure]

Graph parameters
• average distance:
  $L(G) = \binom{n}{2}^{-1} \sum_{u,v \in V(G)} d(u,v)$
• clustering coefficient:
  $C(G) = n^{-1} \sum_{x \in V(G)} c(x)$, where $c(x) = \binom{\deg(x)}{2}^{-1} |E(N(x))|$

Properties of Complex Networks
4. Small world property
• introduced by Watts & Strogatz in 1998:
  – low distances
    • diam(G) = O(log n)
    • L(G) = O(log log n)
  – higher clustering coefficient than a random graph with the same expected degree

Sample data: Flickr, YouTube, LiveJournal, Orkut
• (Mislove et al., 07): short average distances and high clustering coefficients

Other properties of complex networks
• many complex networks (including on-line social networks) obey two additional laws:
• Densification power law (Leskovec, Kleinberg, Faloutsos, 05):
  – networks are becoming more dense over time; i.e.
average degree is increasing
    $|E(G_t)| \approx |V(G_t)|^{a}$, where $1 < a \le 2$ is the densification exponent
• Decreasing distances (Leskovec, Kleinberg, Faloutsos, 05):
  – distances (diameter and/or average distances) decrease with time (Kumar et al., 06) [figure]

Other properties
• Connected component structure: emergence of components; giant components
• Spectral properties: adjacency and Laplacian matrices, spectral gap, eigenvalue distribution
• Small community phenomenon: most nodes belong to small communities (i.e., subgraphs with more internal than external links)
…

Blau space
• OSNs live in social space or Blau space:
  – each user is identified with a point in a multi-dimensional space
  – coordinates correspond to sociodemographic variables/attributes
• homophily principle: the flow of information between users is a declining function of distance in Blau space

Underlying geometry
Feature space thesis: every complex network is naturally associated with an underlying feature space. For example:
  – web graph: topic space
  – OSNs: Blau space
  – PPIs: biochemical space

Dimensionality
• Question: What is the dimension of the Blau space of OSNs?
• what is a credible mathematical formula for the dimension of an OSN?

Six dimensions of separation

Why model complex networks?
• uncover and explain the generative mechanisms underlying complex networks
• predict the future
• nice mathematical challenges
• models can uncover the hidden reality of networks

“All models are wrong, but some are useful.” – G.E.P.
Box

Random geometric graphs
• n nodes are randomly placed in the unit square
• each node has a constant sphere of influence, radius r
• nodes are joined if their Euclidean distance is at most r
• G(n,r), r = r(n)

Some properties of G(n,r)
Theorem (Penrose, 97). Let $\mu = n e^{-\pi r^2 n}$.
1. If μ = o(1), then asymptotically almost surely (a.a.s.) G is connected.
2. If μ = Θ(1), then a.a.s. G has a component of order Θ(n).
3. If μ → ∞, then a.a.s. G is disconnected.
• many other properties of G(n,r) have been studied: chromatic number, clique number, Hamiltonicity, random walks, …

Spatially Preferred Attachment (SPA) model
(Aiello, Bonato, Cooper, Janssen, Prałat, 08), (Cooper, Frieze, Prałat, 12)
• volume of sphere of influence proportional to indegree
• nodes are added and spheres of influence shrink over time
• a.a.s. leads to power law graphs, low directed diameter, and small separators

Ranking models
(Fortunato, Flammini, Menczer, 06), (Łuczak, Prałat, 06), (Janssen, Prałat, 09)
• parameter: α in (0,1)
• each node is ranked 1, 2, …, n by some function r
  – 1 is best, n is worst
• at each time-step, one new node is born, one randomly chosen node dies (and ranking is updated)
• link probability proportional to $r^{-\alpha}$
• many ranking schemes a.a.s. lead to power law graphs: random initial ranking, degree, age, etc.
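
The random geometric graph G(n,r) and the quantity μ from Penrose's theorem, both described earlier, can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `random_geometric_graph`, `penrose_mu`, and `is_connected` are hypothetical, and the theorem is asymptotic, so a single finite sample only hints at the predicted behaviour.

```python
import math
import random


def random_geometric_graph(n, r, seed=None):
    """Sample G(n, r): n uniform points in the unit square,
    joined when their Euclidean distance is at most r."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                edges.add((i, j))
    return points, edges


def penrose_mu(n, r):
    """The quantity mu = n * exp(-pi * r^2 * n) from the theorem."""
    return n * math.exp(-math.pi * r * r * n)


def is_connected(n, edges):
    """Depth-first search connectivity check on nodes 0..n-1."""
    if n == 0:
        return True
    adjacency = {v: [] for v in range(n)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


if __name__ == "__main__":
    n, r = 200, 0.2  # illustrative values only
    _, edges = random_geometric_graph(n, r, seed=1)
    print(penrose_mu(n, r), is_connected(n, edges))
```

Small values of μ correspond to the connected regime of the theorem and large values to the disconnected one; varying r for fixed n in the usage lines is one way to explore that transition empirically.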
Geometric model for OSNs
• we consider a geometric model of OSNs, where
  – nodes are in m-dimensional Euclidean space
  – volume of spheres of influence is variable: a function of the ranking of nodes

Geometric Protean (GEO-P) Model (Bonato, Janssen, Prałat, 12)
• parameters: α, β in (0,1), α+β < 1; positive integer m
• nodes live in an m-dimensional hypercube
• each node is ranked 1, 2, …, n by some function r
  – 1 is best, n is worst
  – we use random initial ranking
• at each time-step, one new node v is born, one randomly chosen node dies (and ranking is updated)
• each existing node u has a region of influence with volume $r^{-\alpha} n^{-\beta}$ (r being the rank of u)
• add edge uv if v is in the region of influence of u

Notes on GEO-P model
• the model uses both geometry and ranking
• number of nodes is static: fixed at n
  – order of OSNs at most number of people (roughly…)
• top ranked nodes have larger regions of influence

Simulation with 5000 nodes [figure]

Simulation with 5000 nodes: random geometric vs. GEO-P [figure]

Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012)
• a.a.s.
the GEO-P model generates graphs with the following properties:
  – power law degree distribution with exponent b = 1 + 1/α
  – average degree $d = (1+o(1))\, n^{1-\alpha-\beta} / 2^{1-\alpha}$
    • densification
  – diameter $D = n^{\Theta(1/m)}$
    • small world: constant order if m = C log n
  – bad spectral expansion and high clustering coefficient

Dimension of OSNs
• given the order of the network n and diameter D, we can calculate m
• gives a formula for the dimension of an OSN:
  $m \approx \frac{\log n}{\log D}$

Logarithmic Dimension Hypothesis
In an OSN of order n and diameter D, the dimension of its Blau space is
  $\frac{\log n}{\log D}$
• posed independently by (Leskovec, Kim, 11), (Frieze, Tsourakakis, 11)

Uncovering the hidden reality
• reverse engineering approach
  – given network data (n, D), the dimension of an OSN gives the smallest number of attributes needed to identify users
• that is, given the graph structure, we can (theoretically) recover the social space

6 Dimensions of Separation
  OSN        Dimension
  Facebook   7
  YouTube    6
  Twitter    4
  Flickr     4
  Cyworld    7

MITACS team, UBC 2012 [photo]
L to R: Amanda Tian, David Gleich, Myunghwan Kim, Me, Stephen Young, Dieter Mitsche

MGEO-P (Bonato, Gleich, Mitsche, Prałat, Tian, Young, 14)
• time-steps in GEO-P form a computational bottleneck
• consider a GEO-P where we forget the history of ranks
  – memoryless GEO-P (MGEO-P)
• place n points u.a.r.
in the hypercube
• assign ranks via a random permutation σ
• for each pair i > j, ij is an edge if j is in the ball of volume $\sigma(i)^{-\alpha} n^{-\beta}$

Contrasting the models
• by considering the evolution of ranks in GEO-P, the probability that an edge is present in GEO-P and not in MGEO-P is
  $O\!\left(n^{\cdots} \log^{\cdots}(n)\right) = o(1)$ [exponents lost in extraction]
• intuitively, the models generate similar graphs
• many a.a.s. properties hold in MGEO-P with similar parameters

Properties of the MGEO-P model (BGMPTY, 14)
• a.a.s. the MGEO-P model generates graphs with the following properties:
  – power law degree distribution with exponent b = 1 + 1/α
  – average degree $d = (1+o(1))\, n^{1-\alpha-\beta} / 2^{1-\alpha}$
    • densification
  – diameter $D = n^{\Theta(1/m)}$

Proof sketch: diameter
• eminent node:
  – highly ranked: ranking greater than some fixed R
• partition the hypercube into small hypercubes
• choose the size of the hypercubes and R so that
  – each hypercube contains at least $\log^2 n$ eminent nodes
  – the sphere of influence of each eminent node covers its hypercube and all neighbouring hypercubes
• choose an eminent node in each hypercube: backbone
• show all nodes in a hypercube are at distance at most 2 from the backbone

Back to question…
• How would we measure the dimensionality of Blau space?
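
The MGEO-P construction described earlier (n uniform points, ranks from a random permutation σ, and an edge ij for i > j when j falls in a ball of volume σ(i)^(-α) n^(-β)) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not fix the metric or the ball's centre, so the sketch assumes the sup-norm on the unit torus and a ball centred at node i. The names `torus_sup_distance`, `ball_radius`, and `mgeo_p` are hypothetical.

```python
import random


def torus_sup_distance(p, q):
    """Sup-norm distance on the unit torus (an assumed metric)."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(p, q))


def ball_radius(volume, m):
    """Radius of a sup-norm ball in dimension m with the given volume.
    Such a ball is a cube of side 2 * radius."""
    return 0.5 * volume ** (1.0 / m)


def mgeo_p(n, m, alpha, beta, seed=None):
    """Sample a memoryless GEO-P style graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # n points chosen uniformly at random in the m-dimensional hypercube
    points = [tuple(rng.random() for _ in range(m)) for _ in range(n)]
    # ranks[i] plays the role of sigma(i); rank 1 is best
    ranks = list(range(1, n + 1))
    rng.shuffle(ranks)
    edges = set()
    for i in range(n):
        volume = ranks[i] ** -alpha * n ** -beta
        radius = ball_radius(volume, m)
        for j in range(i):  # pairs with i > j, as on the slide
            # assumption: the ball is centred at node i
            if torus_sup_distance(points[i], points[j]) <= radius:
                edges.add((i, j))
    return points, ranks, edges


if __name__ == "__main__":
    # illustrative parameter values only
    _, _, edges = mgeo_p(n=500, m=3, alpha=0.4, beta=0.15, seed=7)
    print(len(edges))
```

Because no time-steps are simulated, a sample costs one pass over all node pairs, which is the practical appeal of the memoryless variant when many graphs must be generated for each candidate dimension m.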
Aside: machine learning
• machine learning is a branch of AI where computers make decisions and answer questions based on data sets
• examples:
  – spam filters
  – Netflix recommender systems
• especially useful when the data or number of decisions are too large for humans to process

Facebook100 [figure]

Validating the LDH
• we tested the dimensionality of large-scale samples from real OSN data
  – Facebook100 and LinkedIn (sampled over time)
• IDEA: use machine learning (SVM) to predict dimensions
  – features: small subgraph counts (3- and 4-vertex subgraphs)
  – compared sampled data vs simulations of MGEO-P with dimensions 1 through 12

Graphlets [figure]

Experimental design [figure]

Sample: Michigan [figure]

Stanford3
  n:       11621
  edges:   568330
  avgdeg:  97.81086
  plexp:   3.730000
GeoP parameters
  alphabeta:  0.510389
  alpha:      0.366300
  beta:       0.144089
python geop_dim_experiment.py --logcount -s 50 -t 0 --mmax 12 --prob 0.001 Stanford3 11621 568330 0.366300 0.144089
M-GeoP dimensions
  LADTree:   2
  J48:       3
  Logistic:  5
  SVM:       5

FB and LinkedIn - SVM [figure]

FB and LinkedIn - Eigenvalues [figure]

Figure 6. For three of the Facebook networks, we show the eigenvalue histogram in red, the eigenvalue histogram from the best fit MGEO-P network in blue, and the eigenvalue histograms for samples from the other dimensions in grey.
Bonato A, Gleich DF, Kim M, Mitsche D, et al. (2014) Dimensionality of Social Networks Using Motifs and Eigenvalues. PLoS ONE 9(9): e106052. doi:10.1371/journal.pone.0106052
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0106052

Future directions
• Other data sets
• Fractal dimension
• What are the attributes?
• What implications does LDH have for OSNs or social networks in general?
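
The Stanford3 parameter values listed earlier appear consistent with a simple fit: α from the power law exponent via b = 1 + 1/α, and α + β from the average degree via d ≈ n^(1-α-β). The sketch below is an assumption about how such values could be obtained, not the contents of `geop_dim_experiment.py`. In particular, dropping the constant 2^(1-α) is a guess that happens to match the reported numbers. The names `fit_geop_parameters` and `log_dimension` are hypothetical.

```python
import math


def fit_geop_parameters(n, edges, power_law_exponent):
    """Estimate (avg_degree, alpha, alpha + beta, beta) from basic graph statistics.

    Assumptions: b = 1 + 1/alpha and d ~ n^(1 - alpha - beta), with the
    constant factor from the average-degree formula dropped.
    """
    avg_degree = 2.0 * edges / n
    alpha = 1.0 / (power_law_exponent - 1.0)
    alpha_plus_beta = 1.0 - math.log(avg_degree) / math.log(n)
    beta = alpha_plus_beta - alpha
    return avg_degree, alpha, alpha_plus_beta, beta


def log_dimension(n, diameter):
    """Dimension estimate m ~ log n / log D from the slides."""
    return math.log(n) / math.log(diameter)


if __name__ == "__main__":
    # Stanford3 statistics as reported on the slide
    avg_degree, alpha, alpha_plus_beta, beta = fit_geop_parameters(
        n=11621, edges=568330, power_law_exponent=3.73
    )
    print(f"{avg_degree:.5f} {alpha_plus_beta:.6f} {alpha:.6f} {beta:.6f}")
```

Run on the Stanford3 statistics, the fit agrees with the slide's avgdeg, alphabeta, alpha, and beta values to roughly the displayed precision. `log_dimension` is included for completeness; the deck gives no diameter for Stanford3, so no value is computed for it here.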