Dimension matching in Facebook and LinkedIn networks

Seminar on Social Networks, Big Data,
Influence, and Decision-Making
University of Toronto
Anthony Bonato
Ryerson University
Friendship networks
• networks of on- and off-line friends form a large
web of interconnected links
Dimension matching in OSNs
6 degrees of separation
• (Stanley Milgram, 1967): famous chain letter experiment
6 Degrees in Facebook?
• 1.31 billion users
• (Backstrom et al., 2012)
– 4 degrees of separation in
Facebook
– when considering another
person in the world, a friend of
your friend knows a friend of
their friend, on average
• similar results for Twitter
and other OSNs
Complex networks in the era of Big Data
• web graph, social networks, biological networks, internet
networks, …
What is a complex network?
• no precise definition
• however, there is general consensus on the
following observed properties
1. large scale
2. evolving over time
3. power law degree distributions
4. small world properties
Examples of complex networks
• technological/informational: web graph, router
graph, AS graph, call graph, e-mail graph
• social: on-line social networks (Facebook,
Twitter, LinkedIn, …), collaboration graphs, co-actor graph
• biological networks: protein interaction
networks, gene regulatory networks, food
networks
Properties of complex networks
1. Large scale: relative to order and size
• web graph: order > 1 trillion
– in some sense infinite: number of strings entered into Google
• Facebook: > 1 billion nodes; Twitter: > 270 million nodes
– much denser (i.e. higher average degree) than the web graph
• protein interaction networks: order in the thousands
Properties of complex networks
2. Evolving: networks change over time
• web graph: billions of nodes and links appear and disappear each day
• Facebook: grew to 1 billion users
– denser than the web graph
• protein interaction networks: order in the thousands
– evolves much more slowly
Properties of Complex Networks
3. Power law degree distribution
• for a graph G of order n and i a positive integer, let $N_{i,n}$ denote the number of nodes of degree i in G
• we say that G follows a power law degree distribution if for some range of i and some b > 2,
$$N_{i,n} \approx i^{-b}\, n$$
• b is called the exponent of the power law
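A minimal sketch of how the counts $N_{i,n}$ and a rough estimate of the exponent b could be computed from a list of node degrees. The function names are hypothetical, and the least-squares fit on a log-log scale is only a crude estimator, not the method used in the talk.

```python
import math
from collections import Counter

def degree_counts(degrees):
    """Return {i: N_i}, the number of nodes of degree i, for i >= 1."""
    return Counter(d for d in degrees if d >= 1)

def estimate_exponent(counts):
    """Fit log N_i = c - b * log i by least squares and return b."""
    if len(counts) < 2:
        raise ValueError("need at least two distinct degrees")
    xs = [math.log(i) for i in counts]
    ys = [math.log(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is -b, so negate it
```

On counts that follow $N_i = 4096\, i^{-3}$ exactly, the fit returns b = 3 up to rounding.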
Power laws in OSNs
Graph parameters
• average distance:
$$L(G) = \binom{n}{2}^{-1} \sum_{u,v \in V(G)} d(u,v)$$
• clustering coefficient:
$$c(x) = \binom{\deg(x)}{2}^{-1} |E(N(x))|, \qquad C(G) = n^{-1} \sum_{x \in V(G)} c(x)$$
Properties of Complex Networks
4. Small world property
• introduced by Watts & Strogatz in 1998:
– low distances
• diam(G) = O(log n)
• L(G) = O(log log n)
– higher clustering coefficient than random graph with
same expected degree
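The two quantities behind the small world property, the average distance L(G) and the clustering coefficient C(G), can be computed directly from an adjacency structure. The sketch below assumes a connected, undirected graph given as a dict mapping each node to the set of its neighbours; the function names and the convention c(x) = 0 for nodes of degree below 2 are assumptions.

```python
from collections import deque
from itertools import combinations

def average_distance(adj):
    """L(G): mean of d(u, v) over unordered pairs (assumes G is connected)."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    n = len(adj)
    # every unordered pair was counted twice, once from each endpoint
    return (total / 2) / (n * (n - 1) / 2)

def clustering_coefficient(adj):
    """C(G): average over nodes x of |E(N(x))| / C(deg(x), 2)."""
    total = 0.0
    for x, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumed convention: c(x) = 0 when deg(x) < 2
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)
```

For a triangle with one pendant node attached, these give L(G) = 4/3 and C(G) = 7/12.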
Sample data: Flickr, YouTube,
LiveJournal, Orkut
• (Mislove et al., 2007): short average distances
and high clustering coefficients
Other properties of complex networks
• many complex networks (including on-line social networks) obey two additional laws:
• Densification power law (Leskovec, Kleinberg, Faloutsos, 2005):
– networks are becoming more dense over time; i.e. average degree is increasing
$$|E(G_t)| \approx |V(G_t)|^{a}$$
where 1 < a ≤ 2 is the densification exponent
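As a rough illustration, if $|E(G_t)| = c\,|V(G_t)|^{a}$ held exactly, the exponent a could be read off from two snapshots of the network, since the constant c cancels. The function name and the snapshot format below are hypothetical; real data would call for a fit over many snapshots.

```python
import math

def densification_exponent(snapshot_old, snapshot_new):
    """Estimate a from two (num_nodes, num_edges) snapshots of a growing network."""
    (v1, e1), (v2, e2) = snapshot_old, snapshot_new
    if not (1 < v1 < v2 and 0 < e1 < e2):
        raise ValueError("snapshots must show growth in both nodes and edges")
    # log(e2 / e1) = a * log(v2 / v1) when |E| = c * |V|^a
    return math.log(e2 / e1) / math.log(v2 / v1)
```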
• Decreasing distances (Leskovec, Kleinberg, Faloutsos, 2005):
– distances (diameter and/or average distances) decrease with time
(Kumar et al., 2006)
Other properties
• Connected component structure: emergence of
components; giant components
• Spectral properties: adjacency matrix and Laplacian
matrices, spectral gap, eigenvalue distribution
• Small community phenomenon: most nodes belong to
small communities (i.e. subgraphs with more internal than
external links)
…
Blau space
• OSNs live in social space or
Blau space:
– each user identified with a point in a
multi-dimensional space
– coordinates correspond to sociodemographic variables/attributes
• homophily principle: the flow of
information between users is a
declining function of distance in
Blau space
Underlying geometry
Feature space thesis: every complex
network is naturally associated with an
underlying feature space.
For example:
– web graph: topic space
– OSNs: Blau space
– PPIs: biochemical space
Dimensionality
• Question: What is the dimension of the
Blau space of OSNs?
• what is a credible mathematical formula
for the dimension of an OSN?
Six dimensions of separation
Why model complex networks?
• uncover and explain the generative
mechanisms underlying complex networks
• predict the future
• nice mathematical challenges
• models can uncover the hidden reality of
networks
“All models are wrong, but some are useful.”
– G.E.P. Box
Random geometric graphs
• n nodes are randomly
placed in the unit square
• each node has a
constant sphere of
influence, radius r
• nodes are joined if their
Euclidean distance is at
most r
• G(n,r), r = r(n)
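The construction above can be sketched in a few lines. This is a minimal quadratic-time version with a hypothetical function name, sampling points uniformly in the unit square and joining pairs at Euclidean distance at most r.

```python
import random

def random_geometric_graph(n, r, seed=None):
    """Sample G(n, r): n uniform points in the unit square, edges at distance <= r."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            # compare squared distances to avoid a square root
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r * r:
                adj[i].add(j)
                adj[j].add(i)
    return points, adj
```

Any r of at least √2 covers the whole square, so the result is then a complete graph.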
Some properties of G(n,r)
Theorem (Penrose, 1997) Let $\mu = n e^{-\pi r^2 n}$.
1. If μ = o(1), then asymptotically almost surely (a.a.s.) G is connected.
2. If μ = Θ(1), then a.a.s. G has a component of order Θ(n).
3. If μ → ∞, then a.a.s. G is disconnected.
• many other properties studied of G(n,r): chromatic number, clique number, Hamiltonicity, random walks, …
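To get a feel for the threshold, note that the radius $r_c = \sqrt{\ln n / (\pi n)}$ makes μ exactly 1, and scaling it by a constant c gives $\mu = n^{1-c^2}$: c > 1 drives μ to 0 (the connected regime), while c < 1 drives μ to infinity. A small sketch with hypothetical function names:

```python
import math

def mu(n, r):
    """The quantity mu = n * exp(-pi * r^2 * n) from the theorem."""
    return n * math.exp(-math.pi * r * r * n)

def critical_radius(n):
    """Radius at which mu equals 1."""
    return math.sqrt(math.log(n) / (math.pi * n))
```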
Spatially Preferred Attachment (SPA) model
(Aiello, Bonato, Cooper, Janssen, Prałat, 2008),
(Cooper, Frieze, Prałat, 2012)
• volume of sphere of influence proportional to in-degree
• nodes are added and spheres of influence shrink over time
• a.a.s. leads to power law graphs, low directed diameter, and small separators
Ranking models
(Fortunato, Flammini, Menczer, 2006),
(Łuczak, Prałat, 2006), (Janssen, Prałat, 2009)
• parameter: α in (0,1)
• each node is ranked 1,2, …, n by some function r
– 1 is best, n is worst
• at each time-step, one new node is born, and one node chosen at random dies (and the ranking is updated)
• link probability $r^{-\alpha}$
• many ranking schemes a.a.s. lead to power law graphs:
random initial ranking, degree, age, etc.
Geometric model for OSNs
• we consider a geometric
model of OSNs, where
– nodes are in m-dimensional Euclidean
space
– volume of spheres of
influence variable: a
function of ranking of
nodes
Geometric Protean (GEO-P) Model
(Bonato, Janssen, Prałat, 2012)
• parameters: α, β in (0,1), α+β < 1; positive integer m
• nodes live in an m-dimensional hypercube
• each node is ranked 1,2, …, n by some function r
– 1 is best, n is worst
– we use random initial ranking
• at each time-step, one new node v is born, and one node chosen at random dies (and the ranking is updated)
• each existing node u has a region of influence with volume $r^{-\alpha} n^{-\beta}$
• add edge uv if v is in the region of influence of u
Notes on GEO-P model
• model uses both geometry and ranking
• number of nodes is static: fixed at n
– order of OSNs at most number of people
(roughly…)
• top ranked nodes have larger regions of
influence
Simulation with 5000 nodes
[Figure panels: random geometric, GEO-P]
Properties of the GEO-P model
(Bonato, Janssen, Prałat, 2012)
• a.a.s. the GEO-P model generates graphs with the
following properties:
– power law degree distribution with exponent $b = 1 + 1/\alpha$
– average degree $d = (1+o(1))\, n^{1-\alpha-\beta} / 2^{1-\alpha}$
• densification
– diameter $D = n^{\Theta(1/m)}$
• small world: constant order if $m = C \log n$
– bad spectral expansion and high clustering coefficient
Dimension of OSNs
• given the order of the network n and
diameter D, we can calculate m
• gives formula for dimension of OSN:
$$m \approx \frac{\log n}{\log D}$$
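The formula follows from taking logarithms in $D = n^{\Theta(1/m)}$, which gives $\log D \approx (1/m) \log n$ once the hidden constants are ignored. A minimal sketch of the resulting calculation; the function name is hypothetical, and the example values below are illustrative only, not measurements of any OSN.

```python
import math

def logarithmic_dimension(n, diameter):
    """Estimate m as log n / log D for a network of order n and diameter D."""
    if n < 2 or diameter < 2:
        raise ValueError("need n >= 2 and diameter >= 2")
    return math.log(n) / math.log(diameter)
```

For instance, an illustrative network with a million nodes and diameter 10 would be assigned dimension 6.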
Logarithmic Dimension Hypothesis
In an OSN of order n and diameter D, the
dimension of its Blau space is
$$\frac{\log n}{\log D}$$
• posed independently by (Leskovec, Kim, 2011),
(Frieze, Tsourakakis, 2011)
Uncovering the hidden reality
• reverse engineering approach
– given network data (n, D), dimension of an OSN gives
smallest number of attributes needed to identify users
• that is, given the graph structure, we can (theoretically)
recover the social space
6 Dimensions of Separation
OSN        Dimension
Facebook   7
YouTube    6
Twitter    4
Flickr     4
Cyworld    7
MITACS team, UBC 2012
L to R: Amanda Tian, David Gleich, Myunghwan Kim, Me, Stephen Young, Dieter Mitsche
MGEO-P
(Bonato, Gleich, Mitsche, Prałat, Tian, Young, 2014)
• time-steps in GEO-P form a computational bottleneck
• consider a GEO-P where we forget the history of ranks
– memoryless GEO-P (MGEO-P)
• place n points u.a.r. in the hypercube
• assign ranks via a random permutation σ
• for each pair i > j, ij is an edge if j is in the ball of volume $\sigma(i)^{-\alpha} n^{-\beta}$
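The steps above can be sketched as follows. This is not the authors' implementation: the function names are hypothetical, and two choices are assumptions not stated on the slide, namely that the ball is centred at point i and that distances use the L-infinity metric on the hypercube with opposite faces identified (so a ball of volume V is a cube of side $V^{1/m}$).

```python
import random

def torus_distance(p, q):
    """L-infinity distance on the unit hypercube with wrap-around (assumed metric)."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(p, q))

def mgeo_p(n, m, alpha, beta, seed=None):
    """Sample an MGEO-P-style graph; returns (points, sigma, edges)."""
    rng = random.Random(seed)
    # place n points uniformly at random in the m-dimensional hypercube
    points = [tuple(rng.random() for _ in range(m)) for _ in range(n)]
    # sigma[i] is the rank of node i: a random permutation of 1..n
    sigma = list(range(1, n + 1))
    rng.shuffle(sigma)
    edges = set()
    for i in range(n):
        volume = sigma[i] ** -alpha * n ** -beta
        radius = 0.5 * volume ** (1.0 / m)  # half the side of a cube of that volume
        for j in range(i):  # pairs with i > j, as on the slide
            if torus_distance(points[i], points[j]) <= radius:
                edges.add((i, j))
    return points, sigma, edges
```

The double loop is quadratic in n, which is fine for a few thousand nodes; larger samples would need a spatial index.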
Contrasting the models
• by considering the evolution of ranks in GEO-P, the
probability that an edge is present in GEO-P and not in
MGEO-P is:
1 4
  22 

2
O n
log(n)   o(1)


• intuitively, the models generate similar graphs
• many a.a.s. properties hold in MGEO-P with similar
parameters
Properties of the MGEO-P model
(BGMPTY, 2014)
• a.a.s. the MGEO-P model generates graphs with the
following properties:
– power law degree distribution with exponent $b = 1 + 1/\alpha$
– average degree $d = (1+o(1))\, n^{1-\alpha-\beta} / 2^{1-\alpha}$
• densification
– diameter $D = n^{\Theta(1/m)}$
Proof sketch: diameter
• eminent node:
– highly ranked: rank at most some fixed R (rank 1 is best)
• partition the hypercube into small hypercubes
• choose the size of the hypercubes and R so that
– each hypercube contains at least $\log^2 n$ eminent nodes
– the sphere of influence of each eminent node covers its own hypercube and all neighbouring hypercubes
• choose one eminent node in each hypercube: the backbone
• show that all nodes in a hypercube are at distance at most 2 from the backbone
Back to question…
• How would we measure the dimensionality
of Blau space?
Aside: machine learning
• machine learning is a branch of AI
where computers make decisions and
answer questions based on data sets
• examples:
– spam filters
– Netflix recommender systems
• especially useful when the data or
number of decisions are too large for
humans to process
Facebook100
Validating the LDH
• we tested the dimensionality of large-scale
samples from real OSN data
– Facebook100 and LinkedIn (sampled over
time)
• IDEA: use machine learning (SVM) to predict
dimensions
– features: small subgraph counts (3- and 4-vertex subgraphs)
– compared sampled data vs simulations of
MGEO-P with dimensions 1 through 12
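As an illustration of the kind of feature involved, the sketch below counts the two connected induced subgraphs on 3 vertices, triangles and paths, for a graph given as a dict of neighbour sets. The function name is hypothetical; the 4-vertex counts and the classifier itself are not shown.

```python
from itertools import combinations

def three_vertex_counts(adj):
    """Count induced triangles and induced 2-edge paths in an undirected graph."""
    triangle_corners = 0
    paths = 0
    for x, nbrs in adj.items():
        for u, v in combinations(nbrs, 2):
            if v in adj[u]:
                triangle_corners += 1  # each triangle is seen once per corner
            else:
                paths += 1  # induced path u - x - v, centred at x
    return {"triangle": triangle_corners // 3, "path": paths}
```

A triangle with one pendant node attached contains one triangle and two induced paths.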
Graphlets
Experimental design
Sample: Michigan
Stanford3:
  n:       11621
  edges:   568330
  avgdeg:  97.81086
  plexp:   3.730000
GeoP parameters
  alphabeta: 0.510389
  alpha:     0.366300
  beta:      0.144089
python geop_dim_experiment.py --logcount -s 50 -t 0 --mmax 12 --prob 0.001 Stanford3 11621 568330 0.366300 0.144089
M-GeoP dimensions:
  LADTree:  2
  J48:      3
  Logistic: 5
  SVM:      5
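The Stanford3 parameter values are numerically consistent with inverting b = 1 + 1/α for α, and with solving avgdeg = n^(1-α-β) for α+β (that is, dropping the constant factor in the average degree formula). This is an inference from the numbers shown, not a description of what `geop_dim_experiment.py` actually does; the function name below is hypothetical.

```python
import math

def geop_parameters(n, edges, power_law_exponent):
    """Recover (avg_degree, alpha, beta) under the assumed relations above."""
    avg_degree = 2 * edges / n
    alpha = 1 / (power_law_exponent - 1)  # from b = 1 + 1/alpha
    # assumed: avg_degree = n^(1 - alpha - beta), constant factor ignored
    alpha_plus_beta = 1 - math.log(avg_degree) / math.log(n)
    beta = alpha_plus_beta - alpha
    return avg_degree, alpha, beta

avg_degree, alpha, beta = geop_parameters(11621, 568330, 3.73)
print(f"{avg_degree:.5f} {alpha:.6f} {beta:.6f}")
```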
FB and LinkedIn - SVM
FB and LinkedIn - Eigenvalues
Figure 6. For three of the Facebook networks, we show the eigenvalue histogram in red, the
eigenvalue histogram from the best fit MGEO-P network in blue, and the eigenvalue
histograms for samples from the other dimensions in grey.
Bonato A, Gleich DF, Kim M, Mitsche D, et al. (2014) Dimensionality of Social Networks Using Motifs and Eigenvalues. PLoS ONE
9(9): e106052. doi:10.1371/journal.pone.0106052
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0106052
Future directions
• Other data sets
• Fractal dimension
• What are the attributes?
• What implications does LDH have for
OSNs or social networks in general?