Part II: Complex Networks
Empirical Properties and Metrics
Thrasyvoulos Spyropoulos / spyropoul@eurecom.fr
Eurecom, Sophia-Antipolis
Textbooks
• “Networks, Crowds, and Markets: Reasoning About a Highly Connected World” by D. Easley and J. Kleinberg (“NCM”: publicly available online)
• “Networks: An Introduction” by M. Newman (“Networks”: shared copies in library)
• “Networked Life: 20 Questions and Answers” by M. Chiang (some chapters; shared copies in library)
What is a Network?
• A set of “nodes”
  - Humans, routers, web pages, telephone switches, airports, proteins, scientific articles …
• Relations between these nodes
  - humans: friendship/relation or online friendship
  - routers, switches: connected by a communication link
  - web pages: hyperlinks from one to another
  - airports: direct flights between them
  - articles: one citing the other
  - proteins: link if chemically interacting
• A network is often represented as a graph:
  - vertex = node
  - link = relation (weight = strength)
Social Networks (of the past)
The social network of friendships within a 34-person
karate club provides clues to the fault lines that
eventually split the club apart (Zachary, 1977)
Social Networks (of the past)
High school dating
Peter S. Bearman, James Moody, and Katherine Stovel, “Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks,” American Journal of Sociology 110, 44-91 (2004)
Image drawn by Mark Newman
Network Research of the Past
• Mostly done by social scientists
• Interested in human (social) networks
• Spread of diseases, influence, etc.
• Methodology: questionnaires → cumbersome, (lots of) bias
• Network size: 10s or at most 100s of nodes
Email Network
Email flows amongst a large project
team. Colors denote each
participant’s department
(Online) Social Networks
(a subset!) of the Internet Graph
The Science of Complex Networks
• The study of large networks coming from all sorts of diverse areas
  - We will focus on technological (e.g. Internet) and information networks (e.g. Web, Facebook)
  - Such networks cannot be observed visually (unlike the old social networks of a few tens of nodes) → we need ways to measure them and quantify their properties
  - The field is often called Social Networks, Network Science, or Network Theory
• Question 1: What are the statistical properties of real networks?
  - Connectivity, path lengths, degree distributions
  - How do we measure such huge networks? → sampling
• Question 2: Why do these properties arise?
  - Models of large networks: random graphs
  - Deterministic ways are too complex/restrictive
• Question 3: How can we take advantage of these properties?
  - Connectivity (epidemiology, resilience)
  - Spread (information, disease)
  - Search (Web page, person)
Part I: Network Properties of Interest
• There are a lot of different properties we might be interested in → this also depends on the application
• But some properties are commonly studied, for 2 reasons:
  1. These properties are important for key applications
  2. The majority of networks exhibit surprising similarities with respect to these properties
1. Degree distribution (“scale-free structure”)
2. Path length (“small world phenomena”)
3. Clustering (“community structure”)
Measuring Real Networks: Degree distributions
• Problem: find the probability distribution that best fits the observed data
• $f_k$ = fraction of nodes with degree k = probability that a randomly selected node has degree k
[Figure: frequency $f_k$ plotted against degree k]
Basic Graph Properties: Revision Material
Undirected Graphs
• Graph G = (V, E)
  - V = set of vertices
  - E = set of edges
• Example (undirected graph on nodes 1 to 5): E = {(1,2), (1,3), (2,3), (3,4), (4,5)}
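As an illustration of the G = (V, E) definition, the edge set above can be turned into an adjacency list. This is only a minimal sketch; the helper name `build_adjacency` is made up for this example and is not part of the course material.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)  # undirected: record the edge in both directions
    return dict(adj)

# Edge set E of the example undirected graph
E = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = build_adjacency(E)

for node in sorted(adj):
    print(node, sorted(adj[node]))
```

Storing neighbours in sets makes the "is (u, v) an edge?" test cheap, which is convenient for the degree and clustering computations later in the deck.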
Directed Graphs
• Graph G = (V, E)
  - V = set of vertices
  - E = set of edges
• Example (directed graph on nodes 1 to 5): E = {‹1,2›, ‹2,1›, ‹1,3›, ‹3,2›, ‹3,4›, ‹4,5›}
Weighted and Unweighted Graphs
• Edges have / do not have a weight associated with them
[Figure: a weighted graph next to an unweighted graph]
Undirected graph: Degree Distribution
• Degree d(i) of node i: number of edges incident on node i
• Degree distribution (for the example undirected graph):
  - 1 node with degree 1
  - 3 nodes with degree 2
  - 1 node with degree 3
  - P(1) = 1/5, P(2) = 3/5, P(3) = 1/5
[Figure: number of nodes plotted against degree]
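The counts on this slide can be reproduced from the edge list of the example undirected graph. A minimal sketch, assuming every node appears in at least one edge (the function name `degree_distribution` is chosen here for illustration):

```python
from collections import Counter
from fractions import Fraction

def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes that have degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)  # assumption: no isolated nodes, so every node shows up in an edge
    counts = Counter(degree.values())
    return {k: Fraction(c, n) for k, c in sorted(counts.items())}

E = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
P = degree_distribution(E)
for k, p in P.items():
    print(f"P({k}) = {p}")
```

Exact fractions are used so the result can be compared directly with P(1) = 1/5, P(2) = 3/5, P(3) = 1/5.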
Undirected Graph: Degree Distribution
[Figure: bar chart of P(k) against degree k for the example graph]
Network Science: Graph Theory, January 24, 2011
Directed Graph: In- and Out-Degree
• In-degree $d_{in}(i)$ of node i: number of edges pointing to node i
• Out-degree $d_{out}(i)$ of node i: number of edges leaving node i
• In-degree sequence: [1,2,1,1,1]
• Out-degree sequence: [2,1,2,1,0]
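The two sequences can be checked against the arc set of the directed example graph from the Directed Graphs slide. A minimal sketch; `degree_sequences` is an illustrative name, not something defined in the course material.

```python
def degree_sequences(nodes, arcs):
    """Return (in-degree sequence, out-degree sequence), ordered like `nodes`."""
    d_in = {v: 0 for v in nodes}
    d_out = {v: 0 for v in nodes}
    for u, v in arcs:   # arc u -> v
        d_out[u] += 1   # one more edge leaving u
        d_in[v] += 1    # one more edge pointing to v
    return [d_in[v] for v in nodes], [d_out[v] for v in nodes]

nodes = [1, 2, 3, 4, 5]
arcs = [(1, 2), (2, 1), (1, 3), (3, 2), (3, 4), (4, 5)]
in_seq, out_seq = degree_sequences(nodes, arcs)
print("in-degree sequence: ", in_seq)
print("out-degree sequence:", out_seq)
```

Both sequences sum to the number of arcs, since every arc contributes exactly one in-degree and one out-degree.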
Paths
• Path from node i to node j: a sequence of edges (directed or undirected) from node i to node j
  - path length: number of edges on the path
  - nodes i and j are connected if there is a path between them
  - cycle: a path that starts and ends at the same node
Shortest Paths
• Shortest path from node i to node j
  - also known as BFS path, or geodesic path
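The name "BFS path" comes from the fact that breadth-first search visits nodes in order of increasing hop count in an unweighted graph. A minimal sketch on the example undirected graph used earlier (the adjacency dictionary is written out by hand here):

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop count of the shortest path from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit = shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Example undirected graph: edges (1,2), (1,3), (2,3), (3,4), (4,5)
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
dist = bfs_distances(adj, 1)
print(dist)
```

Nodes that cannot be reached from the source simply do not appear in the returned dictionary. For weighted graphs BFS is not sufficient and an algorithm such as Dijkstra's would be needed instead.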
Diameter
• The length of the longest shortest path in the graph, taken over all pairs of nodes
Undirected graph: Components
• Connected graph: a graph where every pair of nodes is connected
• Disconnected graph: a graph that is not connected
• Connected components: maximal subsets of vertices that are connected to each other
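Connected components can be found by repeatedly exploring from a node that has not been visited yet. The sketch below uses a hypothetical disconnected graph (a triangle 1-2-3 plus a separate edge 4-5) that is not taken from the slides:

```python
def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:                 # depth-first exploration from `start`
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components

# Hypothetical graph: triangle 1-2-3 and a separate edge 4-5
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4]}
components = connected_components(adj)
print(components)
print("connected" if len(components) == 1 else "disconnected")
```

The graph is connected exactly when a single component is returned.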
Fully Connected Graph
• Clique $K_n$
• A graph that has all possible n(n-1)/2 edges
Directed Graph
• Strongly connected graph: there exists a directed path from every node i to every node j
• Weakly connected graph: if the edges are made undirected, the graph is connected
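One way to test the two definitions: a directed graph is strongly connected exactly when some root reaches every node and every node reaches that root (checked by searching the reversed graph). A minimal sketch, with illustrative function names, applied to the arc set of the directed example graph:

```python
def reachable(adj, source):
    """Set of nodes reachable from `source` by following edges in `adj`."""
    seen = {source}
    stack = [source]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def connectivity(nodes, arcs):
    """Return (strongly_connected, weakly_connected) for a directed graph."""
    fwd = {v: [] for v in nodes}   # original directions
    rev = {v: [] for v in nodes}   # all arcs reversed
    und = {v: [] for v in nodes}   # directions ignored
    for u, v in arcs:
        fwd[u].append(v)
        rev[v].append(u)
        und[u].append(v)
        und[v].append(u)
    root, everyone = nodes[0], set(nodes)
    strong = reachable(fwd, root) == everyone and reachable(rev, root) == everyone
    weak = reachable(und, root) == everyone
    return strong, weak

nodes = [1, 2, 3, 4, 5]
arcs = [(1, 2), (2, 1), (1, 3), (3, 2), (3, 4), (4, 5)]
strong, weak = connectivity(nodes, arcs)
print("strongly connected:", strong, "| weakly connected:", weak)
```

In this example node 5 has no outgoing arcs, so no path leads from 5 back to the other nodes: the graph is weakly but not strongly connected.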
Adjacency Matrix: Undirected Graph
• Adjacency matrix
  - symmetric matrix for undirected graphs
$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$
Adjacency Matrix: Directed Graph
• Adjacency matrix
  - non-symmetric matrix for directed graphs
$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$
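Both adjacency matrices can be generated from the edge sets of the two example graphs. A minimal sketch, assuming the convention A[i][j] = 1 when there is an edge from node i to node j, with nodes numbered from 1 (function names are illustrative):

```python
def adjacency_matrix(n, edges, directed=False):
    """n x n 0/1 matrix with A[i][j] = 1 if there is an edge from node i+1 to node j+1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        if not directed:
            A[v - 1][u - 1] = 1   # undirected edges are stored in both directions
    return A

def is_symmetric(A):
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

undirected_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
directed_edges = [(1, 2), (2, 1), (1, 3), (3, 2), (3, 4), (4, 5)]

A_und = adjacency_matrix(5, undirected_edges)
A_dir = adjacency_matrix(5, directed_edges, directed=True)

for row in A_und:
    print(row)
print("symmetric:", is_symmetric(A_und))
for row in A_dir:
    print(row)
print("symmetric:", is_symmetric(A_dir))
```

Row sums of the directed matrix give the out-degrees and column sums give the in-degrees under this convention.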
Examples of Adjacency Matrices
[Figure: example graphs G1, G2 and G3 with their adjacency matrices]
3-node example:
$A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$
8-node example:
$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$
• symmetric
• undirected: $n^2/2$
• directed: $n^2$
Exponential distribution
• Probability of having k neighbors: $p(k) = \lambda e^{-\lambda k}$
• Identified by a line in the log-linear plot: $\log p(k) = -\lambda k + \log \lambda$
[Figure: log frequency against degree; the line has slope $-\lambda$]
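The log-linear relation suggests a simple way to read off λ: fit a straight line to log p(k) against k and take minus the slope. The sketch below uses exact, noise-free values with a hypothetical λ = 0.5; real degree data would be noisy and call for a more careful estimation method.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

lam = 0.5                                    # hypothetical rate, chosen for illustration
ks = list(range(1, 21))
pk = [lam * math.exp(-lam * k) for k in ks]  # p(k) = lambda * exp(-lambda * k)

slope, intercept = fit_line(ks, [math.log(p) for p in pk])
print("estimated lambda from slope:    ", -slope)
print("estimated lambda from intercept:", math.exp(intercept))
```

The same helper applied to (log k, log p(k)) pairs would recover the exponent of a power law from its log-log plot.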
Power-law distributions
• Right-skewed / heavy-tailed distribution: $p(k) = C k^{-\alpha}$
  - there is a non-negligible fraction of nodes that has very high degree (hubs)
  - scale-free: f(ax) = b f(x), no characteristic scale, average is not informative
• A power-law distribution gives a line in the log-log plot: $\log p(k) = -\alpha \log k + \log C$
[Figure: frequency against degree, and log frequency against log degree; the line has slope $-\alpha$]
• α: power-law exponent (typically 2 ≤ α ≤ 3)
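Both the scale-free property f(ax) = b f(x) and the straight line in the log-log plot follow directly from $p(k) = C k^{-\alpha}$. A short worked derivation, where a > 0 is an arbitrary rescaling factor:

```latex
\begin{align*}
p(ak) &= C\,(ak)^{-\alpha} = a^{-\alpha}\, C\, k^{-\alpha} = a^{-\alpha}\, p(k)
  && \text{so } b = a^{-\alpha} \text{ does not depend on } k, \\
\log p(k) &= \log\!\left(C\, k^{-\alpha}\right) = -\alpha \log k + \log C
  && \text{a line of slope } -\alpha \text{ in } \log k .
\end{align*}
```

For comparison, the exponential distribution $p(k) = \lambda e^{-\lambda k}$ gives $p(ak)/p(k) = e^{-\lambda (a-1) k}$, which does depend on k, so it has a characteristic scale $1/\lambda$ and is not scale-free.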
Power Law vs. Exponential Distribution
This difference is particularly obvious if we plot them on a log vertical scale: for large x there are orders-of-magnitude differences between the two functions.
[Figure: $f(x) = c x^{-0.5}$, $f(x) = c x^{-1}$ and $f(x) = c^{-x}$ plotted on log-log and on semilog axes]
Network Science: Scale-Free Property, February 7, 2011
Internet Topology Primer
Internet backbone and
regional connectivity
Multi-tier AS topology
Gateway Routers
inside ASs
Internet Degree Distribution
• A power-law degree distribution holds for both AS and router topologies
Degree Distribution for Other Networks
Power Law Exponent in Real Networks (M. Newman 2003)
α : power-law exponent (typically 2 ≤ α ≤ 3)
Measuring path length
• $d_{ij}$ = length of the shortest path between i and j
• Diameter: $d = \max_{i,j} d_{ij}$
• Average path length: $l = \frac{1}{n(n-1)/2} \sum_{i<j} d_{ij}$
• Also of interest: distribution of all shortest paths
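All three quantities on this slide can be computed from all-pairs BFS distances. A minimal sketch on the 5-node example undirected graph, assuming the graph is connected (otherwise some $d_{ij}$ would be undefined and the lookup below would fail):

```python
from collections import Counter, deque
from itertools import combinations

def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_stats(adj):
    """Return (diameter, average path length, distribution of shortest-path lengths)."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    # one entry per unordered pair i < j; assumes a connected graph
    lengths = [dist[i][j] for i, j in combinations(sorted(adj), 2)]
    return max(lengths), sum(lengths) / len(lengths), Counter(lengths)

# Example undirected graph: edges (1,2), (1,3), (2,3), (3,4), (4,5)
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
diameter, average, distribution = path_length_stats(adj)
print("diameter:", diameter)
print("average path length:", average)
print("distribution:", dict(sorted(distribution.items())))
```

Running one BFS per node costs O(n(n + m)) for n nodes and m edges, which is why sampling is used for very large networks.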
Path Length: Lattice Network
• A total of n nodes arranged in a $\sqrt{n} \times \sqrt{n}$ grid
• Only neighbors (up, down, left, right) are connected
Q: What is the diameter of the network?
A: $2\sqrt{n} - 2$
Q: What is the avg. distance, i.e. picking two nodes randomly?
A: It is in the order of $\sqrt{n}$ (i.e. $c\sqrt{n}$)
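Both answers can be checked numerically on small grids. The sketch below runs BFS from every node of a side × side lattice (so n = side²) and reports the diameter and the ratio of the average distance to √n; the function names and grid sizes are chosen for illustration only.

```python
from collections import deque

def grid_distances(side, source):
    """BFS hop counts from `source` on a side x side grid with 4-neighbour links."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < side and 0 <= nc < side and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def grid_stats(side):
    """Return (diameter, average distance between two distinct nodes)."""
    nodes = [(r, c) for r in range(side) for c in range(side)]
    n = len(nodes)
    diameter, total = 0, 0
    for s in nodes:
        dist = grid_distances(side, s)
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
    return diameter, total / (n * (n - 1))   # average over ordered pairs of distinct nodes

for side in (4, 8, 16):
    diameter, avg = grid_stats(side)
    print(f"n={side * side:4d}  diameter={diameter:3d}  avg/sqrt(n)={avg / side:.3f}")
```

The diameter is the corner-to-corner distance 2(√n − 1), and the ratio avg/√n stays constant as the grid grows, which is the "in the order of √n" behaviour stated on the slide.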
Path Length: Random Geometric Network
• n wireless nodes in an area of 1x1
• Each transmits at distance R
• R must be at least $O\left(\sqrt{\frac{\log n}{n}}\right)$ for connectivity
Q: Choose two random nodes: what is the expected hop count (distance) between them?
A: $O\left(\sqrt{\frac{n}{\log n}}\right)$
Milgram’s small world experiment
• Letters were handed out to people in Nebraska to be sent to a target in Boston
• People were instructed to pass on the letters to someone they knew on a first-name basis
  - ~60 letters, only about 35% delivered
• The letters that reached the destination followed paths of length around 6
• Six degrees of separation (play by John Guare)
Milgram’s small world experiment: Email Version
• In 2001, Duncan Watts, a professor at Columbia University, recreated Milgram's experiment using an e-mail message as the “package” that needed to be delivered.
• Surprisingly, after reviewing the data collected from 48,000 senders and 19 targets in 157 different countries, Watts found that the average number of intermediaries was again 6.
Kevin Bacon number: link = 2 actors in the same movie
[Figure: example chain of actors (including Robert Wagner and Barry Norton) linked through the movies “Let’s Make It Legal”, “Austin Powers: The Spy Who Shagged Me”, “Wild Things”, “What Price Glory”, “A Few Good Men” and “Monsieur Verdoux”]
Kevin Bacon Number
(statistics from IMDB)
• ~740000 linkable actors
• Average path length = 3
• 99% of actors are fewer than 6 hops away
• Try your own actor here: http://www.cs.virginia.edu/oracle/
Erdos number: collaboration networks
• Legendary mathematician Paul Erdos
  - around 1500 papers and 509 collaborators
• Collaboration graph: link between two authors who wrote a paper together
• Erdos number of X: hop count between Erdos and author X in the collaboration graph
• ~260,000 authors in the connected component
[Figure: collaboration chain involving T. Spyropoulos and Kostas Psounis]
Internet Path Lengths
Number of ASs traversed by an email message
• ~35000 nodes
• Avg. path ~ 5!
Number of routers traversed by an email message
• >200000 nodes
• Avg. path ~ 15
plots taken from R. V. Hofstad
Internet Path Length: Different Continents
Measurement Findings: Path Length
• Milgram’s experiment => Small World Phenomenon
• Short paths exist between most nodes: path length l << total number of nodes N (compare a line network, where path length l = O(N))
• “Small world” = avg. path length l is at most in the order of log N
Clustering (Transitivity) coefficient
• Measures the density of triangles (local clusters) in the graph
• Two different ways to measure it:
• The ratio of the means: $C^{(1)} = \frac{\sum_i \text{triangles centered at node } i}{\sum_i \text{triples centered at node } i}$
Example
[Figure: example graph on nodes 1 to 5]
$C^{(1)} = \frac{3}{1 + 1 + 6} = \frac{3}{8}$
Clustering (Transitivity) coefficient
• Clustering coefficient for node i: $C_i = \frac{\text{triangles centered at node } i}{\text{triples centered at node } i}$
• The mean of the ratios: $C^{(2)} = \frac{1}{n} \sum_i C_i$
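The two definitions generally give different values on the same graph. The sketch below computes both, under two assumptions that are not stated in the slides: the example graph has edges 1-2, 1-3, 2-3, 3-4, 3-5 (chosen because it reproduces the 3/(1+1+6) = 3/8 of the earlier example), and $C_i$ is taken as 0 for nodes with fewer than two neighbours.

```python
from itertools import combinations

def triangles_and_triples(adj, i):
    """Triples centred at i (pairs of neighbours) and how many of them are closed."""
    pairs = list(combinations(sorted(adj[i]), 2))
    triangles = sum(1 for u, v in pairs if v in adj[u])
    return triangles, len(pairs)

def clustering_coefficients(adj):
    """Return (C1, C2): ratio of the means and mean of the ratios."""
    counts = {i: triangles_and_triples(adj, i) for i in adj}
    c1 = sum(t for t, _ in counts.values()) / sum(p for _, p in counts.values())
    # Assumed convention: C_i = 0 when node i has no triples (degree < 2)
    local = {i: (t / p if p else 0.0) for i, (t, p) in counts.items()}
    c2 = sum(local.values()) / len(adj)
    return c1, c2

# Assumed example graph: triangle 1-2-3 plus pendant nodes 4 and 5 attached to node 3
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3}, 5: {3}}
c1, c2 = clustering_coefficients(adj)
print("C(1) =", c1)
print("C(2) =", c2)
```

C(2) gives every node equal weight, so low-degree nodes with few triples influence it more than they influence C(1).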
Clustering Coeff. In Real Nets (M. Newman 2003)
Summary of Findings
Most real networks have…
1. Short paths between nodes (“small world”)
2. Transitivity/Clustering coefficient that is finite > 0
3. Degree distribution that follows a power law
Q1. Can we design graph models that exhibit similar
characteristics?
Q2. Can we explain how/why these phenomena occur in
the first place?
Q3. Can we take advantage of these properties (e.g.
searching, advertising, viral infection/immunization, etc.)?